linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] livepatch: allow removal of a disabled patch
@ 2016-05-02 11:57 Miroslav Benes
  2016-05-02 15:08 ` Josh Poimboeuf
  2016-05-03 21:37 ` Josh Poimboeuf
  0 siblings, 2 replies; 21+ messages in thread
From: Miroslav Benes @ 2016-05-02 11:57 UTC (permalink / raw)
  To: jpoimboe, jeyu, jikos, pmladek, jslaby
  Cc: live-patching, linux-kernel, huawei.libin, minfei.huang, Miroslav Benes

Currently we do not allow patch module to unload since there is no
method to determine if a task is still running in the patched code.

The consistency model gives us the way because when the patching
finishes we know that all tasks were marked as safe to call a new
patched function. Thus every new call to the function calls the new
patched code and at the same time no task can be somewhere in the old
code, because it had to leave that code to be marked as safe.

We can safely let the patch module go after that.

Completion is used for synchronization between module removal and sysfs
infrastructure. Note that we still do not allow the removal for
immediate model, that is no consistency model.

With this change a call to try_module_get() is moved to
__klp_enable_patch from klp_register_patch to make module reference
counting symmetric (module_put() is in a patch disable path) and to
allow to take a new reference to a disabled module when being enabled.

Also all kobject_put(&patch->kobj) calls are moved outside of klp_mutex
lock protection to prevent a deadlock situation when
klp_unregister_patch is called and sysfs directories are removed. There
is no need to do the same for other kobject_put() callsites as we
currently do not have their sysfs counterparts.

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
---
Based on Josh's v2.

There are still two things which need to be discussed.

1. Do we really need a completion? If I am not missing something
kobject_del() always waits for sysfs callers to leave thanks to kernfs
active protection. The only problem is in kobject_release() when
CONFIG_DEBUG_KOBJECT_RELEASE=y. In this case kobject_delayed_cleanup()
is delay-scheduled, which is a problem for us. For this we definitely
need the completion, right? JiriS, is this correct?

2. I refuse to remove a module when patch->immediate is set or one of
its func->immediate is set. We can do slightly better, because we can
allow the module to go if patch->immediate is false and all func with
immediate set to true are from some object which is not loaded. This is
for discussion because it needs more changes in the code (I'd like to
call klp_is_object_loaded() which is somewhere else and so on).


 Documentation/livepatch/livepatch.txt | 29 ++++---------
 include/linux/livepatch.h             |  3 ++
 kernel/livepatch/core.c               | 80 ++++++++++++++++++++++-------------
 kernel/livepatch/transition.c         | 12 +++++-
 samples/livepatch/livepatch-sample.c  |  1 -
 5 files changed, 72 insertions(+), 53 deletions(-)

diff --git a/Documentation/livepatch/livepatch.txt b/Documentation/livepatch/livepatch.txt
index bee86d0fe854..a05124258a73 100644
--- a/Documentation/livepatch/livepatch.txt
+++ b/Documentation/livepatch/livepatch.txt
@@ -272,8 +272,15 @@ section "Livepatch life-cycle" below for more details about these
 two operations.
 
 Module removal is only safe when there are no users of the underlying
-functions.  The immediate consistency model is not able to detect this;
-therefore livepatch modules cannot be removed. See "Limitations" below.
+functions. The immediate consistency model is not able to detect this. The
+code just redirects the functions at the very beginning and it does not
+check if the functions are in use. In other words, it knows when the
+functions get called but it does not know when the functions return.
+Therefore it cannot be decided when the livepatch module can be safely
+removed. This is solved by a hybrid consistency model. When the system is
+transitioned to a new patch state (patched/unpatched) it is guaranteed that
+no task sleeps or runs in the old code.
+
 
 5. Livepatch life-cycle
 =======================
@@ -444,24 +451,6 @@ See Documentation/ABI/testing/sysfs-kernel-livepatch for more details.
     There is work in progress to remove this limitation.
 
 
-  + Livepatch modules can not be removed.
-
-    The current implementation just redirects the functions at the very
-    beginning. It does not check if the functions are in use. In other
-    words, it knows when the functions get called but it does not
-    know when the functions return. Therefore it can not decide when
-    the livepatch module can be safely removed.
-
-    This will get most likely solved once a more complex consistency model
-    is supported. The idea is that a safe state for patching should also
-    mean a safe state for removing the patch.
-
-    Note that the patch itself might get disabled by writing zero
-    to /sys/kernel/livepatch/<patch>/enabled. It causes that the new
-    code will not longer get called. But it does not guarantee
-    that anyone is not sleeping anywhere in the new code.
-
-
   + Livepatch works reliably only when the dynamic ftrace is located at
     the very beginning of the function.
 
diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index cd2dfd95e109..a9651c6e1b52 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -23,6 +23,7 @@
 
 #include <linux/module.h>
 #include <linux/ftrace.h>
+#include <linux/completion.h>
 
 #if IS_ENABLED(CONFIG_LIVEPATCH)
 
@@ -114,6 +115,7 @@ struct klp_object {
  * @list:	list node for global list of registered patches
  * @kobj:	kobject for sysfs resources
  * @enabled:	the patch is enabled (but operation may be incomplete)
+ * @finish:	for waiting till it is safe to remove the patch module
  */
 struct klp_patch {
 	/* external */
@@ -125,6 +127,7 @@ struct klp_patch {
 	struct list_head list;
 	struct kobject kobj;
 	bool enabled;
+	struct completion finish;
 };
 
 #define klp_for_each_object(patch, obj) \
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index d55701222b49..a649186fb4af 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -29,6 +29,7 @@
 #include <linux/livepatch.h>
 #include <linux/elf.h>
 #include <linux/moduleloader.h>
+#include <linux/completion.h>
 #include <asm/cacheflush.h>
 #include "patch.h"
 #include "transition.h"
@@ -371,6 +372,18 @@ static int __klp_enable_patch(struct klp_patch *patch)
 	    !list_prev_entry(patch, list)->enabled)
 		return -EBUSY;
 
+	/*
+	* A reference is taken on the patch module to prevent it from being
+	* unloaded.
+	*
+	* Note: For immediate (no consistency model) patches we don't allow
+	* patch modules to unload since there is no safe/sane method to
+	* determine if a thread is still running in the patched code contained
+	* in the patch module once the ftrace registration is successful.
+	*/
+	if (!try_module_get(patch->mod))
+		return -ENODEV;
+
 	pr_notice_once("tainting kernel with TAINT_LIVEPATCH\n");
 	add_taint(TAINT_LIVEPATCH, LOCKDEP_STILL_OK);
 
@@ -459,6 +472,15 @@ static ssize_t enabled_store(struct kobject *kobj, struct kobj_attribute *attr,
 
 	mutex_lock(&klp_mutex);
 
+	if (!klp_is_patch_registered(patch)) {
+		/*
+		 * Module with the patch could either disappear meanwhile or is
+		 * not properly initialized yet.
+		 */
+		ret = -EINVAL;
+		goto err;
+	}
+
 	if (patch->enabled == val) {
 		/* already in requested state */
 		ret = -EINVAL;
@@ -515,10 +537,10 @@ static struct attribute *klp_patch_attrs[] = {
 
 static void klp_kobj_release_patch(struct kobject *kobj)
 {
-	/*
-	 * Once we have a consistency model we'll need to module_put() the
-	 * patch module here.  See klp_register_patch() for more details.
-	 */
+	struct klp_patch *patch;
+
+	patch = container_of(kobj, struct klp_patch, kobj);
+	complete(&patch->finish);
 }
 
 static struct kobj_type klp_ktype_patch = {
@@ -589,7 +611,6 @@ static void klp_free_patch(struct klp_patch *patch)
 	klp_free_objects_limited(patch, NULL);
 	if (!list_empty(&patch->list))
 		list_del(&patch->list);
-	kobject_put(&patch->kobj);
 }
 
 static int klp_init_func(struct klp_object *obj, struct klp_func *func)
@@ -700,11 +721,14 @@ static int klp_init_patch(struct klp_patch *patch)
 	mutex_lock(&klp_mutex);
 
 	patch->enabled = false;
+	init_completion(&patch->finish);
 
 	ret = kobject_init_and_add(&patch->kobj, &klp_ktype_patch,
 				   klp_root_kobj, "%s", patch->mod->name);
-	if (ret)
-		goto unlock;
+	if (ret) {
+		mutex_unlock(&klp_mutex);
+		return ret;
+	}
 
 	klp_for_each_object(patch, obj) {
 		ret = klp_init_object(patch, obj);
@@ -720,9 +744,12 @@ static int klp_init_patch(struct klp_patch *patch)
 
 free:
 	klp_free_objects_limited(patch, obj);
-	kobject_put(&patch->kobj);
-unlock:
+
 	mutex_unlock(&klp_mutex);
+
+	kobject_put(&patch->kobj);
+	wait_for_completion(&patch->finish);
+
 	return ret;
 }
 
@@ -736,23 +763,29 @@ static int klp_init_patch(struct klp_patch *patch)
  */
 int klp_unregister_patch(struct klp_patch *patch)
 {
-	int ret = 0;
+	int ret;
 
 	mutex_lock(&klp_mutex);
 
 	if (!klp_is_patch_registered(patch)) {
 		ret = -EINVAL;
-		goto out;
+		goto err;
 	}
 
 	if (patch->enabled) {
 		ret = -EBUSY;
-		goto out;
+		goto err;
 	}
 
 	klp_free_patch(patch);
 
-out:
+	mutex_unlock(&klp_mutex);
+
+	kobject_put(&patch->kobj);
+	wait_for_completion(&patch->finish);
+
+	return 0;
+err:
 	mutex_unlock(&klp_mutex);
 	return ret;
 }
@@ -765,12 +798,13 @@ EXPORT_SYMBOL_GPL(klp_unregister_patch);
  * Initializes the data structure associated with the patch and
  * creates the sysfs interface.
  *
+ * There is no need to take the reference on the patch module here. It is done
+ * later when the patch is enabled.
+ *
  * Return: 0 on success, otherwise error
  */
 int klp_register_patch(struct klp_patch *patch)
 {
-	int ret;
-
 	if (!patch || !patch->mod)
 		return -EINVAL;
 
@@ -783,21 +817,7 @@ int klp_register_patch(struct klp_patch *patch)
 	if (!klp_initialized())
 		return -ENODEV;
 
-	/*
-	 * A reference is taken on the patch module to prevent it from being
-	 * unloaded.  Right now, we don't allow patch modules to unload since
-	 * there is currently no method to determine if a thread is still
-	 * running in the patched code contained in the patch module once
-	 * the ftrace registration is successful.
-	 */
-	if (!try_module_get(patch->mod))
-		return -ENODEV;
-
-	ret = klp_init_patch(patch);
-	if (ret)
-		module_put(patch->mod);
-
-	return ret;
+	return klp_init_patch(patch);
 }
 EXPORT_SYMBOL_GPL(klp_register_patch);
 
diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 92819bb0961b..6cc49d253195 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -197,13 +197,21 @@ void klp_complete_transition(void)
 	struct klp_func *func;
 	struct task_struct *g, *task;
 	unsigned int cpu;
+	bool is_immediate = false;
 
 	if (klp_transition_patch->immediate)
 		goto done;
 
-	klp_for_each_object(klp_transition_patch, obj)
-		klp_for_each_func(obj, func)
+	klp_for_each_object(klp_transition_patch, obj) {
+		klp_for_each_func(obj, func) {
 			func->transition = false;
+			if (func->immediate)
+				is_immediate = true;
+		}
+	}
+
+	if (klp_target_state == KLP_UNPATCHED && !is_immediate)
+		module_put(klp_transition_patch->mod);
 
 	read_lock(&tasklist_lock);
 	for_each_process_thread(g, task) {
diff --git a/samples/livepatch/livepatch-sample.c b/samples/livepatch/livepatch-sample.c
index e34f871e69b1..a84676aa7c62 100644
--- a/samples/livepatch/livepatch-sample.c
+++ b/samples/livepatch/livepatch-sample.c
@@ -82,7 +82,6 @@ static int livepatch_init(void)
 
 static void livepatch_exit(void)
 {
-	WARN_ON(klp_disable_patch(&patch));
 	WARN_ON(klp_unregister_patch(&patch));
 }
 
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] livepatch: allow removal of a disabled patch
  2016-05-02 11:57 [RFC PATCH] livepatch: allow removal of a disabled patch Miroslav Benes
@ 2016-05-02 15:08 ` Josh Poimboeuf
  2016-05-03  8:16   ` Miroslav Benes
  2016-05-03 21:37 ` Josh Poimboeuf
  1 sibling, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2016-05-02 15:08 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: jeyu, jikos, pmladek, jslaby, live-patching, linux-kernel,
	huawei.libin, minfei.huang

On Mon, May 02, 2016 at 01:57:22PM +0200, Miroslav Benes wrote:
> Currently we do not allow patch module to unload since there is no
> method to determine if a task is still running in the patched code.
> 
> The consistency model gives us the way because when the patching
> finishes we know that all tasks were marked as safe to call a new
> patched function. Thus every new call to the function calls the new
> patched code and at the same time no task can be somewhere in the old
> code, because it had to leave that code to be marked as safe.
> 
> We can safely let the patch module go after that.

I found this a little confusing because it talks about patching, whereas
we really want to remove the patch module after _unpatching_ it.

> Completion is used for synchronization between module removal and sysfs
> infrastructure. Note that we still do not allow the removal for
> immediate model, that is no consistency model.
> 
> With this change a call to try_module_get() is moved to
> __klp_enable_patch from klp_register_patch to make module reference
> counting symmetric (module_put() is in a patch disable path) and to
> allow to take a new reference to a disabled module when being enabled.

One minor issue here: for patch->immediate, if somebody enabled and
disabled the patch several times, the module ref count would keep
increasing.  Probably not a big deal... maybe we don't need to worry
about it.

> Also all kobject_put(&patch->kobj) calls are moved outside of klp_mutex
> lock protection to prevent a deadlock situation when
> klp_unregister_patch is called and sysfs directories are removed. There
> is no need to do the same for other kobject_put() callsites as we
> currently do not have their sysfs counterparts.
> 
> Signed-off-by: Miroslav Benes <mbenes@suse.cz>
> ---
> Based on Josh's v2.
> 
> There are still two things which need to be discussed.
> 
> 1. Do we really need a completion? If I am not missing something
> kobject_del() always waits for sysfs callers to leave thanks to kernfs
> active protection. The only problem is in kobject_release() when
> CONFIG_DEBUG_KOBJECT_RELEASE=y. In this case kobject_delayed_cleanup()
> is delay-scheduled, which is a problem for us. For this we definitely
> need the completion, right? JiriS, is this correct?

Sysfs and kobjects hurt my brain; I need to take a refresher course, so
I don't have an answer to this yet :-)  Maybe Jiri S can enlighten us.

> 2. I refuse to remove a module when patch->immediate is set or one of
> its func->immediate is set. We can do slightly better, because we can
> allow the module to go if patch->immediate is false and all func with
> immediate set to true are from some object which is not loaded. This is
> for discussion because it needs more changes in the code (I'd like to
> call klp_is_object_loaded() which is somewhere else and so on).

I don't think we need to worry about doing that unless somebody
complains about it.  It's kind of obscure and patch removal isn't very
important anyway...

-- 
Josh

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] livepatch: allow removal of a disabled patch
  2016-05-02 15:08 ` Josh Poimboeuf
@ 2016-05-03  8:16   ` Miroslav Benes
  2016-05-31 23:13     ` Jiri Kosina
  0 siblings, 1 reply; 21+ messages in thread
From: Miroslav Benes @ 2016-05-03  8:16 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: jeyu, jikos, pmladek, jslaby, live-patching, linux-kernel,
	huawei.libin, minfei.huang

On Mon, 2 May 2016, Josh Poimboeuf wrote:

> On Mon, May 02, 2016 at 01:57:22PM +0200, Miroslav Benes wrote:
> > Currently we do not allow patch module to unload since there is no
> > method to determine if a task is still running in the patched code.
> > 
> > The consistency model gives us the way because when the patching
> > finishes we know that all tasks were marked as safe to call a new
> > patched function. Thus every new call to the function calls the new
> > patched code and at the same time no task can be somewhere in the old
> > code, because it had to leave that code to be marked as safe.
> > 
> > We can safely let the patch module go after that.
> 
> I found this a little confusing because it talks about patching, whereas
> we really want to remove the patch module after _unpatching_ it.

You're right. I'll rephrase that.

> > Completion is used for synchronization between module removal and sysfs
> > infrastructure. Note that we still do not allow the removal for
> > immediate model, that is no consistency model.
> > 
> > With this change a call to try_module_get() is moved to
> > __klp_enable_patch from klp_register_patch to make module reference
> > counting symmetric (module_put() is in a patch disable path) and to
> > allow to take a new reference to a disabled module when being enabled.
> 
> One minor issue here: for patch->immediate, if somebody enabled and
> disabled the patch several times, the module ref count would keep
> increasing.  Probably not a big deal... maybe we don't need to worry
> about it.

That is unfortunately true. I don't have a solution in my pocket which 
would be 100% reliable. At the same time I don't see a practical problem. 
Yes, refcount could overflow, but that shouldn't be a problem, should it? 
Anyway, I'll note it in the changelog.

> > Also all kobject_put(&patch->kobj) calls are moved outside of klp_mutex
> > lock protection to prevent a deadlock situation when
> > klp_unregister_patch is called and sysfs directories are removed. There
> > is no need to do the same for other kobject_put() callsites as we
> > currently do not have their sysfs counterparts.
> > 
> > Signed-off-by: Miroslav Benes <mbenes@suse.cz>
> > ---
> > Based on Josh's v2.
> > 
> > There are still two things which need to be discussed.
> > 
> > 1. Do we really need a completion? If I am not missing something
> > kobject_del() always waits for sysfs callers to leave thanks to kernfs
> > active protection. The only problem is in kobject_release() when
> > CONFIG_DEBUG_KOBJECT_RELEASE=y. In this case kobject_delayed_cleanup()
> > is delay-scheduled, which is a problem for us. For this we definitely
> > need the completion, right? JiriS, is this correct?
> 
> Sysfs and kobjects hurt my brain; I need to take a refresher course, so
> I don't have an answer to this yet :-)  Maybe Jiri S can enlighten us.

:)

> > 2. I refuse to remove a module when patch->immediate is set or one of
> > its func->immediate is set. We can do slightly better, because we can
> > allow the module to go if patch->immediate is false and all func with
> > immediate set to true are from some object which is not loaded. This is
> > for discussion because it needs more changes in the code (I'd like to
> > call klp_is_object_loaded() which is somewhere else and so on).
> 
> I don't think we need to worry about doing that unless somebody
> complains about it.  It's kind of obscure and patch removal isn't very
> important anyway...

Agreed.

Thanks,
Miroslav

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] livepatch: allow removal of a disabled patch
  2016-05-02 11:57 [RFC PATCH] livepatch: allow removal of a disabled patch Miroslav Benes
  2016-05-02 15:08 ` Josh Poimboeuf
@ 2016-05-03 21:37 ` Josh Poimboeuf
  2016-05-03 22:31   ` Jiri Kosina
  1 sibling, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2016-05-03 21:37 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: jeyu, jikos, pmladek, jslaby, live-patching, linux-kernel,
	huawei.libin, minfei.huang

On Mon, May 02, 2016 at 01:57:22PM +0200, Miroslav Benes wrote:
> 1. Do we really need a completion? If I am not missing something
> kobject_del() always waits for sysfs callers to leave thanks to kernfs
> active protection.

What do you mean by "kernfs active protection"?  I see that
kernfs_remove() gets the kernfs_mutex lock, but I can't find anywhere
that a write to a sysfs file uses that lock.

I'm probably missing something...

-- 
Josh

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] livepatch: allow removal of a disabled patch
  2016-05-03 21:37 ` Josh Poimboeuf
@ 2016-05-03 22:31   ` Jiri Kosina
  2016-05-04  2:39     ` Josh Poimboeuf
  0 siblings, 1 reply; 21+ messages in thread
From: Jiri Kosina @ 2016-05-03 22:31 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Miroslav Benes, jeyu, pmladek, jslaby, live-patching,
	linux-kernel, huawei.libin, minfei.huang

On Tue, 3 May 2016, Josh Poimboeuf wrote:

> > 1. Do we really need a completion? If I am not missing something
> > kobject_del() always waits for sysfs callers to leave thanks to kernfs
> > active protection.
> 
> What do you mean by "kernfs active protection"?  I see that
> kernfs_remove() gets the kernfs_mutex lock, but I can't find anywhere
> that a write to a sysfs file uses that lock.
> 
> I'm probably missing something...

I don't want to speak on Miroslav's behalf, but I'm pretty sure that what 
he has on mind is per-kernfs_node active refcounting kernfs does (see 
kernfs_node->active, and especially it's usage in __kernfs_remove()).

More specifically, execution of store() and show() sysfs callbacks is 
guaranteed (by kernfs) to happen with that particular attribute's active 
reference held for reading (and that makes it impossible for that 
attribute to vanish prematurely).

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] livepatch: allow removal of a disabled patch
  2016-05-03 22:31   ` Jiri Kosina
@ 2016-05-04  2:39     ` Josh Poimboeuf
  2016-05-04  3:36       ` Josh Poimboeuf
  0 siblings, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2016-05-04  2:39 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Miroslav Benes, jeyu, pmladek, jslaby, live-patching,
	linux-kernel, huawei.libin, minfei.huang

On Wed, May 04, 2016 at 12:31:12AM +0200, Jiri Kosina wrote:
> On Tue, 3 May 2016, Josh Poimboeuf wrote:
> 
> > > 1. Do we really need a completion? If I am not missing something
> > > kobject_del() always waits for sysfs callers to leave thanks to kernfs
> > > active protection.
> > 
> > What do you mean by "kernfs active protection"?  I see that
> > kernfs_remove() gets the kernfs_mutex lock, but I can't find anywhere
> > that a write to a sysfs file uses that lock.
> > 
> > I'm probably missing something...
> 
> I don't want to speak on Miroslav's behalf, but I'm pretty sure that what 
> he has on mind is per-kernfs_node active refcounting kernfs does (see 
> kernfs_node->active, and especially it's usage in __kernfs_remove()).
> 
> More specifically, execution of store() and show() sysfs callbacks is 
> guaranteed (by kernfs) to happen with that particular attribute's active 
> reference held for reading (and that makes it impossible for that 
> attribute to vanish prematurely).

Thanks, that makes sense.

So what exactly is the problem the completion is trying to solve?  Is it
to ensure that the kobject has been cleaned up before it returns to the
caller, in case the user wants to call klp_register() again after
unregistering?

If so, that's quite an unusual use case which I think we should just
consider unsupported.  In fact, if you try to do it, kobject_init()
complains loudly because kobj->state_initialized is still 1 because
kobjects aren't meant to be reused like that.

-- 
Josh

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] livepatch: allow removal of a disabled patch
  2016-05-04  2:39     ` Josh Poimboeuf
@ 2016-05-04  3:36       ` Josh Poimboeuf
  2016-05-04 11:58         ` Miroslav Benes
  0 siblings, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2016-05-04  3:36 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Miroslav Benes, jeyu, pmladek, jslaby, live-patching,
	linux-kernel, huawei.libin, minfei.huang

On Tue, May 03, 2016 at 09:39:48PM -0500, Josh Poimboeuf wrote:
> On Wed, May 04, 2016 at 12:31:12AM +0200, Jiri Kosina wrote:
> > On Tue, 3 May 2016, Josh Poimboeuf wrote:
> > 
> > > > 1. Do we really need a completion? If I am not missing something
> > > > kobject_del() always waits for sysfs callers to leave thanks to kernfs
> > > > active protection.
> > > 
> > > What do you mean by "kernfs active protection"?  I see that
> > > kernfs_remove() gets the kernfs_mutex lock, but I can't find anywhere
> > > that a write to a sysfs file uses that lock.
> > > 
> > > I'm probably missing something...
> > 
> > I don't want to speak on Miroslav's behalf, but I'm pretty sure that what 
> > he has on mind is per-kernfs_node active refcounting kernfs does (see 
> > kernfs_node->active, and especially it's usage in __kernfs_remove()).
> > 
> > More specifically, execution of store() and show() sysfs callbacks is 
> > guaranteed (by kernfs) to happen with that particular attribute's active 
> > reference held for reading (and that makes it impossible for that 
> > attribute to vanish prematurely).
> 
> Thanks, that makes sense.
> 
> So what exactly is the problem the completion is trying to solve?  Is it
> to ensure that the kobject has been cleaned up before it returns to the
> caller, in case the user wants to call klp_register() again after
> unregistering?
> 
> If so, that's quite an unusual use case which I think we should just
> consider unsupported.  In fact, if you try to do it, kobject_init()
> complains loudly because kobj->state_initialized is still 1 because
> kobjects aren't meant to be reused like that.

... and now I realize the point is actually to prevent the caller from
freeing klp_patch before kobject_cleanup() runs.

So yeah, it looks like we need the completion in case
CONFIG_DEBUG_KOBJECT_RELEASE is enabled.

Or alternatively we could convert patch->kobj to be dynamically
allocated instead of embedded in klp_patch.

-- 
Josh

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] livepatch: allow removal of a disabled patch
  2016-05-04  3:36       ` Josh Poimboeuf
@ 2016-05-04 11:58         ` Miroslav Benes
  2016-05-04 13:14           ` Josh Poimboeuf
  0 siblings, 1 reply; 21+ messages in thread
From: Miroslav Benes @ 2016-05-04 11:58 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Jiri Kosina, jeyu, pmladek, jslaby, live-patching, linux-kernel,
	huawei.libin, minfei.huang

On Tue, 3 May 2016, Josh Poimboeuf wrote:

> On Tue, May 03, 2016 at 09:39:48PM -0500, Josh Poimboeuf wrote:
> > On Wed, May 04, 2016 at 12:31:12AM +0200, Jiri Kosina wrote:
> > > On Tue, 3 May 2016, Josh Poimboeuf wrote:
> > > 
> > > > > 1. Do we really need a completion? If I am not missing something
> > > > > kobject_del() always waits for sysfs callers to leave thanks to kernfs
> > > > > active protection.
> > > > 
> > > > What do you mean by "kernfs active protection"?  I see that
> > > > kernfs_remove() gets the kernfs_mutex lock, but I can't find anywhere
> > > > that a write to a sysfs file uses that lock.
> > > > 
> > > > I'm probably missing something...
> > > 
> > > I don't want to speak on Miroslav's behalf, but I'm pretty sure that what 
> > > he has on mind is per-kernfs_node active refcounting kernfs does (see 
> > > kernfs_node->active, and especially it's usage in __kernfs_remove()).
> > > 
> > > More specifically, execution of store() and show() sysfs callbacks is 
> > > guaranteed (by kernfs) to happen with that particular attribute's active 
> > > reference held for reading (and that makes it impossible for that 
> > > attribute to vanish prematurely).
> > 
> > Thanks, that makes sense.
> > 
> > So what exactly is the problem the completion is trying to solve?  Is it
> > to ensure that the kobject has been cleaned up before it returns to the
> > caller, in case the user wants to call klp_register() again after
> > unregistering?
> > 
> > If so, that's quite an unusual use case which I think we should just
> > consider unsupported.  In fact, if you try to do it, kobject_init()
> > complains loudly because kobj->state_initialized is still 1 because
> > kobjects aren't meant to be reused like that.
> 
> ... and now I realize the point is actually to prevent the caller from
> freeing klp_patch before kobject_cleanup() runs.

Exactly. Sorry I was so brief.

> So yeah, it looks like we need the completion in case
> CONFIG_DEBUG_KOBJECT_RELEASE is enabled.
> 
> Or alternatively we could convert patch->kobj to be dynamically
> allocated instead of embedded in klp_patch.

But that wouldn't help, would it? Dynamic kobjects registers generic 
release function dynamic_kobj_release() and that's it. We're in the same 
situation. I have got a feeling that dynamic kobjects are only for trivial 
cases.

Moreover we use container_of() several times in the code and that does not
work with dynamically allocated kobjects.

Anyway I am really confused now. When I read changelog of c817a67ecba7 
("kobject: delayed kobject release: help find buggy drivers") all makes 
perfect sense. But isn't our situation somewhat special, because we have 
refcounts completely under control? So we know that once we call 
kobject_put() we can let a patch go... I must be missing something.

It does not make sense to introduce completion just to satisfy a feature 
which was introduced to debug general cases.

Miroslav

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] livepatch: allow removal of a disabled patch
  2016-05-04 11:58         ` Miroslav Benes
@ 2016-05-04 13:14           ` Josh Poimboeuf
  2016-05-04 14:35             ` Miroslav Benes
  0 siblings, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2016-05-04 13:14 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: Jiri Kosina, jeyu, pmladek, jslaby, live-patching, linux-kernel,
	huawei.libin, minfei.huang

On Wed, May 04, 2016 at 01:58:47PM +0200, Miroslav Benes wrote:
> On Tue, 3 May 2016, Josh Poimboeuf wrote:
> 
> > On Tue, May 03, 2016 at 09:39:48PM -0500, Josh Poimboeuf wrote:
> > > On Wed, May 04, 2016 at 12:31:12AM +0200, Jiri Kosina wrote:
> > > > On Tue, 3 May 2016, Josh Poimboeuf wrote:
> > > > 
> > > > > > 1. Do we really need a completion? If I am not missing something
> > > > > > kobject_del() always waits for sysfs callers to leave thanks to kernfs
> > > > > > active protection.
> > > > > 
> > > > > What do you mean by "kernfs active protection"?  I see that
> > > > > kernfs_remove() gets the kernfs_mutex lock, but I can't find anywhere
> > > > > that a write to a sysfs file uses that lock.
> > > > > 
> > > > > I'm probably missing something...
> > > > 
> > > > I don't want to speak on Miroslav's behalf, but I'm pretty sure that what 
> > > > he has on mind is per-kernfs_node active refcounting kernfs does (see 
> > > > kernfs_node->active, and especially it's usage in __kernfs_remove()).
> > > > 
> > > > More specifically, execution of store() and show() sysfs callbacks is 
> > > > guaranteed (by kernfs) to happen with that particular attribute's active 
> > > > reference held for reading (and that makes it impossible for that 
> > > > attribute to vanish prematurely).
> > > 
> > > Thanks, that makes sense.
> > > 
> > > So what exactly is the problem the completion is trying to solve?  Is it
> > > to ensure that the kobject has been cleaned up before it returns to the
> > > caller, in case the user wants to call klp_register() again after
> > > unregistering?
> > > 
> > > If so, that's quite an unusual use case which I think we should just
> > > consider unsupported.  In fact, if you try to do it, kobject_init()
> > > complains loudly because kobj->state_initialized is still 1 because
> > > kobjects aren't meant to be reused like that.
> > 
> > ... and now I realize the point is actually to prevent the caller from
> > freeing klp_patch before kobject_cleanup() runs.
> 
> Exactly. Sorry I was so brief.
> 
> > So yeah, it looks like we need the completion in case
> > CONFIG_DEBUG_KOBJECT_RELEASE is enabled.
> > 
> > Or alternatively we could convert patch->kobj to be dynamically
> > allocated instead of embedded in klp_patch.
> 
> But that wouldn't help, would it? Dynamic kobjects registers generic 
> release function dynamic_kobj_release() and that's it. We're in the same 
> situation. I have got a feeling that dynamic kobjects are only for trivial 
> cases.

But the patch release doesn't need to do anything, right?

> Moreover we use container_of() several times in the code and that does not
> work with dynamically allocated kobjects.
> 
> Anyway I am really confused now. When I read changelog of c817a67ecba7 
> ("kobject: delayed kobject release: help find buggy drivers") all makes 
> perfect sense. But isn't our situation somewhat special, because we have 
> refcounts completely under control? So we know that once we call 
> kobject_put() we can let a patch go... I must be missing something.
> 
> It does not make sense to introduce completion just to satisfy a feature 
> which was introduced to debug general cases.

I think our situation is "special" because klp_patch and its embedded
kobject have separate lifetimes.  We have a kobject, which we own, which
is embedded in a klp_patch struct, which we don't own.

If I understand correctly, normally when you release a kobject, the
containing struct gets freed.  But we can't do that here because the
caller allocated the klp_patch.

So I really get the feeling that a dynamic kobject would be more
appropriate here.

That said, the sysfs and kobject stuff always throws me for a loop.  So
take what I'm saying with several grains of salt.

-- 
Josh

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] livepatch: allow removal of a disabled patch
  2016-05-04 13:14           ` Josh Poimboeuf
@ 2016-05-04 14:35             ` Miroslav Benes
  2016-05-04 16:14               ` Josh Poimboeuf
  0 siblings, 1 reply; 21+ messages in thread
From: Miroslav Benes @ 2016-05-04 14:35 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Jiri Kosina, jeyu, pmladek, jslaby, live-patching, linux-kernel,
	huawei.libin, minfei.huang

On Wed, 4 May 2016, Josh Poimboeuf wrote:

> On Wed, May 04, 2016 at 01:58:47PM +0200, Miroslav Benes wrote:
> > On Tue, 3 May 2016, Josh Poimboeuf wrote:
> > 
> > > On Tue, May 03, 2016 at 09:39:48PM -0500, Josh Poimboeuf wrote:
> > > > On Wed, May 04, 2016 at 12:31:12AM +0200, Jiri Kosina wrote:
> > > > > On Tue, 3 May 2016, Josh Poimboeuf wrote:
> > > > > 
> > > > > > > 1. Do we really need a completion? If I am not missing something
> > > > > > > kobject_del() always waits for sysfs callers to leave thanks to kernfs
> > > > > > > active protection.
> > > > > > 
> > > > > > What do you mean by "kernfs active protection"?  I see that
> > > > > > kernfs_remove() gets the kernfs_mutex lock, but I can't find anywhere
> > > > > > that a write to a sysfs file uses that lock.
> > > > > > 
> > > > > > I'm probably missing something...
> > > > > 
> > > > > I don't want to speak on Miroslav's behalf, but I'm pretty sure that what 
> > > > > he has on mind is per-kernfs_node active refcounting kernfs does (see 
> > > > > kernfs_node->active, and especially it's usage in __kernfs_remove()).
> > > > > 
> > > > > More specifically, execution of store() and show() sysfs callbacks is 
> > > > > guaranteed (by kernfs) to happen with that particular attribute's active 
> > > > > reference held for reading (and that makes it impossible for that 
> > > > > attribute to vanish prematurely).
> > > > 
> > > > Thanks, that makes sense.
> > > > 
> > > > So what exactly is the problem the completion is trying to solve?  Is it
> > > > to ensure that the kobject has been cleaned up before it returns to the
> > > > caller, in case the user wants to call klp_register() again after
> > > > unregistering?
> > > > 
> > > > If so, that's quite an unusual use case which I think we should just
> > > > consider unsupported.  In fact, if you try to do it, kobject_init()
> > > > complains loudly because kobj->state_initialized is still 1 because
> > > > kobjects aren't meant to be reused like that.
> > > 
> > > ... and now I realize the point is actually to prevent the caller from
> > > freeing klp_patch before kobject_cleanup() runs.
> > 
> > Exactly. Sorry I was so brief.
> > 
> > > So yeah, it looks like we need the completion in case
> > > CONFIG_DEBUG_KOBJECT_RELEASE is enabled.
> > > 
> > > Or alternatively we could convert patch->kobj to be dynamically
> > > allocated instead of embedded in klp_patch.
> > 
> > But that wouldn't help, would it? Dynamic kobjects registers generic 
> > release function dynamic_kobj_release() and that's it. We're in the same 
> > situation. I have got a feeling that dynamic kobjects are only for trivial 
> > cases.
> 
> But the patch release doesn't need to do anything, right?

That is correct. I wanted to point out that dynamic_kobj_release() did not 
really solve our "completion" issue. If there is a problem in our code and 
we need completion, dynamic kobjects would not help. If we don't need a 
completion at all we could move to dynamic kobjects.

There is still container_of() though.

> > Moreover we use container_of() several times in the code and that does not
> > work with dynamically allocated kobjects.
> > 
> > Anyway I am really confused now. When I read changelog of c817a67ecba7 
> > ("kobject: delayed kobject release: help find buggy drivers") all makes 
> > perfect sense. But isn't our situation somewhat special, because we have 
> > refcounts completely under control? So we know that once we call 
> > kobject_put() we can let a patch go... I must be missing something.
> > 
> > It does not make sense to introduce completion just to satisfy a feature 
> > which was introduced to debug general cases.
> 
> I think our situation is "special" because klp_patch and its embedded
> kobject have separate lifetimes.  We have a kobject, which we own, which
> is embedded in a klp_patch struct, which we don't own.
> 
> If I understand correctly, normally when you release a kobject, the
> containing struct gets freed.  But we can't do that here because the
> caller allocated the klp_patch.

Normally only struct kobject is freed, no? 

>From Documentation/kobject.txt:

"One important point cannot be overstated: every kobject must have a
release() method, and the kobject must persist (in a consistent state)
until that method is called. If these constraints are not met, the code is
flawed."

So we need to only make sure that klp_patch does not disappear before 
calling release() method. In our case that cannot happen even without 
completion, because we call kobject_put() in klp_unregister_patch() when 
the patch module is going and kobject_put() calls our release (potentially 
empty) method in a "sync" way. If I read that code correctly.

This does not hold only if CONFIG_DEBUG_KOBJECT_RELEASE=y.

> So I really get the feeling that a dynamic kobject would be more
> appropriate here.

See above.

> That said, the sysfs and kobject stuff always throws me for a loop.  So
> take what I'm saying with several grains of salt.

Tell me about it.

Miroslav

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] livepatch: allow removal of a disabled patch
  2016-05-04 14:35             ` Miroslav Benes
@ 2016-05-04 16:14               ` Josh Poimboeuf
  2016-05-05  8:28                 ` Miroslav Benes
  0 siblings, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2016-05-04 16:14 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: Jiri Kosina, jeyu, pmladek, jslaby, live-patching, linux-kernel,
	huawei.libin, minfei.huang

On Wed, May 04, 2016 at 04:35:28PM +0200, Miroslav Benes wrote:
> On Wed, 4 May 2016, Josh Poimboeuf wrote:
> 
> > On Wed, May 04, 2016 at 01:58:47PM +0200, Miroslav Benes wrote:
> > > On Tue, 3 May 2016, Josh Poimboeuf wrote:
> > > 
> > > > On Tue, May 03, 2016 at 09:39:48PM -0500, Josh Poimboeuf wrote:
> > > > > On Wed, May 04, 2016 at 12:31:12AM +0200, Jiri Kosina wrote:
> > > > > > On Tue, 3 May 2016, Josh Poimboeuf wrote:
> > > > > > 
> > > > > > > > 1. Do we really need a completion? If I am not missing something
> > > > > > > > kobject_del() always waits for sysfs callers to leave thanks to kernfs
> > > > > > > > active protection.
> > > > > > > 
> > > > > > > What do you mean by "kernfs active protection"?  I see that
> > > > > > > kernfs_remove() gets the kernfs_mutex lock, but I can't find anywhere
> > > > > > > that a write to a sysfs file uses that lock.
> > > > > > > 
> > > > > > > I'm probably missing something...
> > > > > > 
> > > > > > I don't want to speak on Miroslav's behalf, but I'm pretty sure that what 
> > > > > > he has on mind is per-kernfs_node active refcounting kernfs does (see 
> > > > > > kernfs_node->active, and especially it's usage in __kernfs_remove()).
> > > > > > 
> > > > > > More specifically, execution of store() and show() sysfs callbacks is 
> > > > > > guaranteed (by kernfs) to happen with that particular attribute's active 
> > > > > > reference held for reading (and that makes it impossible for that 
> > > > > > attribute to vanish prematurely).
> > > > > 
> > > > > Thanks, that makes sense.
> > > > > 
> > > > > So what exactly is the problem the completion is trying to solve?  Is it
> > > > > to ensure that the kobject has been cleaned up before it returns to the
> > > > > caller, in case the user wants to call klp_register() again after
> > > > > unregistering?
> > > > > 
> > > > > If so, that's quite an unusual use case which I think we should just
> > > > > consider unsupported.  In fact, if you try to do it, kobject_init()
> > > > > complains loudly because kobj->state_initialized is still 1 because
> > > > > kobjects aren't meant to be reused like that.
> > > > 
> > > > ... and now I realize the point is actually to prevent the caller from
> > > > freeing klp_patch before kobject_cleanup() runs.
> > > 
> > > Exactly. Sorry I was so brief.
> > > 
> > > > So yeah, it looks like we need the completion in case
> > > > CONFIG_DEBUG_KOBJECT_RELEASE is enabled.
> > > > 
> > > > Or alternatively we could convert patch->kobj to be dynamically
> > > > allocated instead of embedded in klp_patch.
> > > 
> > > But that wouldn't help, would it? Dynamic kobjects registers generic 
> > > release function dynamic_kobj_release() and that's it. We're in the same 
> > > situation. I have got a feeling that dynamic kobjects are only for trivial 
> > > cases.
> > 
> > But the patch release doesn't need to do anything, right?
> 
> That is correct. I wanted to point out that dynamic_kobj_release() did not 
> really solve our "completion" issue. If there is a problem in our code and 
> we need completion, dynamic kobjects would not help. If we don't need a 
> completion at all we could move to dynamic kobjects.

Why would we need a completion if we had a dynamic kobject?  The
lifetime of the kobject no longer matters if it's separated from the
klp_patch struct.  The user can go ahead and free the klp_patch before
the kobject gets freed, no problem.

> There is still container_of() though.

Yes, but we could still find the corresponding klp_patch by searching
through the klp_patches list.  It's not ideal, but IMO it would be
better than needing the completion to account for our "special"
situation.

> > > Moreover we use container_of() several times in the code and that does not
> > > work with dynamically allocated kobjects.
> > > 
> > > Anyway I am really confused now. When I read changelog of c817a67ecba7 
> > > ("kobject: delayed kobject release: help find buggy drivers") all makes 
> > > perfect sense. But isn't our situation somewhat special, because we have 
> > > refcounts completely under control? So we know that once we call 
> > > kobject_put() we can let a patch go... I must be missing something.
> > > 
> > > It does not make sense to introduce completion just to satisfy a feature 
> > > which was introduced to debug general cases.
> > 
> > I think our situation is "special" because klp_patch and its embedded
> > kobject have separate lifetimes.  We have a kobject, which we own, which
> > is embedded in a klp_patch struct, which we don't own.
> > 
> > If I understand correctly, normally when you release a kobject, the
> > containing struct gets freed.  But we can't do that here because the
> > caller allocated the klp_patch.
> 
> Normally only struct kobject is freed, no? 
> 
> From Documentation/kobject.txt:
> 
> "One important point cannot be overstated: every kobject must have a
> release() method, and the kobject must persist (in a consistent state)
> until that method is called. If these constraints are not met, the code is
> flawed."
> 
> So we need to only make sure that klp_patch does not disappear before 
> calling release() method. In our case that cannot happen even without 
> completion, because we call kobject_put() in klp_unregister_patch() when 
> the patch module is going and kobject_put() calls our release (potentially 
> empty) method in a "sync" way. If I read that code correctly.
> 
> This does not hold only if CONFIG_DEBUG_KOBJECT_RELEASE=y.

Hm, maybe I'm missing your point, or maybe you're missing mine.  Maybe
both :-)

I understand why your patch has a completion.  And I understand that
it's needed because of CONFIG_DEBUG_KOBJECT_RELEASE.  But even then I
think it's only needed *if* the kobject is embedded in the klp_patch
struct.

In my experience, usually the point of having a kobject release function
is so you can kfree the kobject's containing struct.  That doesn't
conflict with your quote from kobject.txt.

But in our "special" code, klp_patch (and thus the kobject itself) is
allocated by the caller.  But its embedded kobject is initialized by
livepatch code.  In my experience, it's highly unusual to have the
kobject allocated by one party and initialized by another.

That's why I think it would be much more logical to turn patch->obj into
a pointer.  Then we can allocate it *and* initialize it, so that we can
100% control its lifetime and not have to worry about the low-level
details of how kobject_del() works, completions,
CONFIG_DEBUG_KOBJECT_RELEASE, etc.  And we would be more in line with
the principles of kobject, as I understand them ;-)

-- 
Josh

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] livepatch: allow removal of a disabled patch
  2016-05-04 16:14               ` Josh Poimboeuf
@ 2016-05-05  8:28                 ` Miroslav Benes
  2016-05-05 13:27                   ` Josh Poimboeuf
  0 siblings, 1 reply; 21+ messages in thread
From: Miroslav Benes @ 2016-05-05  8:28 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Jiri Kosina, jeyu, pmladek, jslaby, live-patching, linux-kernel,
	huawei.libin, minfei.huang

On Wed, 4 May 2016, Josh Poimboeuf wrote:

> On Wed, May 04, 2016 at 04:35:28PM +0200, Miroslav Benes wrote:
> > On Wed, 4 May 2016, Josh Poimboeuf wrote:
> > 
> > > On Wed, May 04, 2016 at 01:58:47PM +0200, Miroslav Benes wrote:
> > > > On Tue, 3 May 2016, Josh Poimboeuf wrote:
> > > > 
> > > > > On Tue, May 03, 2016 at 09:39:48PM -0500, Josh Poimboeuf wrote:
> > > > > > On Wed, May 04, 2016 at 12:31:12AM +0200, Jiri Kosina wrote:
> > > > > > > On Tue, 3 May 2016, Josh Poimboeuf wrote:
> > > > > > > 
> > > > > > > > > 1. Do we really need a completion? If I am not missing something
> > > > > > > > > kobject_del() always waits for sysfs callers to leave thanks to kernfs
> > > > > > > > > active protection.
> > > > > > > > 
> > > > > > > > What do you mean by "kernfs active protection"?  I see that
> > > > > > > > kernfs_remove() gets the kernfs_mutex lock, but I can't find anywhere
> > > > > > > > that a write to a sysfs file uses that lock.
> > > > > > > > 
> > > > > > > > I'm probably missing something...
> > > > > > > 
> > > > > > > I don't want to speak on Miroslav's behalf, but I'm pretty sure that what 
> > > > > > > he has on mind is per-kernfs_node active refcounting kernfs does (see 
> > > > > > > kernfs_node->active, and especially it's usage in __kernfs_remove()).
> > > > > > > 
> > > > > > > More specifically, execution of store() and show() sysfs callbacks is 
> > > > > > > guaranteed (by kernfs) to happen with that particular attribute's active 
> > > > > > > reference held for reading (and that makes it impossible for that 
> > > > > > > attribute to vanish prematurely).
> > > > > > 
> > > > > > Thanks, that makes sense.
> > > > > > 
> > > > > > So what exactly is the problem the completion is trying to solve?  Is it
> > > > > > to ensure that the kobject has been cleaned up before it returns to the
> > > > > > caller, in case the user wants to call klp_register() again after
> > > > > > unregistering?
> > > > > > 
> > > > > > If so, that's quite an unusual use case which I think we should just
> > > > > > consider unsupported.  In fact, if you try to do it, kobject_init()
> > > > > > complains loudly because kobj->state_initialized is still 1 because
> > > > > > kobjects aren't meant to be reused like that.
> > > > > 
> > > > > ... and now I realize the point is actually to prevent the caller from
> > > > > freeing klp_patch before kobject_cleanup() runs.
> > > > 
> > > > Exactly. Sorry I was so brief.
> > > > 
> > > > > So yeah, it looks like we need the completion in case
> > > > > CONFIG_DEBUG_KOBJECT_RELEASE is enabled.
> > > > > 
> > > > > Or alternatively we could convert patch->kobj to be dynamically
> > > > > allocated instead of embedded in klp_patch.
> > > > 
> > > > But that wouldn't help, would it? Dynamic kobjects registers generic 
> > > > release function dynamic_kobj_release() and that's it. We're in the same 
> > > > situation. I have got a feeling that dynamic kobjects are only for trivial 
> > > > cases.
> > > 
> > > But the patch release doesn't need to do anything, right?
> > 
> > That is correct. I wanted to point out that dynamic_kobj_release() did not 
> > really solve our "completion" issue. If there is a problem in our code and 
> > we need completion, dynamic kobjects would not help. If we don't need a 
> > completion at all we could move to dynamic kobjects.
> 
> Why would we need a completion if we had a dynamic kobject?  The
> lifetime of the kobject no longer matters if it's separated from the
> klp_patch struct.  The user can go ahead and free the klp_patch before
> the kobject gets freed, no problem.

See below.

> > There is still container_of() though.
> 
> Yes, but we could still find the corresponding klp_patch by searching
> through the klp_patches list.  It's not ideal, but IMO it would be
> better than needing the completion to account for our "special"
> situation.

That would be a solution.

> > > > Moreover we use container_of() several times in the code and that does not
> > > > work with dynamically allocated kobjects.
> > > > 
> > > > Anyway I am really confused now. When I read changelog of c817a67ecba7 
> > > > ("kobject: delayed kobject release: help find buggy drivers") all makes 
> > > > perfect sense. But isn't our situation somewhat special, because we have 
> > > > refcounts completely under control? So we know that once we call 
> > > > kobject_put() we can let a patch go... I must be missing something.
> > > > 
> > > > It does not make sense to introduce completion just to satisfy a feature 
> > > > which was introduced to debug general cases.
> > > 
> > > I think our situation is "special" because klp_patch and its embedded
> > > kobject have separate lifetimes.  We have a kobject, which we own, which
> > > is embedded in a klp_patch struct, which we don't own.
> > > 
> > > If I understand correctly, normally when you release a kobject, the
> > > containing struct gets freed.  But we can't do that here because the
> > > caller allocated the klp_patch.
> > 
> > Normally only struct kobject is freed, no? 
> > 
> > From Documentation/kobject.txt:
> > 
> > "One important point cannot be overstated: every kobject must have a
> > release() method, and the kobject must persist (in a consistent state)
> > until that method is called. If these constraints are not met, the code is
> > flawed."
> > 
> > So we need to only make sure that klp_patch does not disappear before 
> > calling release() method. In our case that cannot happen even without 
> > completion, because we call kobject_put() in klp_unregister_patch() when 
> > the patch module is going and kobject_put() calls our release (potentially 
> > empty) method in a "sync" way. If I read that code correctly.
> > 
> > This does not hold only if CONFIG_DEBUG_KOBJECT_RELEASE=y.
> 
> Hm, maybe I'm missing your point, or maybe you're missing mine.  Maybe
> both :-)
> 
> I understand why your patch has a completion.  And I understand that
> it's needed because of CONFIG_DEBUG_KOBJECT_RELEASE.  But even then I
> think it's only needed *if* the kobject is embedded in the klp_patch
> struct.
> 
> In my experience, usually the point of having a kobject release function
> is so you can kfree the kobject's containing struct.  That doesn't
> conflict with your quote from kobject.txt.
> 
> But in our "special" code, klp_patch (and thus the kobject itself) is
> allocated by the caller.  But its embedded kobject is initialized by
> livepatch code.  In my experience, it's highly unusual to have the
> kobject allocated by one party and initialized by another.
> 
> That's why I think it would be much more logical to turn patch->obj into
> a pointer.  Then we can allocate it *and* initialize it, so that we can
> 100% control its lifetime and not have to worry about the low-level
> details of how kobject_del() works, completions,
> CONFIG_DEBUG_KOBJECT_RELEASE, etc.  And we would be more in line with
> the principles of kobject, as I understand them ;-)

I think it boils down to the following problem.

1. CONFIG_DEBUG_KOBJECT_RELEASE=y

2. we have dynamic kobjects, so there is a pointer in klp_patch to struct 
kobject

3. it is allocated during klp_init_patch() and all is fine

4. now we want to remove the patch module. It is disabled and module_put()
is called. User calls rmmod on the module.

5. klp_unregister_patch() is called in __exit method.

6. klp_free_patch() is called.

7. kobject_put(patch->kobj) is called.

...now it gets interesting...

8. among others kobject_cleanup() is scheduled as a delayed work (this is 
important).

9. there is no completion, so kobject_put returns and the module goes 
away.

10. someone calls patch enabled_store attribute (for example). They can 
because kobject_cleanup() has not been called yet. It is delayed 
scheduled.

...crash...

Is this scenario possible? I think it is. And I think it is possible 
regardless dynamic/static allocation of the kobject in klp_patch (static 
allocation scenario was my concern at the beginning of this thread).

I think it is not possible when CONFIG_DEBUG_KOBJECT_RELEASE=n. Again 
regardless of dynamic/static situation.

So if it is possible we need the completion just because of 
CONFIG_DEBUG_KOBJECT_RELEASE. Which is unfortunate.

Does all this make sense now?

Miroslav

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] livepatch: allow removal of a disabled patch
  2016-05-05  8:28                 ` Miroslav Benes
@ 2016-05-05 13:27                   ` Josh Poimboeuf
  2016-05-05 14:25                     ` Miroslav Benes
  0 siblings, 1 reply; 21+ messages in thread
From: Josh Poimboeuf @ 2016-05-05 13:27 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: Jiri Kosina, jeyu, pmladek, jslaby, live-patching, linux-kernel,
	huawei.libin, minfei.huang

On Thu, May 05, 2016 at 10:28:12AM +0200, Miroslav Benes wrote:
> I think it boils down to the following problem.
> 
> 1. CONFIG_DEBUG_KOBJECT_RELEASE=y
> 
> 2. we have dynamic kobjects, so there is a pointer in klp_patch to struct 
> kobject
> 
> 3. it is allocated during klp_init_patch() and all is fine
> 
> 4. now we want to remove the patch module. It is disabled and module_put()
> is called. User calls rmmod on the module.
> 
> 5. klp_unregister_patch() is called in __exit method.
> 
> 6. klp_free_patch() is called.
> 
> 7. kobject_put(patch->kobj) is called.
> 
> ...now it gets interesting...
> 
> 8. among others kobject_cleanup() is scheduled as a delayed work (this is 
> important).
> 
> 9. there is no completion, so kobject_put returns and the module goes 
> away.
> 
> 10. someone calls patch enabled_store attribute (for example). They can 
> because kobject_cleanup() has not been called yet. It is delayed 
> scheduled.
> 
> ...crash...

But what exactly causes the crash?  In enabled_store() we can see that
the patch isn't in the list, so we can return -EINVAL.

-- 
Josh

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] livepatch: allow removal of a disabled patch
  2016-05-05 13:27                   ` Josh Poimboeuf
@ 2016-05-05 14:25                     ` Miroslav Benes
  2016-05-05 15:04                       ` Josh Poimboeuf
  0 siblings, 1 reply; 21+ messages in thread
From: Miroslav Benes @ 2016-05-05 14:25 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Jiri Kosina, jeyu, pmladek, jslaby, live-patching, linux-kernel,
	huawei.libin, minfei.huang

On Thu, 5 May 2016, Josh Poimboeuf wrote:

> On Thu, May 05, 2016 at 10:28:12AM +0200, Miroslav Benes wrote:
> > I think it boils down to the following problem.
> > 
> > 1. CONFIG_DEBUG_KOBJECT_RELEASE=y
> > 
> > 2. we have dynamic kobjects, so there is a pointer in klp_patch to struct 
> > kobject
> > 
> > 3. it is allocated during klp_init_patch() and all is fine
> > 
> > 4. now we want to remove the patch module. It is disabled and module_put()
> > is called. User calls rmmod on the module.
> > 
> > 5. klp_unregister_patch() is called in __exit method.
> > 
> > 6. klp_free_patch() is called.
> > 
> > 7. kobject_put(patch->kobj) is called.
> > 
> > ...now it gets interesting...
> > 
> > 8. among others kobject_cleanup() is scheduled as a delayed work (this is 
> > important).
> > 
> > 9. there is no completion, so kobject_put returns and the module goes 
> > away.
> > 
> > 10. someone calls patch enabled_store attribute (for example). They can 
> > because kobject_cleanup() has not been called yet. It is delayed 
> > scheduled.
> > 
> > ...crash...
> 
> But what exactly causes the crash?  In enabled_store() we can see that
> the patch isn't in the list, so we can return -EINVAL.

Ok, bad example. Take enabled_show() instead. It could be fixed in the 
same way, but I am not sure it is the right thing to do. It does not scale 
because the problem is elsewhere.

Anyway, it is (even if theoretically) there in my opinion and we 
have two options.

1. We could forget about CONFIG_DEBUG_KOBJECT_RELEASE and all is ok 
without completion and regardless of dynamic/static kobject allocation.

2. We introduce completion and we are ok even with 
CONFIG_DEBUG_KOBJECT_RELEASE=y and again regardless of dynamic/static 
kobject allocation.

Miroslav

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] livepatch: allow removal of a disabled patch
  2016-05-05 14:25                     ` Miroslav Benes
@ 2016-05-05 15:04                       ` Josh Poimboeuf
  2016-05-05 21:08                         ` Jiri Kosina
  2016-05-06  0:42                         ` Jessica Yu
  0 siblings, 2 replies; 21+ messages in thread
From: Josh Poimboeuf @ 2016-05-05 15:04 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: Jiri Kosina, jeyu, pmladek, jslaby, live-patching, linux-kernel,
	huawei.libin, minfei.huang

On Thu, May 05, 2016 at 04:25:48PM +0200, Miroslav Benes wrote:
> On Thu, 5 May 2016, Josh Poimboeuf wrote:
> 
> > On Thu, May 05, 2016 at 10:28:12AM +0200, Miroslav Benes wrote:
> > > I think it boils down to the following problem.
> > > 
> > > 1. CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > 
> > > 2. we have dynamic kobjects, so there is a pointer in klp_patch to struct 
> > > kobject
> > > 
> > > 3. it is allocated during klp_init_patch() and all is fine
> > > 
> > > 4. now we want to remove the patch module. It is disabled and module_put()
> > > is called. User calls rmmod on the module.
> > > 
> > > 5. klp_unregister_patch() is called in __exit method.
> > > 
> > > 6. klp_free_patch() is called.
> > > 
> > > 7. kobject_put(patch->kobj) is called.
> > > 
> > > ...now it gets interesting...
> > > 
> > > 8. among others kobject_cleanup() is scheduled as a delayed work (this is 
> > > important).
> > > 
> > > 9. there is no completion, so kobject_put returns and the module goes 
> > > away.
> > > 
> > > 10. someone calls patch enabled_store attribute (for example). They can 
> > > because kobject_cleanup() has not been called yet. It is delayed 
> > > scheduled.
> > > 
> > > ...crash...
> > 
> > But what exactly causes the crash?  In enabled_store() we can see that
> > the patch isn't in the list, so we can return -EINVAL.
> 
> Ok, bad example. Take enabled_show() instead. It could be fixed in the 
> same way, but I am not sure it is the right thing to do. It does not scale 
> because the problem is elsewhere.
>
> Anyway, it is (even if theoretically) there in my opinion and we 
> have two options.
> 
> 1. We could forget about CONFIG_DEBUG_KOBJECT_RELEASE and all is ok 
> without completion and regardless of dynamic/static kobject allocation.
> 
> 2. We introduce completion and we are ok even with 
> CONFIG_DEBUG_KOBJECT_RELEASE=y and again regardless of dynamic/static 
> kobject allocation.

I would disagree with the statement that the dynamic kobject doesn't
scale.  We would just need a helper function to get from a kobject to
its klp_patch.

In fact, to me it seems like the right way to do it.  It doesn't make
sense for the code which creates the kobject to be different from the
code which initializes it.  It's slightly out of context, but
kobject.txt does say:

  "Code which creates a kobject must, of course, initialize that object."

I view the completion as a hack to compensate for the fact that we're
abusing the kobject interface.  And so it makes sense to me that
CONFIG_DEBUG_KOBJECT_RELEASE would cause problems, because we're using
kobjects in the wrong way.

So in my view, the two options are:

1. Convert the kobject to dynamic as I described.

2. Change the klp_register() interface so that klp_patch gets allocated
   in livepatch code.

I'd be curious to hear what others think.

-- 
Josh

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] livepatch: allow removal of a disabled patch
  2016-05-05 15:04                       ` Josh Poimboeuf
@ 2016-05-05 21:08                         ` Jiri Kosina
  2016-05-06  0:42                         ` Jessica Yu
  1 sibling, 0 replies; 21+ messages in thread
From: Jiri Kosina @ 2016-05-05 21:08 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Miroslav Benes, jeyu, pmladek, jslaby, live-patching,
	linux-kernel, huawei.libin, minfei.huang

On Thu, 5 May 2016, Josh Poimboeuf wrote:

> I would disagree with the statement that the dynamic kobject doesn't
> scale.  We would just need a helper function to get from a kobject to
> its klp_patch.
>
> In fact, to me it seems like the right way to do it.  It doesn't make
> sense for the code which creates the kobject to be different from the
> code which initializes it.  It's slightly out of context, but
> kobject.txt does say:
> 
>   "Code which creates a kobject must, of course, initialize that object."
> 
> I view the completion as a hack to compensate for the fact that we're
> abusing the kobject interface.  And so it makes sense to me that
> CONFIG_DEBUG_KOBJECT_RELEASE would cause problems, because we're using
> kobjects in the wrong way.
> 
> So in my view, the two options are:
> 
> 1. Convert the kobject to dynamic as I described.
> 
> 2. Change the klp_register() interface so that klp_patch gets allocated
>    in livepatch code.
> 
> I'd be curious to hear what others think.

My understanding is that the concern here is that walking through the 
complete linked list every time sysfs node is accessed, just to figure out 
whether we're able to find a klp_patch entry that points back to the 
particular kobject that's being passed to the sysfs callback, isn't really 
super-efficient.

I personally wouldn't worry *that* much about that particular aspect 
(sysfs operations are hardly considered time critical anyway), but I'd 
have to think a bit more whether this is really safe wrt. deadlocks 
between kernfs locks and klp_mutex; but so far it seems to me that 
klp_mutex always nests below kernfs, so it should be OK.

Unfortunately, this still feels like a non-negligible (mis|ab)use of 
kobjects that could bite us later wrt. maintainability and general clarity 
of the code, and therefore tying klp_patch lifetime to (de)allocations 
from within livepatch code itself seems like much better idea to me.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: livepatch: allow removal of a disabled patch
  2016-05-05 15:04                       ` Josh Poimboeuf
  2016-05-05 21:08                         ` Jiri Kosina
@ 2016-05-06  0:42                         ` Jessica Yu
  2016-05-06  7:51                           ` Miroslav Benes
  1 sibling, 1 reply; 21+ messages in thread
From: Jessica Yu @ 2016-05-06  0:42 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Miroslav Benes, Jiri Kosina, pmladek, jslaby, live-patching,
	linux-kernel, huawei.libin, minfei.huang

+++ Josh Poimboeuf [05/05/16 10:04 -0500]:
>On Thu, May 05, 2016 at 04:25:48PM +0200, Miroslav Benes wrote:
>> On Thu, 5 May 2016, Josh Poimboeuf wrote:
>>
>> > On Thu, May 05, 2016 at 10:28:12AM +0200, Miroslav Benes wrote:
>> > > I think it boils down to the following problem.
>> > >
>> > > 1. CONFIG_DEBUG_KOBJECT_RELEASE=y
>> > >
>> > > 2. we have dynamic kobjects, so there is a pointer in klp_patch to struct
>> > > kobject
>> > >
>> > > 3. it is allocated during klp_init_patch() and all is fine
>> > >
>> > > 4. now we want to remove the patch module. It is disabled and module_put()
>> > > is called. User calls rmmod on the module.
>> > >
>> > > 5. klp_unregister_patch() is called in __exit method.
>> > >
>> > > 6. klp_free_patch() is called.
>> > >
>> > > 7. kobject_put(patch->kobj) is called.
>> > >
>> > > ...now it gets interesting...
>> > >
>> > > 8. among others kobject_cleanup() is scheduled as a delayed work (this is
>> > > important).
>> > >
>> > > 9. there is no completion, so kobject_put returns and the module goes
>> > > away.
>> > >
>> > > 10. someone calls patch enabled_store attribute (for example). They can
>> > > because kobject_cleanup() has not been called yet. It is delayed
>> > > scheduled.
>> > >
>> > > ...crash...
>> >
>> > But what exactly causes the crash?  In enabled_store() we can see that
>> > the patch isn't in the list, so we can return -EINVAL.
>>
>> Ok, bad example. Take enabled_show() instead. It could be fixed in the
>> same way, but I am not sure it is the right thing to do. It does not scale
>> because the problem is elsewhere.
>>
>> Anyway, it is (even if theoretically) there in my opinion and we
>> have two options.
>>
>> 1. We could forget about CONFIG_DEBUG_KOBJECT_RELEASE and all is ok
>> without completion and regardless of dynamic/static kobject allocation.
>>
>> 2. We introduce completion and we are ok even with
>> CONFIG_DEBUG_KOBJECT_RELEASE=y and again regardless of dynamic/static
>> kobject allocation.
>
>I would disagree with the statement that the dynamic kobject doesn't
>scale.  We would just need a helper function to get from a kobject to
>its klp_patch.
>
>In fact, to me it seems like the right way to do it.  It doesn't make
>sense for the code which creates the kobject to be different from the
>code which initializes it.  It's slightly out of context, but
>kobject.txt does say:
>
>  "Code which creates a kobject must, of course, initialize that object."
>
>I view the completion as a hack to compensate for the fact that we're
>abusing the kobject interface.  And so it makes sense to me that
>CONFIG_DEBUG_KOBJECT_RELEASE would cause problems, because we're using
>kobjects in the wrong way.
>
>So in my view, the two options are:
>
>1. Convert the kobject to dynamic as I described.
>
>2. Change the klp_register() interface so that klp_patch gets allocated
>   in livepatch code.
>
>I'd be curious to hear what others think.

So, I think both of these solutions would enable us to get rid of
the completion. Let me try to summarize..

For solution #1, if we dynamically allocate the kobject, i.e. we have a
pointer now instead of having it embedded in the klp_patch struct, we no
longer need to worry if the corresponding klp_patch gets deallocated under
our nose. Since the kobject_cleanup function is delayed w/
CONFIG_DEBUG_KOBJECT_RELEASE, it is possible to have sysfs entries that
refer to a klp_patch that no longer exists. Thus if any of the sysfs
functions get called, we would have to take care to ensure that the
klp_patch struct corresponding to the kobject in question actually still
exists. In this case, all sysfs functions would require an extra check to
make sure the matching klp_patch is still on the patches list and return an
error if it isn't found. The "pro" is that this change would be simple, the
"con" is that now kobjects are decoupled and managed completely separately
from the object (klp_patch) with which they are associated, which doesn't
feel 100% right.

For solution #2, we could have livepatch manage the (de)allocation of
klp_patch objects internally. Maybe in this scenario the caller would need
to request a klp_patch object be allocated and the caller would fill out the
returned klp_patch struct as appropriate. In this case, we would be able
leave the kobject embedded in the klp_patch struct (and dynamic kobjects
wouldn't be needed), as livepatch would now have control of both structures.

Then during the patch module exit path, when kobject_put is called, the
klp_patch struct would be freed in its kobject's release function. We
wouldn't have to hold up rmmod, and delayed execution of kobject_cleanup
wouldn't break anything, because a klp_patch would then have the same "lifespan"
as its corresponding kobject, and therefore it would be safe to invoke
enabled_store & co. up until kobject_cleanup is finally executed. We'd be
able to use container_of in this case as well. In addition, we wouldn't
have to force all sysfs functions to support an awkward edge-case (i.e.
checking if the corresponding klp_patch still exists). I think this
solution also matches better with the typical use-case of the kobject
release method, as described in kobject.txt (replacing 'my_object' with
klp_patch):
---
(...)
This notification is done through a kobject's release() method.
Usually such a method has a form like:

   void my_object_release(struct kobject *kobj)
    {
        struct my_object *mine = container_of(kobj, struct my_object, kobj);
        /* Perform any additional cleanup on this object, then... */
        kfree(mine);
    }
---
Apologies for the giant wall of text. In any case I feel like solution #2
is actually closer in line with how kobjects are normally used, embedded in
the structures they refer to, which get deallocated once their refcount
hits 0. What do people think?

Jessica

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: livepatch: allow removal of a disabled patch
  2016-05-06  0:42                         ` Jessica Yu
@ 2016-05-06  7:51                           ` Miroslav Benes
  0 siblings, 0 replies; 21+ messages in thread
From: Miroslav Benes @ 2016-05-06  7:51 UTC (permalink / raw)
  To: Jessica Yu
  Cc: Josh Poimboeuf, Jiri Kosina, pmladek, jslaby, live-patching,
	Linux Kernel Mailing List, huawei.libin, minfei.huang

On Thu, 5 May 2016, Jessica Yu wrote:

> +++ Josh Poimboeuf [05/05/16 10:04 -0500]:
> > On Thu, May 05, 2016 at 04:25:48PM +0200, Miroslav Benes wrote:
> > > On Thu, 5 May 2016, Josh Poimboeuf wrote:
> > > 
> > > > On Thu, May 05, 2016 at 10:28:12AM +0200, Miroslav Benes wrote:
> > > > > I think it boils down to the following problem.
> > > > >
> > > > > 1. CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > > >
> > > > > 2. we have dynamic kobjects, so there is a pointer in klp_patch to
> > > struct
> > > > > kobject
> > > > >
> > > > > 3. it is allocated during klp_init_patch() and all is fine
> > > > >
> > > > > 4. now we want to remove the patch module. It is disabled and
> > > module_put()
> > > > > is called. User calls rmmod on the module.
> > > > >
> > > > > 5. klp_unregister_patch() is called in __exit method.
> > > > >
> > > > > 6. klp_free_patch() is called.
> > > > >
> > > > > 7. kobject_put(patch->kobj) is called.
> > > > >
> > > > > ...now it gets interesting...
> > > > >
> > > > > 8. among others kobject_cleanup() is scheduled as a delayed work (this
> > > is
> > > > > important).
> > > > >
> > > > > 9. there is no completion, so kobject_put returns and the module goes
> > > > > away.
> > > > >
> > > > > 10. someone calls patch enabled_store attribute (for example). They
> > > can
> > > > > because kobject_cleanup() has not been called yet. It is delayed
> > > > > scheduled.
> > > > >
> > > > > ...crash...
> > > >
> > > > But what exactly causes the crash?  In enabled_store() we can see that
> > > > the patch isn't in the list, so we can return -EINVAL.
> > > 
> > > Ok, bad example. Take enabled_show() instead. It could be fixed in the
> > > same way, but I am not sure it is the right thing to do. It does not scale
> > > because the problem is elsewhere.
> > > 
> > > Anyway, it is (even if theoretically) there in my opinion and we
> > > have two options.
> > > 
> > > 1. We could forget about CONFIG_DEBUG_KOBJECT_RELEASE and all is ok
> > > without completion and regardless of dynamic/static kobject allocation.
> > > 
> > > 2. We introduce completion and we are ok even with
> > > CONFIG_DEBUG_KOBJECT_RELEASE=y and again regardless of dynamic/static
> > > kobject allocation.
> > 
> > I would disagree with the statement that the dynamic kobject doesn't
> > scale.  We would just need a helper function to get from a kobject to
> > its klp_patch.
> > 
> > In fact, to me it seems like the right way to do it.  It doesn't make
> > sense for the code which creates the kobject to be different from the
> > code which initializes it.  It's slightly out of context, but
> > kobject.txt does say:
> > 
> >  "Code which creates a kobject must, of course, initialize that object."
> > 
> > I view the completion as a hack to compensate for the fact that we're
> > abusing the kobject interface.  And so it makes sense to me that
> > CONFIG_DEBUG_KOBJECT_RELEASE would cause problems, because we're using
> > kobjects in the wrong way.
> > 
> > So in my view, the two options are:
> > 
> > 1. Convert the kobject to dynamic as I described.
> > 
> > 2. Change the klp_register() interface so that klp_patch gets allocated
> >   in livepatch code.
> > 
> > I'd be curious to hear what others think.
> 
> So, I think both of these solutions would enable us to get rid of
> the completion. Let me try to summarize..

Whoa, thanks for a good summary. That's exactly what was needed :)

> For solution #1, if we dynamically allocate the kobject, i.e. we have a
> pointer now instead of having it embedded in the klp_patch struct, we no
> longer need to worry if the corresponding klp_patch gets deallocated under
> our nose. Since the kobject_cleanup function is delayed w/
> CONFIG_DEBUG_KOBJECT_RELEASE, it is possible to have sysfs entries that
> refer to a klp_patch that no longer exists. Thus if any of the sysfs
> functions get called, we would have to take care to ensure that the
> klp_patch struct corresponding to the kobject in question actually still
> exists. In this case, all sysfs functions would require an extra check to
> make sure the matching klp_patch is still on the patches list and return an
> error if it isn't found. The "pro" is that this change would be simple, the
> "con" is that now kobjects are decoupled and managed completely separately
> from the object (klp_patch) with which they are associated, which doesn't
> feel 100% right.

Yes, I've thought a lot about it during the night (I've even dreams about 
kobjects now) and this doesn't seem as a good solution. We would need to 
make sure that we have appropriate checks everytime we change something in 
the code. Moreover I don't like walking through the list of patches each 
time. It is not critical path, but it is not nice anyway. This is what I 
meant with scaling problem previously.

I agree that while compaction makes all the problems go away, it is more 
like 'after a fashion' solution. Which leads me to... 

> For solution #2, we could have livepatch manage the (de)allocation of
> klp_patch objects internally. Maybe in this scenario the caller would need
> to request a klp_patch object be allocated and the caller would fill out the
> returned klp_patch struct as appropriate. In this case, we would be able
> leave the kobject embedded in the klp_patch struct (and dynamic kobjects
> wouldn't be needed), as livepatch would now have control of both structures.

I think this is the way to go.

> Then during the patch module exit path, when kobject_put is called, the
> klp_patch struct would be freed in its kobject's release function. We
> wouldn't have to hold up rmmod, and delayed execution of kobject_cleanup
> wouldn't break anything, because a klp_patch would then have the same
> "lifespan"
> as its corresponding kobject, and therefore it would be safe to invoke
> enabled_store & co. up until kobject_cleanup is finally executed. We'd be
> able to use container_of in this case as well. In addition, we wouldn't
> have to force all sysfs functions to support an awkward edge-case (i.e.
> checking if the corresponding klp_patch still exists). I think this
> solution also matches better with the typical use-case of the kobject
> release method, as described in kobject.txt (replacing 'my_object' with
> klp_patch):
> ---
> (...)
> This notification is done through a kobject's release() method.
> Usually such a method has a form like:
> 
>   void my_object_release(struct kobject *kobj)
>    {
>        struct my_object *mine = container_of(kobj, struct my_object, kobj);
>        /* Perform any additional cleanup on this object, then... */
>        kfree(mine);
>    }
> ---
> Apologies for the giant wall of text. In any case I feel like solution #2
> is actually closer in line with how kobjects are normally used, embedded in
> the structures they refer to, which get deallocated once their refcount
> hits 0. What do people think?

I agree. Josh and jikos proposed this as well, so I think we have an 
agreement. I'm gonna prepare a patch independent of the consistency model 
as this is a separate issue.

Thanks a lot,
Miroslav

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] livepatch: allow removal of a disabled patch
  2016-05-03  8:16   ` Miroslav Benes
@ 2016-05-31 23:13     ` Jiri Kosina
  0 siblings, 0 replies; 21+ messages in thread
From: Jiri Kosina @ 2016-05-31 23:13 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: Josh Poimboeuf, jeyu, pmladek, jslaby, live-patching,
	linux-kernel, huawei.libin, minfei.huang

On Tue, 3 May 2016, Miroslav Benes wrote:

> > > Currently we do not allow patch module to unload since there is no
> > > method to determine if a task is still running in the patched code.
> > > 
> > > The consistency model gives us the way because when the patching
> > > finishes we know that all tasks were marked as safe to call a new
> > > patched function. Thus every new call to the function calls the new
> > > patched code and at the same time no task can be somewhere in the old
> > > code, because it had to leave that code to be marked as safe.
> > > 
> > > We can safely let the patch module go after that.
> > 
> > I found this a little confusing because it talks about patching, whereas
> > we really want to remove the patch module after _unpatching_ it.
> 
> You're right. I'll rephrase that.

Now that it's been settled that this way (completion) is the way to go, 
could you please incorporate the feedback (and persumably also add Acks 
from Josh and Jessica) and send me v2?

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: livepatch: allow removal of a disabled patch
  2016-06-07 22:39   ` Jessica Yu
@ 2016-06-08  8:10     ` Petr Mladek
  0 siblings, 0 replies; 21+ messages in thread
From: Petr Mladek @ 2016-06-08  8:10 UTC (permalink / raw)
  To: Jessica Yu
  Cc: Miroslav Benes, jpoimboe, jikos, jslaby, live-patching,
	linux-kernel, huawei.libin, minfei.huang

On Tue 2016-06-07 18:39:51, Jessica Yu wrote:
> +++ Petr Mladek [07/06/16 11:36 +0200]:
> >On Wed 2016-06-01 10:31:59, Miroslav Benes wrote:
> >>Currently we do not allow patch module to unload since there is no
> >>method to determine if a task is still running in the patched code.
> >>
> >>The consistency model gives us the way because when the unpatching
> >>finishes we know that all tasks were marked as safe to call an original
> >>function. Thus every new call to the function calls the original code
> >>and at the same time no task can be somewhere in the patched code,
> >>because it had to leave that code to be marked as safe.
> >>
> >>We can safely let the patch module go after that.
> >>
> >>Completion is used for synchronization between module removal and sysfs
> >>infrastructure in a similar way to commit 942e443127e9 ("module: Fix
> >>mod->mkobj.kobj potentially freed too early").
> >>
> >>Note that we still do not allow the removal for immediate model, that is
> >>no consistency model. The module refcount may increase in this case if
> >>somebody disables and enables the patch several times. This should not
> >>cause any harm.
> >>
> >>With this change a call to try_module_get() is moved to
> >>__klp_enable_patch from klp_register_patch to make module reference
> >>counting symmetric (module_put() is in a patch disable path) and to
> >>allow to take a new reference to a disabled module when being enabled.
> >>
> >>Also all kobject_put(&patch->kobj) calls are moved outside of klp_mutex
> >>lock protection to prevent a deadlock situation when
> >>klp_unregister_patch is called and sysfs directories are removed. There
> >>is no need to do the same for other kobject_put() callsites as we
> >>currently do not have their sysfs counterparts.
> >>
> >>Signed-off-by: Miroslav Benes <mbenes@suse.cz>
> >>---
> >>Based on Josh's v2 of the consistency model.
> >>
> >>diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> >>index d55701222b49..a649186fb4af 100644
> >>--- a/kernel/livepatch/core.c
> >>+++ b/kernel/livepatch/core.c
> >>@@ -459,6 +472,15 @@ static ssize_t enabled_store(struct kobject *kobj, struct kobj_attribute *attr,
> >>
> >> 	mutex_lock(&klp_mutex);
> >>
> >>+	if (!klp_is_patch_registered(patch)) {
> >>+		/*
> >>+		 * Module with the patch could either disappear meanwhile or is
> >>+		 * not properly initialized yet.
> >>+		 */
> >>+		ret = -EINVAL;
> >>+		goto err;
> >>+	}
> >>+
> >> 	if (patch->enabled == val) {
> >> 		/* already in requested state */
> >> 		ret = -EINVAL;
> >>@@ -700,11 +721,14 @@ static int klp_init_patch(struct klp_patch *patch)
> >> 	mutex_lock(&klp_mutex);
> >>
> >> 	patch->enabled = false;
> >>+	init_completion(&patch->finish);
> >>
> >> 	ret = kobject_init_and_add(&patch->kobj, &klp_ktype_patch,
> >> 				   klp_root_kobj, "%s", patch->mod->name);
> >>-	if (ret)
> >>-		goto unlock;
> >>+	if (ret) {
> >>+		mutex_unlock(&klp_mutex);
> >>+		return ret;
> >>+	}
> >>
> >> 	klp_for_each_object(patch, obj) {
> >> 		ret = klp_init_object(patch, obj);
> >>@@ -720,9 +744,12 @@ static int klp_init_patch(struct klp_patch *patch)
> >>
> >> free:
> >> 	klp_free_objects_limited(patch, obj);
> >>-	kobject_put(&patch->kobj);
> >>-unlock:
> >>+
> >> 	mutex_unlock(&klp_mutex);
> >>+
> >
> >Just for record. The sysfs entry exists but patch->list is not
> >initialized in this error path. Therefore we could write into
> >
> >  /sys/.../livepatch_foo/enable
> >
> >and call enabled_store(). It is safe because enabled_store() calls
> >klp_is_patch_registered() and it does not need patch->list for this
> >check. Kudos for the klp_is_patch_registered() implementation.
> >
> >I write this because it is not obvious and it took me some time
> >to verify that we are on the safe side.
> >
> >Well, I would feel more comfortable if we initialize patch->list
> >above.
> 
> Hi Petr,
> 
> I'm a bit unclear on this, can you clarify what you mean by initialize
> patch->list? I don't think any extra initialization is needed here to
> be able use a patch->list node with an existing list (klp_patches),
> since this field is embedded and the klp_patches list header is
> already statically initialized with LIST_HEAD.

I meant to call

	INIT_LIST_HEAD(&patch->list);

around the existing initialization:

	patch->enabled = false;
	init_completion(&patch->finish);


The list is zeroed by default. This is true at least in
the livepatch_sample.c. It means that it is not a valid
list head without an extra initialization.

Also note that we do a check for list_empty(&patch->list) in
klp_free_patch(). This check is useless if we do not initialize
the list.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: livepatch: allow removal of a disabled patch
  2016-06-07  9:36 ` Petr Mladek
@ 2016-06-07 22:39   ` Jessica Yu
  2016-06-08  8:10     ` Petr Mladek
  0 siblings, 1 reply; 21+ messages in thread
From: Jessica Yu @ 2016-06-07 22:39 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Miroslav Benes, jpoimboe, jikos, jslaby, live-patching,
	linux-kernel, huawei.libin, minfei.huang

+++ Petr Mladek [07/06/16 11:36 +0200]:
>On Wed 2016-06-01 10:31:59, Miroslav Benes wrote:
>> Currently we do not allow patch module to unload since there is no
>> method to determine if a task is still running in the patched code.
>>
>> The consistency model gives us the way because when the unpatching
>> finishes we know that all tasks were marked as safe to call an original
>> function. Thus every new call to the function calls the original code
>> and at the same time no task can be somewhere in the patched code,
>> because it had to leave that code to be marked as safe.
>>
>> We can safely let the patch module go after that.
>>
>> Completion is used for synchronization between module removal and sysfs
>> infrastructure in a similar way to commit 942e443127e9 ("module: Fix
>> mod->mkobj.kobj potentially freed too early").
>>
>> Note that we still do not allow the removal for immediate model, that is
>> no consistency model. The module refcount may increase in this case if
>> somebody disables and enables the patch several times. This should not
>> cause any harm.
>>
>> With this change a call to try_module_get() is moved to
>> __klp_enable_patch from klp_register_patch to make module reference
>> counting symmetric (module_put() is in a patch disable path) and to
>> allow to take a new reference to a disabled module when being enabled.
>>
>> Also all kobject_put(&patch->kobj) calls are moved outside of klp_mutex
>> lock protection to prevent a deadlock situation when
>> klp_unregister_patch is called and sysfs directories are removed. There
>> is no need to do the same for other kobject_put() callsites as we
>> currently do not have their sysfs counterparts.
>>
>> Signed-off-by: Miroslav Benes <mbenes@suse.cz>
>> ---
>> Based on Josh's v2 of the consistency model.
>>
>> diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
>> index d55701222b49..a649186fb4af 100644
>> --- a/kernel/livepatch/core.c
>> +++ b/kernel/livepatch/core.c
>> @@ -459,6 +472,15 @@ static ssize_t enabled_store(struct kobject *kobj, struct kobj_attribute *attr,
>>
>>  	mutex_lock(&klp_mutex);
>>
>> +	if (!klp_is_patch_registered(patch)) {
>> +		/*
>> +		 * Module with the patch could either disappear meanwhile or is
>> +		 * not properly initialized yet.
>> +		 */
>> +		ret = -EINVAL;
>> +		goto err;
>> +	}
>> +
>>  	if (patch->enabled == val) {
>>  		/* already in requested state */
>>  		ret = -EINVAL;
>> @@ -700,11 +721,14 @@ static int klp_init_patch(struct klp_patch *patch)
>>  	mutex_lock(&klp_mutex);
>>
>>  	patch->enabled = false;
>> +	init_completion(&patch->finish);
>>
>>  	ret = kobject_init_and_add(&patch->kobj, &klp_ktype_patch,
>>  				   klp_root_kobj, "%s", patch->mod->name);
>> -	if (ret)
>> -		goto unlock;
>> +	if (ret) {
>> +		mutex_unlock(&klp_mutex);
>> +		return ret;
>> +	}
>>
>>  	klp_for_each_object(patch, obj) {
>>  		ret = klp_init_object(patch, obj);
>> @@ -720,9 +744,12 @@ static int klp_init_patch(struct klp_patch *patch)
>>
>>  free:
>>  	klp_free_objects_limited(patch, obj);
>> -	kobject_put(&patch->kobj);
>> -unlock:
>> +
>>  	mutex_unlock(&klp_mutex);
>> +
>
>Just for record. The sysfs entry exists but patch->list is not
>initialized in this error path. Therefore we could write into
>
>   /sys/.../livepatch_foo/enable
>
>and call enabled_store(). It is safe because enabled_store() calls
>klp_is_patch_registered() and it does not need patch->list for this
>check. Kudos for the klp_is_patch_registered() implementation.
>
>I write this because it is not obvious and it took me some time
>to verify that we are on the safe side.
>
>Well, I would feel more comfortable if we initialize patch->list
>above.

Hi Petr,

I'm a bit unclear on this, can you clarify what you mean by initialize
patch->list? I don't think any extra initialization is needed here to
be able use a patch->list node with an existing list (klp_patches),
since this field is embedded and the klp_patches list header is
already statically initialized with LIST_HEAD.

Jessica

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2016-06-08  8:10 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-02 11:57 [RFC PATCH] livepatch: allow removal of a disabled patch Miroslav Benes
2016-05-02 15:08 ` Josh Poimboeuf
2016-05-03  8:16   ` Miroslav Benes
2016-05-31 23:13     ` Jiri Kosina
2016-05-03 21:37 ` Josh Poimboeuf
2016-05-03 22:31   ` Jiri Kosina
2016-05-04  2:39     ` Josh Poimboeuf
2016-05-04  3:36       ` Josh Poimboeuf
2016-05-04 11:58         ` Miroslav Benes
2016-05-04 13:14           ` Josh Poimboeuf
2016-05-04 14:35             ` Miroslav Benes
2016-05-04 16:14               ` Josh Poimboeuf
2016-05-05  8:28                 ` Miroslav Benes
2016-05-05 13:27                   ` Josh Poimboeuf
2016-05-05 14:25                     ` Miroslav Benes
2016-05-05 15:04                       ` Josh Poimboeuf
2016-05-05 21:08                         ` Jiri Kosina
2016-05-06  0:42                         ` Jessica Yu
2016-05-06  7:51                           ` Miroslav Benes
2016-06-01  8:31 [PATCH v2] " Miroslav Benes
2016-06-07  9:36 ` Petr Mladek
2016-06-07 22:39   ` Jessica Yu
2016-06-08  8:10     ` Petr Mladek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).