* [PATCH] livepatch: fix race between enabled_store() and klp_unregister_patch()
@ 2015-11-30  3:54 Li Bin
  2015-11-30 13:53 ` Petr Mladek
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Li Bin @ 2015-11-30  3:54 UTC (permalink / raw)
  To: jpoimboe, sjenning, jikos, vojtech
  Cc: live-patching, linux-kernel, guohanjun, dingtianhong, xiexiuqi,
	zhouchengming1

There is a potential race as follows:

CPU0                         |  CPU1
-----------------------------|-----------------------------------
enabled_store()              |  klp_unregister_patch()
                             |  |-mutex_lock(&klp_mutex);
|-mutex_lock(&klp_mutex);    |  |-klp_free_patch();
                             |  |-mutex_unlock(&klp_mutex);
|-[process the patch's state]|
|-mutex_unlock(&klp_mutex)   |

Fix this race condition by adding a klp_is_patch_registered() check in
enabled_store() after taking the klp_mutex lock.

Signed-off-by: Li Bin <huawei.libin@huawei.com>
---
 kernel/livepatch/core.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index db545cb..50af971 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -614,6 +614,11 @@ static ssize_t enabled_store(struct kobject *kobj, struct kobj_attribute *attr,
 
 	mutex_lock(&klp_mutex);
 
+	if (!klp_is_patch_registered(patch)) {
+		ret = -EINVAL;
+		goto err;
+	}
+
 	if (val == patch->state) {
 		/* already in requested state */
 		ret = -EINVAL;
-- 
1.7.1
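
For reference, the check added above calls the existing
klp_is_patch_registered() helper in kernel/livepatch/core.c, which simply
walks the global klp_patches list and therefore has to be called with
klp_mutex held. Roughly (a sketch from the code of that era, not an exact
copy of the tree):

static bool klp_is_patch_registered(struct klp_patch *patch)
{
	struct klp_patch *mypatch;

	list_for_each_entry(mypatch, &klp_patches, list)
		if (mypatch == patch)
			return true;

	return false;
}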



* Re: [PATCH] livepatch: fix race between enabled_store() and klp_unregister_patch()
  2015-11-30  3:54 [PATCH] livepatch: fix race between enabled_store() and klp_unregister_patch() Li Bin
@ 2015-11-30 13:53 ` Petr Mladek
  2015-12-01  1:11 ` Josh Poimboeuf
  2015-12-15  8:14 ` Miroslav Benes
  2 siblings, 0 replies; 10+ messages in thread
From: Petr Mladek @ 2015-11-30 13:53 UTC (permalink / raw)
  To: Li Bin
  Cc: jpoimboe, sjenning, jikos, vojtech, live-patching, linux-kernel,
	guohanjun, dingtianhong, xiexiuqi, zhouchengming1

On Mon 2015-11-30 11:54:37, Li Bin wrote:
> There is a potential race as follows:
> 
> CPU0                         |  CPU1
> -----------------------------|-----------------------------------
> enabled_store()              |  klp_unregister_patch()
>                              |  |-mutex_lock(&klp_mutex);
> |-mutex_lock(&klp_mutex);    |  |-klp_free_patch();
>                              |  |-mutex_unlock(&klp_mutex);
> |-[process the patch's state]|
> |-mutex_unlock(&klp_mutex)   |
> 
> Fix this race condition by adding a klp_is_patch_registered() check in
> enabled_store() after taking the klp_mutex lock.

It seems that you are right but the situation is more complicated.
See below.

> 
> Signed-off-by: Li Bin <huawei.libin@huawei.com>
> ---
>  kernel/livepatch/core.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> index db545cb..50af971 100644
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -614,6 +614,11 @@ static ssize_t enabled_store(struct kobject *kobj, struct kobj_attribute *attr,
>  
>  	mutex_lock(&klp_mutex);
>  
> +	if (!klp_is_patch_registered(patch)) {

How is it guaranteed that "patch" is still a valid pointer, please?

I think that we also need to modify klp_unregister_patch().
It must not finish until the patch->kobj is destroyed.
Otherwise, the patch module would get removed and
the "patch" would be invalid.

I guess that enabled_store() takes a reference on patch->kobj.
Then kobject_put(&patch->kobj) in klp_free_patch() decrements
the refcount but does not free the object. It means that
it does not wait for enabled_store() to finish. I am not 100%
sure because the kobject/sysfs code is quite complex.


Anyway, we should refuse to call klp_unregister_patch() at all
in the current implementation. It is not safe because we
do not know whether the patch is still in use.

Note that if you disable the patch, you are only sure that
the new function will no longer be called. But you
do _not_ know whether all the existing calls have already finished.
A process might be sleeping inside a function from the patch.
I guess that this will become possible after we introduce
a more complex consistency algorithm for switching
the patches.

Best Regards,
Petr



> +		ret = -EINVAL;
> +		goto err;
> +	}
> +
>  	if (val == patch->state) {
>  		/* already in requested state */
>  		ret = -EINVAL;
> -- 
> 1.7.1
> 
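
For context, the unregister/free path under discussion looks roughly like
this in the kernels of that era (a sketch reconstructed from the thread,
not an exact copy of the tree). The point is that the final
kobject_put(&patch->kobj) is issued while klp_unregister_patch() still
holds klp_mutex, so whether the "patch" pointer stays valid inside
enabled_store() depends entirely on the kobject/sysfs reference counting
discussed above:

static void klp_free_patch(struct klp_patch *patch)
{
	klp_free_objects_limited(patch, NULL);
	if (!list_empty(&patch->list))
		list_del(&patch->list);
	kobject_put(&patch->kobj);	/* possibly the last reference */
}

int klp_unregister_patch(struct klp_patch *patch)
{
	int ret = 0;

	mutex_lock(&klp_mutex);

	if (!klp_is_patch_registered(patch)) {
		ret = -EINVAL;
		goto out;
	}

	if (patch->state == KLP_ENABLED) {
		ret = -EBUSY;
		goto out;
	}

	klp_free_patch(patch);	/* kobject_put() happens under klp_mutex */

out:
	mutex_unlock(&klp_mutex);
	return ret;
}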


* Re: [PATCH] livepatch: fix race between enabled_store() and klp_unregister_patch()
  2015-11-30  3:54 [PATCH] livepatch: fix race between enabled_store() and klp_unregister_patch() Li Bin
  2015-11-30 13:53 ` Petr Mladek
@ 2015-12-01  1:11 ` Josh Poimboeuf
  2015-12-01  2:46   ` libin
  2015-12-01  8:50   ` Jiri Slaby
  2015-12-15  8:14 ` Miroslav Benes
  2 siblings, 2 replies; 10+ messages in thread
From: Josh Poimboeuf @ 2015-12-01  1:11 UTC (permalink / raw)
  To: Li Bin
  Cc: sjenning, jikos, vojtech, live-patching, linux-kernel, guohanjun,
	dingtianhong, xiexiuqi, zhouchengming1, Miroslav Benes

On Mon, Nov 30, 2015 at 11:54:37AM +0800, Li Bin wrote:
> There is a potential race as follows:
> 
> CPU0                         |  CPU1
> -----------------------------|-----------------------------------
> enabled_store()              |  klp_unregister_patch()
>                              |  |-mutex_lock(&klp_mutex);
> |-mutex_lock(&klp_mutex);    |  |-klp_free_patch();
>                              |  |-mutex_unlock(&klp_mutex);
> |-[process the patch's state]|
> |-mutex_unlock(&klp_mutex)   |
> 
> Fix this race condition by adding a klp_is_patch_registered() check in
> enabled_store() after taking the klp_mutex lock.

I'm thinking this race isn't possible, and that instead it would
deadlock.

When I try to recreate something similar by putting a delay in
enabled_store(), klp_free_patch() just sleeps on its call to
kobject_put() until enabled_store() returns.  The unregister stack looks
like:

  [<ffffffff812e966b>] __kernfs_remove+0x1fb/0x380
  [<ffffffff812ea273>] kernfs_remove+0x23/0x40
  [<ffffffff812ec601>] sysfs_remove_dir+0x51/0x80
  [<ffffffff81407fb8>] kobject_del+0x18/0x50
  [<ffffffff8140804a>] kobject_release+0x5a/0x190
  [<ffffffff81407f27>] kobject_put+0x27/0x50
  [<ffffffff81128d41>] klp_unregister_patch+0x71/0x80

It seems to be waiting on a lock which is held by the kernfs code which
called enabled_store().  So when enabled_store() tries to get the
klp_mutex, it deadlocks.

Miroslav and I previously discussed a few options to fix this:

 https://lkml.kernel.org/r/alpine.LNX.2.00.1502171728520.29490@pobox.suse.cz

-- 
Josh
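
The experiment described above can be sketched as follows (illustrative
only: the sleep and its length are made up, and the surrounding code is
reconstructed rather than copied from the tree). While enabled_store()
sleeps, kernfs holds an active reference on the "enabled" attribute, so
the kobject_put() in klp_unregister_patch() blocks inside kernfs_remove()
until this function returns -- which it never does once it blocks on the
klp_mutex already held by the unregister path:

static ssize_t enabled_store(struct kobject *kobj, struct kobj_attribute *attr,
			     const char *buf, size_t count)
{
	struct klp_patch *patch;
	unsigned long val;
	int ret;

	ret = kstrtoul(buf, 10, &val);
	if (ret)
		return -EINVAL;

	if (val != 0 && val != 1)
		return -EINVAL;

	patch = container_of(kobj, struct klp_patch, kobj);

	/* Hypothetical delay: give a concurrent rmmod time to reach the
	 * kobject_put() in klp_unregister_patch() while it holds klp_mutex. */
	ssleep(30);

	mutex_lock(&klp_mutex);	/* deadlocks against the unregister path */

	/* ... original enable/disable logic elided ... */

	mutex_unlock(&klp_mutex);
	return count;
}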


* Re: [PATCH] livepatch: fix race between enabled_store() and klp_unregister_patch()
  2015-12-01  1:11 ` Josh Poimboeuf
@ 2015-12-01  2:46   ` libin
  2015-12-01  8:50   ` Jiri Slaby
  1 sibling, 0 replies; 10+ messages in thread
From: libin @ 2015-12-01  2:46 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: sjenning, jikos, vojtech, live-patching, linux-kernel, guohanjun,
	dingtianhong, xiexiuqi, zhouchengming1, Miroslav Benes


Hi Josh,
on 2015/12/1 9:11, Josh Poimboeuf wrote:
> On Mon, Nov 30, 2015 at 11:54:37AM +0800, Li Bin wrote:
>> There is a potential race as follows:
>>
>> CPU0                         |  CPU1
>> -----------------------------|-----------------------------------
>> enabled_store()              |  klp_unregister_patch()
>>                              |  |-mutex_lock(&klp_mutex);
>> |-mutex_lock(&klp_mutex);    |  |-klp_free_patch();
>>                              |  |-mutex_unlock(&klp_mutex);
>> |-[process the patch's state]|
>> |-mutex_unlock(&klp_mutex)   |
>>
>> Fix this race condition by adding a klp_is_patch_registered() check in
>> enabled_store() after taking the klp_mutex lock.
> I'm thinking this race isn't possible, and that instead it would
> deadlock.

Good point.  I did not consider the kernfs lock.

> When I try to recreate something similar by putting a delay in
> enabled_store(), klp_free_patch() just sleeps on its call to
> kobject_put() until enabled_store() returns.  The unregister stack looks
> like:
>
>   [<ffffffff812e966b>] __kernfs_remove+0x1fb/0x380
>   [<ffffffff812ea273>] kernfs_remove+0x23/0x40
>   [<ffffffff812ec601>] sysfs_remove_dir+0x51/0x80
>   [<ffffffff81407fb8>] kobject_del+0x18/0x50
>   [<ffffffff8140804a>] kobject_release+0x5a/0x190
>   [<ffffffff81407f27>] kobject_put+0x27/0x50
>   [<ffffffff81128d41>] klp_unregister_patch+0x71/0x80
>
> It seems to be waiting on a lock which is held by the kernfs code which
> called enabled_store().  So when enabled_store() tries to get the
> klp_mutex, it deadlocks.
>
> Miroslav and I previously discussed a few options to fix this:
>
>  https://lkml.kernel.org/r/alpine.LNX.2.00.1502171728520.29490@pobox.suse.cz

Both options seem good to me. I look forward to the patch. :)
Thanks,
Li Bin





* Re: [PATCH] livepatch: fix race between enabled_store() and klp_unregister_patch()
  2015-12-01  1:11 ` Josh Poimboeuf
  2015-12-01  2:46   ` libin
@ 2015-12-01  8:50   ` Jiri Slaby
  2015-12-01 14:13     ` Petr Mladek
  2015-12-01 15:53     ` Josh Poimboeuf
  1 sibling, 2 replies; 10+ messages in thread
From: Jiri Slaby @ 2015-12-01  8:50 UTC (permalink / raw)
  To: Josh Poimboeuf, Li Bin
  Cc: sjenning, jikos, vojtech, live-patching, linux-kernel, guohanjun,
	dingtianhong, xiexiuqi, zhouchengming1, Miroslav Benes

On 12/01/2015, 02:11 AM, Josh Poimboeuf wrote:
> When I try to recreate something similar by putting a delay in
> enabled_store(), klp_free_patch() just sleeps on its call to
> kobject_put() until enabled_store() returns.  The unregister stack looks
> like:
> 
>   [<ffffffff812e966b>] __kernfs_remove+0x1fb/0x380
>   [<ffffffff812ea273>] kernfs_remove+0x23/0x40
>   [<ffffffff812ec601>] sysfs_remove_dir+0x51/0x80
>   [<ffffffff81407fb8>] kobject_del+0x18/0x50
>   [<ffffffff8140804a>] kobject_release+0x5a/0x190
>   [<ffffffff81407f27>] kobject_put+0x27/0x50

What about moving the _put outside of klp_mutex in klp_unregister_patch()
(and maybe the other _put's as well)? Plus Li Bin's patch.

thanks,
-- 
js
suse labs
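
A sketch of that suggestion (illustrative only; based on the unregister
path of that era, with error handling and names approximated) would be to
drop the patch kobject only after klp_mutex has been released, so that a
concurrent enabled_store() can take the mutex and then bail out via the
klp_is_patch_registered() check from Li Bin's patch:

int klp_unregister_patch(struct klp_patch *patch)
{
	int ret = 0;

	mutex_lock(&klp_mutex);

	if (!klp_is_patch_registered(patch)) {
		ret = -EINVAL;
		goto out;
	}

	if (patch->state == KLP_ENABLED) {
		ret = -EBUSY;
		goto out;
	}

	klp_free_objects_limited(patch, NULL);
	list_del(&patch->list);

out:
	mutex_unlock(&klp_mutex);

	if (!ret)
		kobject_put(&patch->kobj);	/* now outside klp_mutex */

	return ret;
}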


* Re: [PATCH] livepatch: fix race between enabled_store() and klp_unregister_patch()
  2015-12-01  8:50   ` Jiri Slaby
@ 2015-12-01 14:13     ` Petr Mladek
  2015-12-01 14:28       ` Jiri Slaby
  2015-12-01 15:53     ` Josh Poimboeuf
  1 sibling, 1 reply; 10+ messages in thread
From: Petr Mladek @ 2015-12-01 14:13 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Josh Poimboeuf, Li Bin, sjenning, jikos, vojtech, live-patching,
	linux-kernel, guohanjun, dingtianhong, xiexiuqi, zhouchengming1,
	Miroslav Benes

On Tue 2015-12-01 09:50:23, Jiri Slaby wrote:
> On 12/01/2015, 02:11 AM, Josh Poimboeuf wrote:
> > When I try to recreate something similar by putting a delay in
> > enabled_store(), klp_free_patch() just sleeps on its call to
> > kobject_put() until enabled_store() returns.  The unregister stack looks
> > like:
> > 
> >   [<ffffffff812e966b>] __kernfs_remove+0x1fb/0x380
> >   [<ffffffff812ea273>] kernfs_remove+0x23/0x40
> >   [<ffffffff812ec601>] sysfs_remove_dir+0x51/0x80
> >   [<ffffffff81407fb8>] kobject_del+0x18/0x50
> >   [<ffffffff8140804a>] kobject_release+0x5a/0x190
> >   [<ffffffff81407f27>] kobject_put+0x27/0x50
> 
> What about _put outside of klp_mutex in klp_unregister_patch (and maybe
> the other _put's as well)? Plus Li Bin's patch.

This might work. But I am pretty sure that we would also need to
move all the other kobject_put() calls outside of the lock.

I wondered what the approach with mutex_trylock() would look like
and came up with the patch below.

It is not trivial but it still might be easier than moving all
the kobject_put() calls. Also it should be easier to review
because all the logic is in a single place.

What do you think?

From 3eaec912f2700cd2a886ada1e7b4361ae192ef25 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Tue, 1 Dec 2015 13:25:34 +0100
Subject: [PATCH] livepatch: Avoid deadlock when unregistering and enabling a
 patch

There is a possible deadlock between kobject_put() calls and
enabled_store(), see the lockdep report below.

A solution would be to do all the kobject_put() calls outside the
klp_mutex and check whether the patch is registered in enabled_store().
But this would make the unregister/free code even more
scattered.

This patch takes the other possible approach. It uses trylock
in enabled_store(). If the lock is not available and the patch
is not registered, it is probably being removed. In that case there
is nothing to do and enabled_store() returns -EINVAL.
If the lock is not available and the patch is registered,
it tries harder to get it. It uses mutex_is_locked()
in the busy loop to avoid cache-line bouncing.

Lockdep report:

[   69.512196] ======================================================
[   69.513139] [ INFO: possible circular locking dependency detected ]
[   69.513437] 4.4.0-rc3-4-default+ #2079 Tainted: G        W   E K
[   69.513437] -------------------------------------------------------
[   69.513437] rmmod/3786 is trying to acquire lock:
[   69.513437]  (s_active#99){++++.+}, at: [<ffffffff8127d4a3>] kernfs_remove+0x23/0x40
[   69.513437]
but task is already holding lock:
[   69.513437]  (klp_mutex){+.+.+.}, at: [<ffffffff810de383>] klp_unregister_patch+0x23/0xc0
[   69.513437]
which lock already depends on the new lock.

[   69.513437]
the existing dependency chain (in reverse order) is:
[   69.513437]
-> #1 (klp_mutex){+.+.+.}:
[   69.513437]        [<ffffffff810bfd5d>] lock_acquire+0xad/0x130
[   69.513437]        [<ffffffff81916f04>] mutex_lock_nested+0x44/0x380
[   69.513437]        [<ffffffff810debe0>] enabled_store+0x50/0xc0
[   69.513437]        [<ffffffff813fa4ef>] kobj_attr_store+0xf/0x20
[   69.513437]        [<ffffffff8127ef44>] sysfs_kf_write+0x44/0x60
[   69.513437]        [<ffffffff8127e564>] kernfs_fop_write+0x144/0x190
[   69.513437]        [<ffffffff811fb648>] __vfs_write+0x28/0xe0
[   69.513437]        [<ffffffff811fbd02>] vfs_write+0xa2/0x1a0
[   69.513437]        [<ffffffff811fca19>] SyS_write+0x49/0xa0
[   69.513437]        [<ffffffff8191a2f2>] entry_SYSCALL_64_fastpath+0x12/0x76
[   69.513437]
-> #0 (s_active#99){++++.+}:
[   69.513437]        [<ffffffff810beead>] __lock_acquire+0x14fd/0x1bd0
[   69.513437]        [<ffffffff810bfd5d>] lock_acquire+0xad/0x130
[   69.513437]        [<ffffffff8127c745>] __kernfs_remove+0x1f5/0x2b0
[   69.513437]        [<ffffffff8127d4a3>] kernfs_remove+0x23/0x40
[   69.513437]        [<ffffffff8127f881>] sysfs_remove_dir+0x51/0x80
[   69.513437]        [<ffffffff813fa6e8>] kobject_del+0x18/0x50
[   69.513437]        [<ffffffff813fa774>] kobject_cleanup+0x54/0x70
[   69.513437]        [<ffffffff813fa645>] kobject_put+0x25/0x50
[   69.513437]        [<ffffffff810de3f8>] klp_unregister_patch+0x98/0xc0
[   69.513437]        [<ffffffffa00000c5>] livepatch_exit+0x25/0xf60 [livepatch_sample]
[   69.513437]        [<ffffffff810fd21c>] SyS_delete_module+0x16c/0x1d0
[   69.513437]        [<ffffffff8191a2f2>] entry_SYSCALL_64_fastpath+0x12/0x76
[   69.513437]
other info that might help us debug this:

[   69.513437]  Possible unsafe locking scenario:

[   69.513437]        CPU0                    CPU1
[   69.513437]        ----                    ----
[   69.513437]   lock(klp_mutex);
[   69.513437]                                lock(s_active#99);
[   69.513437]                                lock(klp_mutex);
[   69.513437]   lock(s_active#99);
[   69.513437]
 *** DEADLOCK ***

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/livepatch/core.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index 27712384c69e..e2f00f32bb00 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -612,7 +612,19 @@ static ssize_t enabled_store(struct kobject *kobj, struct kobj_attribute *attr,
 
 	patch = container_of(kobj, struct klp_patch, kobj);
 
-	mutex_lock(&klp_mutex);
+	/*
+	 * Avoid a deadlock with kobject_put(&patch->kobj) that is
+	 * called under klp_mutex. Bail out when the patch is no
+	 * longer registered.
+	 */
+	if (!mutex_trylock(&klp_mutex)) {
+		if (!klp_is_patch_registered(patch))
+			return -EINVAL;
+		/* Do not spin with trylock, which would bounce cache lines. */
+		while (mutex_is_locked(&klp_mutex) &&
+		       klp_is_patch_registered(patch))
+			cond_resched();
+	}
 
 	if (val == patch->state) {
 		/* already in requested state */
-- 
1.8.5.6



* Re: [PATCH] livepatch: fix race between enabled_store() and klp_unregister_patch()
  2015-12-01 14:13     ` Petr Mladek
@ 2015-12-01 14:28       ` Jiri Slaby
  2015-12-01 16:57         ` Petr Mladek
  0 siblings, 1 reply; 10+ messages in thread
From: Jiri Slaby @ 2015-12-01 14:28 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Josh Poimboeuf, Li Bin, sjenning, jikos, vojtech, live-patching,
	linux-kernel, guohanjun, dingtianhong, xiexiuqi, zhouchengming1,
	Miroslav Benes

On 12/01/2015, 03:13 PM, Petr Mladek wrote:
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -612,7 +612,19 @@ static ssize_t enabled_store(struct kobject *kobj, struct kobj_attribute *attr,
>  
>  	patch = container_of(kobj, struct klp_patch, kobj);
>  
> -	mutex_lock(&klp_mutex);
> +	/*
> +	 * Avoid a deadlock with kobject_put(&patch->kobj) that is
> +	 * called under klp_mutex. Bail out when the patch is no
> +	 * longer registered.
> +	 */
> +	if (!mutex_trylock(&klp_mutex)) {

This introduces false positives.
Deleting/enabling/disabling/any other op under klp_mutex on an unrelated
patch may now cause enabled_store() to fail. Hence I don't like this
approach at all.

thanks,
-- 
js
suse labs


* Re: [PATCH] livepatch: fix race between enabled_store() and klp_unregister_patch()
  2015-12-01  8:50   ` Jiri Slaby
  2015-12-01 14:13     ` Petr Mladek
@ 2015-12-01 15:53     ` Josh Poimboeuf
  1 sibling, 0 replies; 10+ messages in thread
From: Josh Poimboeuf @ 2015-12-01 15:53 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Li Bin, sjenning, jikos, vojtech, live-patching, linux-kernel,
	guohanjun, dingtianhong, xiexiuqi, zhouchengming1,
	Miroslav Benes

On Tue, Dec 01, 2015 at 09:50:23AM +0100, Jiri Slaby wrote:
> On 12/01/2015, 02:11 AM, Josh Poimboeuf wrote:
> > When I try to recreate something similar by putting a delay in
> > enabled_store(), klp_free_patch() just sleeps on its call to
> > kobject_put() until enabled_store() returns.  The unregister stack looks
> > like:
> > 
> >   [<ffffffff812e966b>] __kernfs_remove+0x1fb/0x380
> >   [<ffffffff812ea273>] kernfs_remove+0x23/0x40
> >   [<ffffffff812ec601>] sysfs_remove_dir+0x51/0x80
> >   [<ffffffff81407fb8>] kobject_del+0x18/0x50
> >   [<ffffffff8140804a>] kobject_release+0x5a/0x190
> >   [<ffffffff81407f27>] kobject_put+0x27/0x50
> 
> What about _put outside of klp_mutex in klp_unregister_patch (and maybe
> the other _put's as well)? Plus Li Bin's patch.

This approach sounds the best to me.  I think all _put's for the patch
kobj need to be outside the mutex.  There's also a _put for the patch
kobj in the klp_init_patch() error path which needs to be moved out.

I think the rest of the _put's (for object and func kobjs) are fine as
they are, because they don't have corresponding sysfs functions which
get the mutex.

-- 
Josh
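
For the klp_init_patch() error path mentioned above, the change could look
roughly like this (a sketch against the mainline code of that era; labels
and details are approximations, not a tested patch):

static int klp_init_patch(struct klp_patch *patch)
{
	struct klp_object *obj;
	int ret;

	if (!patch->objs)
		return -EINVAL;

	mutex_lock(&klp_mutex);

	patch->state = KLP_DISABLED;

	ret = kobject_init_and_add(&patch->kobj, &klp_ktype_patch,
				   klp_root_kobj, "%s", patch->mod->name);
	if (ret)
		goto unlock;

	for (obj = patch->objs; obj->funcs; obj++) {
		ret = klp_init_object(patch, obj);
		if (ret)
			goto free;
	}

	list_add_tail(&patch->list, &klp_patches);

	mutex_unlock(&klp_mutex);

	return 0;

free:
	klp_free_objects_limited(patch, obj);
	mutex_unlock(&klp_mutex);
	/* Moved outside klp_mutex so it cannot deadlock against the sysfs
	 * active reference held by a concurrent enabled_store(). */
	kobject_put(&patch->kobj);
	return ret;

unlock:
	mutex_unlock(&klp_mutex);
	return ret;
}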


* Re: [PATCH] livepatch: fix race between enabled_store() and klp_unregister_patch()
  2015-12-01 14:28       ` Jiri Slaby
@ 2015-12-01 16:57         ` Petr Mladek
  0 siblings, 0 replies; 10+ messages in thread
From: Petr Mladek @ 2015-12-01 16:57 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Josh Poimboeuf, Li Bin, sjenning, jikos, vojtech, live-patching,
	linux-kernel, guohanjun, dingtianhong, xiexiuqi, zhouchengming1,
	Miroslav Benes

On Tue 2015-12-01 15:28:19, Jiri Slaby wrote:
> On 12/01/2015, 03:13 PM, Petr Mladek wrote:
> > --- a/kernel/livepatch/core.c
> > +++ b/kernel/livepatch/core.c
> > @@ -612,7 +612,19 @@ static ssize_t enabled_store(struct kobject *kobj, struct kobj_attribute *attr,
> >  
> >  	patch = container_of(kobj, struct klp_patch, kobj);
> >  
> > -	mutex_lock(&klp_mutex);
> > +	/*
> > +	 * Avoid a deadlock with kobject_put(&patch->kobj) that is
> > +	 * called under klp_mutex. Bail out when the patch is no
> > +	 * longer registered.
> > +	 */
> > +	if (!mutex_trylock(&klp_mutex)) {
>
> This introduces false positives.
> Deleting/enabling/disabling/any other op under klp_mutex on an unrelated
> patch may now cause enabled_store() to fail. Hence I don't like this
> approach at all.

Ah, there should have been

	while (!mutex_trylock(&klp_mutex)) {
		if (!klp_is_patch_registered(patch))
			return -EINVAL;
		/* Do not spin with trylock, which would bounce cache lines. */
		while (mutex_is_locked(&klp_mutex) &&
		       klp_is_patch_registered(patch))
			cond_resched();
	}

, so it should not produce false positives.

But I do not have a strong opinion about it.

Best Regards,
Petr


* Re: [PATCH] livepatch: fix race between enabled_store() and klp_unregister_patch()
  2015-11-30  3:54 [PATCH] livepatch: fix race between enabled_store() and klp_unregister_patch() Li Bin
  2015-11-30 13:53 ` Petr Mladek
  2015-12-01  1:11 ` Josh Poimboeuf
@ 2015-12-15  8:14 ` Miroslav Benes
  2 siblings, 0 replies; 10+ messages in thread
From: Miroslav Benes @ 2015-12-15  8:14 UTC (permalink / raw)
  To: Li Bin
  Cc: jpoimboe, sjenning, jikos, vojtech, live-patching, linux-kernel,
	guohanjun, dingtianhong, xiexiuqi, zhouchengming1


Hi,

[ Sorry for the late response. Two weeks of holiday and trying to go
through all the emails... ]

On Mon, 30 Nov 2015, Li Bin wrote:

> There is a potential race as follows:
> 
> CPU0                         |  CPU1
> -----------------------------|-----------------------------------
> enabled_store()              |  klp_unregister_patch()
>                              |  |-mutex_lock(&klp_mutex);
> |-mutex_lock(&klp_mutex);    |  |-klp_free_patch();
>                              |  |-mutex_unlock(&klp_mutex);
> |-[process the patch's state]|
> |-mutex_unlock(&klp_mutex)   |
> 
> Fix this race condition by adding a klp_is_patch_registered() check in
> enabled_store() after taking the klp_mutex lock.
> 
> Signed-off-by: Li Bin <huawei.libin@huawei.com>

Well, I think all the relevant feedback was mentioned in other replies.
I'll add my two cents here rather than replying to the respective ones.

1. As Josh pointed out, this is a potential deadlock situation between
   the sysfs infrastructure and our unregister path. It has already been
   discussed [1] and some solutions were proposed [2].

2. If I am not missing something, it is purely theoretical now.
   klp_unregister_patch() is supposed to be called from the module_exit
   function to clean up after the patch module, but there is no way today
   this function can actually be called. We take a module reference
   (try_module_get() in klp_register_patch()) and we do not put it
   anywhere, because we can't, as Petr mentioned. Without a consistency
   model we cannot know whether a task is still running in the module
   code or not.

So I think we can postpone the solution till there is a consistency model.

Regards,
Miroslav

[1] https://lkml.kernel.org/r/alpine.LNX.2.00.1502171728520.29490@pobox.suse.cz

[2]
 1. Rework klp_unregister_patch() and move kobject_put() out of the
    mutex-protected parts. This is what Jiri Slaby also proposed.

 2. As Petr Mladek said, we could use mutex_trylock() instead of
    mutex_lock() in enabled_store().

    I've even had the solution prepared in one of my git branches since
    April, so here it is just for the sake of completeness. It is based
    on Jiri Slaby's kgraft consistency model for klp and it is maybe a
    superset, but you get the idea.

-->8---

From b854b3feac2883f5b0a17ea7a5c83b4389fcd6ad Mon Sep 17 00:00:00 2001
From: Miroslav Benes <mbenes@suse.cz>
Date: Thu, 2 Apr 2015 14:13:00 +0200
Subject: [PATCH] livepatch: allow removal of a disabled patch

Currently we do not allow a patch module to unload since there is no
method to determine whether a task is still running in the patched code.

The consistency model gives us a way, because when the patching finishes
we know that all tasks were marked as safe to call the new patched
function. Thus every new call to the function runs the new patched code,
and at the same time no task can still be somewhere in the old code,
because it had to leave that code in order to be marked as safe.

We can safely let the patch module go after that.

A completion is used for synchronization between module removal and the
sysfs infrastructure. Note that we still do not allow removal for the
simple model, that is, when no consistency model is used.

With this change the call to try_module_get() is moved from
klp_register_patch() to __klp_enable_patch() to make module reference
counting symmetric (module_put() is in the patch disable path). Also,
mutex_lock() in enabled_store() is changed to mutex_trylock() to prevent
a deadlock when klp_unregister_patch() is called and the sysfs
directories are removed.

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
---
 include/linux/livepatch.h            |  3 ++
 kernel/livepatch/core.c              | 70 ++++++++++++++++++++++++------------
 samples/livepatch/livepatch-sample.c |  1 -
 3 files changed, 51 insertions(+), 23 deletions(-)

diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index 5b0c5ca341ee..be2494759a6b 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -24,6 +24,7 @@
 #include <linux/module.h>
 #include <linux/ftrace.h>
 #include <linux/ptrace.h>
+#include <linux/completion.h>
 
 #if IS_ENABLED(CONFIG_LIVEPATCH)
 
@@ -163,6 +164,7 @@ struct klp_object {
  * @kobj:	kobject for sysfs resources
  * @state:	tracks patch-level application state
  * @cmodel:	cmodel_id's implementation
+ * @finish:	for waiting till it is safe to remove the patch module
  */
 struct klp_patch {
 	/* external */
@@ -175,6 +177,7 @@ struct klp_patch {
 	struct kobject kobj;
 	enum klp_state state;
 	struct klp_cmodel *cmodel;
+	struct completion finish;
 };
 
 #define klp_for_each_object(patch, obj) \
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index aa2a5c7c1301..046bf055c187 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -27,6 +27,7 @@
 #include <linux/ftrace.h>
 #include <linux/list.h>
 #include <linux/kallsyms.h>
+#include <linux/completion.h>
 #include <linux/livepatch.h>
 
 /**
@@ -514,6 +515,16 @@ static void klp_disable_patch_real(struct klp_patch *patch)
 	}
 
 	patch->state = KLP_DISABLED;
+
+	if (patch->cmodel->async_finish) {
+		/*
+		 * We need to wait for all the tasks to leave our rcu-protected
+		 * section in the ftrace handler. Disabled funcs have been deleted
+		 * from the list, but there could still be readers who see them.
+		 */
+		synchronize_rcu();
+		module_put(patch->mod);
+	}
 }
 
 static int __klp_disable_patch(struct klp_patch *patch)
@@ -615,6 +626,18 @@ static int __klp_enable_patch(struct klp_patch *patch)
 	    list_prev_entry(patch, list)->state == KLP_DISABLED)
 		return -EBUSY;
 
+	/*
+	 * A reference is taken on the patch module to prevent it from being
+	 * unloaded.
+	 *
+	 * Note: For immediate (no consistency model) patches we don't allow
+	 * patch modules to unload since there is no safe/sane method to
+	 * determine if a thread is still running in the patched code contained
+	 * in the patch module once the ftrace registration is successful.
+	 */
+	if (!try_module_get(patch->mod))
+		return -ENODEV;
+
 	pr_notice_once("tainting kernel with TAINT_LIVEPATCH\n");
 	add_taint(TAINT_LIVEPATCH, LOCKDEP_STILL_OK);
 
@@ -715,7 +738,21 @@ static ssize_t enabled_store(struct kobject *kobj, struct kobj_attribute *attr,
 
 	patch = container_of(kobj, struct klp_patch, kobj);
 
-	mutex_lock(&klp_mutex);
+	/*
+	 * There has to be mutex_trylock here to prevent a potential deadlock
+	 * between enabled_store() and klp_unregister_patch().
+	 * klp_unregister_patch() takes klp_mutex and destroys our sysfs
+	 * infrastructure (thus taking sysfs active protection). We do the
+	 * opposite here.
+	 */
+	if (!mutex_trylock(&klp_mutex))
+		return -EBUSY;
+
+	if (!klp_is_patch_registered(patch)) {
+		/* module with the patch could disappear meanwhile */
+		ret = -EINVAL;
+		goto err;
+	}
 
 	if (val == patch->state) {
 		/* already in requested state */
@@ -759,10 +796,10 @@ static struct attribute *klp_patch_attrs[] = {
 
 static void klp_kobj_release_patch(struct kobject *kobj)
 {
-	/*
-	 * Once we have a consistency model we'll need to module_put() the
-	 * patch module here.  See klp_register_patch() for more details.
-	 */
+	struct klp_patch *patch;
+
+	patch = container_of(kobj, struct klp_patch, kobj);
+	complete(&patch->finish);
 }
 
 static struct kobj_type klp_ktype_patch = {
@@ -834,6 +871,7 @@ static void klp_free_patch(struct klp_patch *patch)
 	if (!list_empty(&patch->list))
 		list_del(&patch->list);
 	kobject_put(&patch->kobj);
+	wait_for_completion(&patch->finish);
 }
 
 static int klp_init_func(struct klp_object *obj, struct klp_func *func)
@@ -934,6 +972,7 @@ static int klp_init_patch(struct klp_patch *patch)
 
 	patch->cmodel = cmodel;
 	patch->state = KLP_DISABLED;
+	init_completion(&patch->finish);
 
 	ret = kobject_init_and_add(&patch->kobj, &klp_ktype_patch,
 				   klp_root_kobj, "%s", patch->mod->name);
@@ -999,33 +1038,20 @@ EXPORT_SYMBOL_GPL(klp_unregister_patch);
  * Initializes the data structure associated with the patch and
  * creates the sysfs interface.
  *
+ * There is no need to take the reference on the patch module here. It is done
+ * later when the patch is enabled.
+ *
  * Return: 0 on success, otherwise error
  */
 int klp_register_patch(struct klp_patch *patch)
 {
-	int ret;
-
 	if (!klp_initialized())
 		return -ENODEV;
 
 	if (!patch || !patch->mod)
 		return -EINVAL;
 
-	/*
-	 * A reference is taken on the patch module to prevent it from being
-	 * unloaded.  Right now, we don't allow patch modules to unload since
-	 * there is currently no method to determine if a thread is still
-	 * running in the patched code contained in the patch module once
-	 * the ftrace registration is successful.
-	 */
-	if (!try_module_get(patch->mod))
-		return -ENODEV;
-
-	ret = klp_init_patch(patch);
-	if (ret)
-		module_put(patch->mod);
-
-	return ret;
+	return klp_init_patch(patch);
 }
 EXPORT_SYMBOL_GPL(klp_register_patch);
 
diff --git a/samples/livepatch/livepatch-sample.c b/samples/livepatch/livepatch-sample.c
index 25289083deac..85e9a87ecd8e 100644
--- a/samples/livepatch/livepatch-sample.c
+++ b/samples/livepatch/livepatch-sample.c
@@ -83,7 +83,6 @@ static int livepatch_init(void)
 
 static void livepatch_exit(void)
 {
-	WARN_ON(klp_disable_patch(&patch));
 	WARN_ON(klp_unregister_patch(&patch));
 }
 
-- 
2.1.4

