linux-kernel.vger.kernel.org archive mirror
* [PATCH,RFC] livepatch: fix race between fork and klp_reverse_transition
@ 2022-07-20 16:10 Rik van Riel
  2022-07-21 23:23 ` Song Liu
  2022-07-22 15:30 ` Petr Mladek
  0 siblings, 2 replies; 13+ messages in thread
From: Rik van Riel @ 2022-07-20 16:10 UTC (permalink / raw)
  To: linux-kernel
  Cc: live-patching, kernel-team, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Breno Leitao

When a KLP fails to apply, klp_reverse_transition will clear the
TIF_PATCH_PENDING flag on all tasks, except for newly created tasks
which are not on the task list yet.

Meanwhile, fork will copy over the TIF_PATCH_PENDING flag from the
parent to the child early on, in dup_task_struct -> setup_thread_stack.

Much later, klp_copy_process will set child->patch_state to match
that of the parent.

However, the parent's patch_state may have been changed by KLP loading
or unloading since it was initially copied over into the child.

This results in the KLP code occasionally hitting this warning in
klp_complete_transition:

        for_each_process_thread(g, task) {
                WARN_ON_ONCE(test_tsk_thread_flag(task, TIF_PATCH_PENDING));
                task->patch_state = KLP_UNDEFINED;
        }

This patch will set, or clear, the TIF_PATCH_PENDING flag in the child
process depending on whether or not it is needed at the time
klp_copy_process is called, at a point in copy_process where the
tasklist_lock is held exclusively, preventing races with the KLP
code.

This should prevent this warning from triggering again in the
future.

I have not yet figured out whether this would also help with races in
the other direction, where the child process fails to have TIF_PATCH_PENDING
set and somehow misses a transition, or whether the retries in
klp_try_complete_transition would catch that task and help it transition
later.

Signed-off-by: Rik van Riel <riel@surriel.com>
Reported-by: Breno Leitao <leitao@debian.org>
---
 kernel/livepatch/transition.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 5d03a2ad1066..7a90ad5e9224 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -612,7 +612,15 @@ void klp_copy_process(struct task_struct *child)
 {
 	child->patch_state = current->patch_state;
 
-	/* TIF_PATCH_PENDING gets copied in setup_thread_stack() */
+	/*
+	 * The parent process may have gone through a KLP transition since
+	 * the thread flag was copied in setup_thread_stack earlier. Set
+	 * the flag according to whether this task needs a KLP transition.
+	 */
+	if (child->patch_state != klp_target_state)
+		set_tsk_thread_flag(child, TIF_PATCH_PENDING);
+	else
+		clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
 }
 
 /*
-- 
2.35.1
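
The interleaving described in the changelog above can be replayed with
a minimal userspace sketch in C. This is not kernel code: struct task,
task_list and fork_races_with_reverse() are hypothetical names invented
for illustration, and a plain bool stands in for TIF_PATCH_PENDING. The
resync_flag argument models a late step in fork that brings the child's
flag up to date while the task list is locked; how exactly the flag is
recomputed is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct; only the flag matters here. */
struct task {
	bool patch_pending;	/* models TIF_PATCH_PENDING */
};

#define MAX_TASKS 8
static struct task *task_list[MAX_TASKS];	/* models the global task list */
static size_t nr_tasks;

static void add_to_task_list(struct task *t)
{
	assert(nr_tasks < MAX_TASKS);
	task_list[nr_tasks++] = t;
}

/* Models klp_reverse_transition(): clears the flag on listed tasks only. */
static void clear_pending_on_listed_tasks(void)
{
	for (size_t i = 0; i < nr_tasks; i++)
		task_list[i]->patch_pending = false;
}

/* Models the WARN_ON_ONCE() condition in klp_complete_transition(). */
static bool any_listed_task_pending(void)
{
	for (size_t i = 0; i < nr_tasks; i++)
		if (task_list[i]->patch_pending)
			return true;
	return false;
}

/*
 * Replays the interleaving from the changelog and returns true if the
 * completion check would warn.
 */
bool fork_races_with_reverse(bool resync_flag)
{
	struct task parent = { .patch_pending = true };
	struct task child;

	nr_tasks = 0;
	add_to_task_list(&parent);

	/* Early in fork: setup_thread_stack() copies the flag. */
	child.patch_pending = parent.patch_pending;

	/* The transition is reversed; the child is not on the list yet. */
	clear_pending_on_listed_tasks();

	/* Late in fork, list locked: optionally refresh the child's flag. */
	if (resync_flag)
		child.patch_pending = parent.patch_pending;

	/* The child becomes visible on the task list. */
	add_to_task_list(&child);

	return any_listed_task_pending();
}
```

With resync_flag set to false the stale flag survives and the check
fires; with true the child joins the list with a clean flag.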



* Re: [PATCH,RFC] livepatch: fix race between fork and klp_reverse_transition
  2022-07-20 16:10 [PATCH,RFC] livepatch: fix race between fork and klp_reverse_transition Rik van Riel
@ 2022-07-21 23:23 ` Song Liu
  2022-07-22 15:30 ` Petr Mladek
  1 sibling, 0 replies; 13+ messages in thread
From: Song Liu @ 2022-07-21 23:23 UTC (permalink / raw)
  To: Rik van Riel
  Cc: open list, live-patching, Kernel Team, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Breno Leitao

On Wed, Jul 20, 2022 at 9:20 AM Rik van Riel <riel@surriel.com> wrote:
>
> When a KLP fails to apply, klp_reverse_transition will clear the
> TIF_PATCH_PENDING flag on all tasks, except for newly created tasks
> which are not on the task list yet.
>
> Meanwhile, fork will copy over the TIF_PATCH_PENDING flag from the
> parent to the child early on, in dup_task_struct -> setup_thread_stack.
>
> Much later, klp_copy_process will set child->patch_state to match
> that of the parent.
>
> However, the parent's patch_state may have been changed by KLP loading
> or unloading since it was initially copied over into the child.
>
> This results in the KLP code occasionally hitting this warning in
> klp_complete_transition:
>
>         for_each_process_thread(g, task) {
>                 WARN_ON_ONCE(test_tsk_thread_flag(task, TIF_PATCH_PENDING));
>                 task->patch_state = KLP_UNDEFINED;
>         }
>
> This patch will set, or clear, the TIF_PATCH_PENDING flag in the child
> process depending on whether or not it is needed at the time
> klp_copy_process is called, at a point in copy_process where the
> tasklist_lock is held exclusively, preventing races with the KLP
> code.
>
> This should prevent this warning from triggering again in the
> future.
>
> I have not yet figured out whether this would also help with races in
> the other direction, where the child process fails to have TIF_PATCH_PENDING
> set and somehow misses a transition, or whether the retries in
> klp_try_complete_transition would catch that task and help it transition
> later.
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Reported-by: Breno Leitao <leitao@debian.org>

LGTM!
Acked-by: Song Liu <song@kernel.org>

> ---
>  kernel/livepatch/transition.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index 5d03a2ad1066..7a90ad5e9224 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -612,7 +612,15 @@ void klp_copy_process(struct task_struct *child)
>  {
>         child->patch_state = current->patch_state;
>
> -       /* TIF_PATCH_PENDING gets copied in setup_thread_stack() */
> +       /*
> +        * The parent process may have gone through a KLP transition since
> +        * the thread flag was copied in setup_thread_stack earlier. Set
> +        * the flag according to whether this task needs a KLP transition.
> +        */
> +       if (child->patch_state != klp_target_state)
> +               set_tsk_thread_flag(child, TIF_PATCH_PENDING);
> +       else
> +               clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
>  }
>
>  /*
> --
> 2.35.1
>


* Re: [PATCH,RFC] livepatch: fix race between fork and klp_reverse_transition
  2022-07-20 16:10 [PATCH,RFC] livepatch: fix race between fork and klp_reverse_transition Rik van Riel
  2022-07-21 23:23 ` Song Liu
@ 2022-07-22 15:30 ` Petr Mladek
  2022-07-22 19:01   ` [PATCH v2] " Rik van Riel
  1 sibling, 1 reply; 13+ messages in thread
From: Petr Mladek @ 2022-07-22 15:30 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, live-patching, kernel-team, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Joe Lawrence, Breno Leitao

On Wed 2022-07-20 12:10:23, Rik van Riel wrote:
> When a KLP fails to apply, klp_reverse_transition will clear the
> TIF_PATCH_PENDING flag on all tasks, except for newly created tasks
> which are not on the task list yet.
> 
> Meanwhile, fork will copy over the TIF_PATCH_PENDING flag from the
> parent to the child early on, in dup_task_struct -> setup_thread_stack.
> 
> Much later, klp_copy_process will set child->patch_state to match
> that of the parent.
> 
> However, the parent's patch_state may have been changed by KLP loading
> or unloading since it was initially copied over into the child.
> 
> This results in the KLP code occasionally hitting this warning in
> klp_complete_transition:
> 
>         for_each_process_thread(g, task) {
>                 WARN_ON_ONCE(test_tsk_thread_flag(task, TIF_PATCH_PENDING));
>                 task->patch_state = KLP_UNDEFINED;
>         }

I see.

> This patch will set, or clear, the TIF_PATCH_PENDING flag in the child
> process depending on whether or not it is needed at the time
> klp_copy_process is called, at a point in copy_process where the
> tasklist_lock is held exclusively, preventing races with the KLP
> code.
> 
> This should prevent this warning from triggering again in the
> future.
> 
> I have not yet figured out whether this would also help with races in
> the other direction, where the child process fails to have TIF_PATCH_PENDING
> set and somehow misses a transition, or whether the retries in
> klp_try_complete_transition would catch that task and help it transition
> later.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Reported-by: Breno Leitao <leitao@debian.org>
> ---
>  kernel/livepatch/transition.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index 5d03a2ad1066..7a90ad5e9224 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -612,7 +612,15 @@ void klp_copy_process(struct task_struct *child)
>  {
>  	child->patch_state = current->patch_state;
>  
> -	/* TIF_PATCH_PENDING gets copied in setup_thread_stack() */
> +	/*
> +	 * The parent process may have gone through a KLP transition since
> +	 * the thread flag was copied in setup_thread_stack earlier. Set
> +	 * the flag according to whether this task needs a KLP transition.
> +	 */
> +	if (child->patch_state != klp_target_state)
> +		set_tsk_thread_flag(child, TIF_PATCH_PENDING);
> +	else
> +		clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
>  }

I am afraid that it is more complicated.

If the parent process may have gone through a KLP transition,
then the transition may also have finished, and klp_target_state may be
KLP_UNDEFINED. We must not set TIF_PATCH_PENDING in this case.

Now, we might race with klp_complete_transition() at any stage. It
might be before or after setting task->patch_state = KLP_UNDEFINED.
And it might be before or after setting klp_target_state =
KLP_UNDEFINED.

The good news is that we cannot race with a
klp_update_patch_state() call that would be migrating current,
because we are current.

So, the easiest solution would be to copy the flag from current
once again here:


/* Called from copy_process() during fork */
void klp_copy_process(struct task_struct *child)
{
	/*
	 * The parent process may have gone through a KLP transition since
	 * the thread flag was copied in setup_thread_stack earlier.
	 * Copy also the flag once again here.
	 *
	 * The operation is serialized against all klp_*_transition()
	 * operations by tasklist_lock. The only exception is
	 * klp_update_patch_state(current). But it could not race
	 * because we are current.
	 */
	if (test_tsk_thread_flag(current, TIF_PATCH_PENDING))
		set_tsk_thread_flag(child, TIF_PATCH_PENDING);
	else
		clear_tsk_thread_flag(child, TIF_PATCH_PENDING);

	child->patch_state = current->patch_state;
}


I hope that I did not miss anything. It is Friday.

Best Regards,
Petr
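
The difference between the two rules discussed above can be checked with
a small standalone C sketch. Everything here is illustrative: struct
snapshot and the function names are hypothetical, the enum values are
assumed to mirror the kernel's KLP_* definitions at the time, and the
snapshot assumes that klp_complete_transition() resets the per-task
state before it resets klp_target_state.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the kernel's patch state values at the time. */
enum klp_state { KLP_UNDEFINED = -1, KLP_UNPATCHED = 0, KLP_PATCHED = 1 };

/* Hypothetical snapshot of what klp_copy_process() can observe. */
struct snapshot {
	enum klp_state parent_state;	/* current->patch_state */
	bool parent_pending;		/* TIF_PATCH_PENDING on current */
	enum klp_state target_state;	/* klp_target_state */
};

/* Rule from the RFC patch: compare the copied state with the target. */
bool rfc_child_pending(const struct snapshot *s)
{
	return s->parent_state != s->target_state;
}

/* Rule suggested in the reply: mirror the parent's flag. */
bool mirrored_child_pending(const struct snapshot *s)
{
	return s->parent_pending;
}

/*
 * Snapshot taken while klp_complete_transition() is part way through:
 * the parent's state is already reset, the global target is not yet.
 */
struct snapshot mid_completion_snapshot(void)
{
	struct snapshot s = {
		.parent_state = KLP_UNDEFINED,
		.parent_pending = false,
		.target_state = KLP_PATCHED,
	};

	/* A task whose transition completed should have no flag set. */
	assert(!s.parent_pending);
	return s;
}
```

For this snapshot the RFC rule would set the flag on the child although
no transition is pending for it, while mirroring the parent's flag
leaves it clear.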


* [PATCH v2] livepatch: fix race between fork and klp_reverse_transition
  2022-07-22 15:30 ` Petr Mladek
@ 2022-07-22 19:01   ` Rik van Riel
  2022-07-25 13:32     ` Petr Mladek
  0 siblings, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2022-07-22 19:01 UTC (permalink / raw)
  To: Petr Mladek
  Cc: linux-kernel, live-patching, kernel-team, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Joe Lawrence, Breno Leitao

v2: a better approach, suggested by Petr (thank you)
---8<---

When a KLP fails to apply, klp_reverse_transition will clear the
TIF_PATCH_PENDING flag on all tasks, except for newly created tasks
which are not on the task list yet.

Meanwhile, fork will copy over the TIF_PATCH_PENDING flag from the
parent to the child early on, in dup_task_struct -> setup_thread_stack.

Much later, klp_copy_process will set child->patch_state to match
that of the parent.

However, the parent's patch_state may have been changed by KLP loading
or unloading since it was initially copied over into the child.

This results in the KLP code occasionally hitting this warning in
klp_complete_transition:

        for_each_process_thread(g, task) {
                WARN_ON_ONCE(test_tsk_thread_flag(task, TIF_PATCH_PENDING));
                task->patch_state = KLP_UNDEFINED;
        }

This patch will set, or clear, the TIF_PATCH_PENDING flag in the child
process depending on whether or not it is needed at the time
klp_copy_process is called, at a point in copy_process where the
tasklist_lock is held exclusively, preventing races with the KLP
code.

This should prevent this warning from triggering again in the
future.

I have not yet figured out whether this would also help with races in
the other direction, where the child process fails to have TIF_PATCH_PENDING
set and somehow misses a transition, or whether the retries in
klp_try_complete_transition would catch that task and help it transition
later.

Signed-off-by: Rik van Riel <riel@surriel.com>
Reported-by: Breno Leitao <leitao@debian.org>
---
 kernel/livepatch/transition.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 5d03a2ad1066..30187b1d8275 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -610,9 +610,23 @@ void klp_reverse_transition(void)
 /* Called from copy_process() during fork */
 void klp_copy_process(struct task_struct *child)
 {
-	child->patch_state = current->patch_state;
 
-	/* TIF_PATCH_PENDING gets copied in setup_thread_stack() */
+	/*
+	 * The parent process may have gone through a KLP transition since
+	 * the thread flag was copied in setup_thread_stack earlier. Bring
+	 * the task flag up to date with the parent here.
+	 *
+	 * The operation is serialized against all klp_*_transition()
+	 * operations by the tasklist_lock. The only exception is
+	 * klp_update_patch_state(current), but we cannot race with
+	 * that because we are current.
+	 */
+	if (test_tsk_thread_flag(current, TIF_PATCH_PENDING))
+		set_tsk_thread_flag(child, TIF_PATCH_PENDING);
+	else
+		clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
+
+	child->patch_state = current->patch_state;
 }
 
 /*
-- 
2.35.1




* Re: [PATCH v2] livepatch: fix race between fork and klp_reverse_transition
  2022-07-22 19:01   ` [PATCH v2] " Rik van Riel
@ 2022-07-25 13:32     ` Petr Mladek
  2022-07-25 13:49       ` [PATCH v3] " Rik van Riel
  0 siblings, 1 reply; 13+ messages in thread
From: Petr Mladek @ 2022-07-25 13:32 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, live-patching, kernel-team, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Joe Lawrence, Breno Leitao

On Fri 2022-07-22 15:01:06, Rik van Riel wrote:
> v2: a better approach, suggested by Petr (thank you)
> ---8<---
> 
> When a KLP fails to apply, klp_reverse_transition will clear the
> TIF_PATCH_PENDING flag on all tasks, except for newly created tasks
> which are not on the task list yet.
> 
> Meanwhile, fork will copy over the TIF_PATCH_PENDING flag from the
> parent to the child early on, in dup_task_struct -> setup_thread_stack.
> 
> Much later, klp_copy_process will set child->patch_state to match
> that of the parent.
> 
> However, the parent's patch_state may have been changed by KLP loading
> or unloading since it was initially copied over into the child.
> 
> This results in the KLP code occasionally hitting this warning in
> klp_complete_transition:
> 
>         for_each_process_thread(g, task) {
>                 WARN_ON_ONCE(test_tsk_thread_flag(task, TIF_PATCH_PENDING));
>                 task->patch_state = KLP_UNDEFINED;
>         }
> 
> This patch will set, or clear, the TIF_PATCH_PENDING flag in the child
> process depending on whether or not it is needed at the time
> klp_copy_process is called, at a point in copy_process where the
> tasklist_lock is held exclusively, preventing races with the KLP
> code.
> 
> This should prevent this warning from triggering again in the
> future.
> 
> I have not yet figured out whether this would also help with races in
> the other direction, where the child process fails to have TIF_PATCH_PENDING
> set and somehow misses a transition, or whether the retries in
> klp_try_complete_transition would catch that task and help it transition
> later.

It should fix these races as well. Both task->patch_state and the
TIF_PATCH_PENDING flag are almost always modified under tasklist_lock.

One exception is klp_update_patch_state(current), but it cannot
race because klp_copy_process() is called under a spinlock, so
"current" can't sleep and can't get migrated in the middle of
klp_copy_process().

Another exception is klp_check_and_switch_task(), which is called
under p->pi_lock. It prevents rescheduling, and the task will be
migrated only when sleeping. As a result, "current" again
can't get migrated inside klp_copy_process().

Finally, the state of "idle" tasks (idle_task(cpu)) is updated
without tasklist_lock. But they are not forked, so we are
on the safe side.


> Signed-off-by: Rik van Riel <riel@surriel.com>
> Reported-by: Breno Leitao <leitao@debian.org>

We should update the commit message and mention also the other
two locations where the state is manipulated without tasklist_lock.
I am sorry that I did not mention it on Friday.

Also we should remove "I have not figured yet whether". The patch
should fix these races as well.

With the above changes:

Reviewed-by: Petr Mladek <pmladek@suse.com>


Best Regards,
Petr


* [PATCH v3] livepatch: fix race between fork and klp_reverse_transition
  2022-07-25 13:32     ` Petr Mladek
@ 2022-07-25 13:49       ` Rik van Riel
  2022-07-27  0:10         ` Josh Poimboeuf
  0 siblings, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2022-07-25 13:49 UTC (permalink / raw)
  To: Petr Mladek
  Cc: linux-kernel, live-patching, kernel-team, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Joe Lawrence, Breno Leitao

On Mon, 25 Jul 2022 15:32:22 +0200
Petr Mladek <pmladek@suse.com> wrote:

> We should update the commit message and mention also the other
> two locations where the state is manipulated without tasklist_lock.
> I am sorry that I did not mention it on Friday.

Done. Thank you for reviewing this patch so carefully!
I had looked at those other places in the code as well, but
do not have as complete a picture of the KLP code as you.

v2: a better approach, suggested by Petr (thank you)
v3: update changelog (thank you Petr)
---8<---
When a KLP fails to apply, klp_reverse_transition will clear the
TIF_PATCH_PENDING flag on all tasks, except for newly created tasks
which are not on the task list yet.

Meanwhile, fork will copy over the TIF_PATCH_PENDING flag from the
parent to the child early on, in dup_task_struct -> setup_thread_stack.

Much later, klp_copy_process will set child->patch_state to match
that of the parent.

However, the parent's patch_state may have been changed by KLP loading
or unloading since it was initially copied over into the child.

This results in the KLP code occasionally hitting this warning in
klp_complete_transition:

        for_each_process_thread(g, task) {
                WARN_ON_ONCE(test_tsk_thread_flag(task, TIF_PATCH_PENDING));
                task->patch_state = KLP_UNDEFINED;
        }

This patch will set, or clear, the TIF_PATCH_PENDING flag in the child
process depending on whether or not it is needed at the time
klp_copy_process is called, at a point in copy_process where the
tasklist_lock is held exclusively, preventing races with the KLP
code.

The KLP code does have a few places where the state is changed
without the tasklist_lock held, but those should not cause
problems: klp_update_patch_state(current) cannot be called
while the current task is in the middle of fork;
klp_check_and_switch_task() is called under the pi_lock,
which prevents rescheduling; and idle tasks, whose patch
state is also manipulated without the lock, do not fork.

This should prevent this warning from triggering again in the
future.

Signed-off-by: Rik van Riel <riel@surriel.com>
Reported-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 kernel/livepatch/transition.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 5d03a2ad1066..30187b1d8275 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -610,9 +610,23 @@ void klp_reverse_transition(void)
 /* Called from copy_process() during fork */
 void klp_copy_process(struct task_struct *child)
 {
-	child->patch_state = current->patch_state;
 
-	/* TIF_PATCH_PENDING gets copied in setup_thread_stack() */
+	/*
+	 * The parent process may have gone through a KLP transition since
+	 * the thread flag was copied in setup_thread_stack earlier. Bring
+	 * the task flag up to date with the parent here.
+	 *
+	 * The operation is serialized against all klp_*_transition()
+	 * operations by the tasklist_lock. The only exception is
+	 * klp_update_patch_state(current), but we cannot race with
+	 * that because we are current.
+	 */
+	if (test_tsk_thread_flag(current, TIF_PATCH_PENDING))
+		set_tsk_thread_flag(child, TIF_PATCH_PENDING);
+	else
+		clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
+
+	child->patch_state = current->patch_state;
 }
 
 /*
-- 
2.35.1



* Re: [PATCH v3] livepatch: fix race between fork and klp_reverse_transition
  2022-07-25 13:49       ` [PATCH v3] " Rik van Riel
@ 2022-07-27  0:10         ` Josh Poimboeuf
  2022-07-27  0:26           ` Rik van Riel
  2022-07-27 14:24           ` [PATCH v4] livepatch: fix race between fork and KLP transition Rik van Riel
  0 siblings, 2 replies; 13+ messages in thread
From: Josh Poimboeuf @ 2022-07-27  0:10 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Petr Mladek, linux-kernel, live-patching, kernel-team,
	Jiri Kosina, Miroslav Benes, Joe Lawrence, Breno Leitao

On Mon, Jul 25, 2022 at 09:49:19AM -0400, Rik van Riel wrote:
> When a KLP fails to apply, klp_reverse_transition will clear the
> TIF_PATCH_PENDING flag on all tasks, except for newly created tasks
> which are not on the task list yet.

This paragraph and $SUBJECT both talk about a reverse transition.  Isn't
it also possible to race on a normal (forward) transition?

> Meanwhile, fork will copy over the TIF_PATCH_PENDING flag from the
> parent to the child early on, in dup_task_struct -> setup_thread_stack.
> 
> Much later, klp_copy_process will set child->patch_state to match
> that of the parent.
> 
> However, the parent's patch_state may have been changed by KLP loading
> or unloading since it was initially copied over into the child.
> 
> This results in the KLP code occasionally hitting this warning in
> klp_complete_transition:
> 
>         for_each_process_thread(g, task) {
>                 WARN_ON_ONCE(test_tsk_thread_flag(task, TIF_PATCH_PENDING));
>                 task->patch_state = KLP_UNDEFINED;
>         }
> 
> This patch will set, or clear, the TIF_PATCH_PENDING flag in the child
> process depending on whether or not it is needed at the time
> klp_copy_process is called, at a point in copy_process where the
> tasklist_lock is held exclusively, preventing races with the KLP
> code.

Use imperative language, i.e. no "This patch".  See
Documentation/process/submitting-patches.rst
> 
> The KLP code does have a few places where the state is changed
> without the tasklist_lock held, but those should not cause
> problems: klp_update_patch_state(current) cannot be called
> while the current task is in the middle of fork;
> klp_check_and_switch_task() is called under the pi_lock,
> which prevents rescheduling; and idle tasks, whose patch
> state is also manipulated without the lock, do not fork.
> 
> This should prevent this warning from triggering again in the
> future.
> 

Fixes: d83a7cb375ee ("livepatch: change to a per-task consistency model")

> Signed-off-by: Rik van Riel <riel@surriel.com>
> Reported-by: Breno Leitao <leitao@debian.org>
> Reviewed-by: Petr Mladek <pmladek@suse.com>

With the above minor things fixed:

Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>

-- 
Josh


* Re: [PATCH v3] livepatch: fix race between fork and klp_reverse_transition
  2022-07-27  0:10         ` Josh Poimboeuf
@ 2022-07-27  0:26           ` Rik van Riel
  2022-07-28 15:20             ` Petr Mladek
  2022-07-27 14:24           ` [PATCH v4] livepatch: fix race between fork and KLP transition Rik van Riel
  1 sibling, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2022-07-27  0:26 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Petr Mladek, linux-kernel, live-patching, kernel-team,
	Jiri Kosina, Miroslav Benes, Joe Lawrence, Breno Leitao


On Tue, 2022-07-26 at 17:10 -0700, Josh Poimboeuf wrote:
> On Mon, Jul 25, 2022 at 09:49:19AM -0400, Rik van Riel wrote:
> > When a KLP fails to apply, klp_reverse_transition will clear the
> > TIF_PATCH_PENDING flag on all tasks, except for newly created tasks
> > which are not on the task list yet.
> 
> This paragraph and $SUBJECT both talk about a reverse transition.  Isn't
> it also possible to race on a normal (forward) transition?

I don't know whether the race is also possible on a forward
transition.  If the parent task has transitioned, will
the child have, as well, by the time we reach the end of fork?

I suppose the only way the parent task can transition while
inside fork would be if none of the functions in its stack
need to be transitioned, and at that point the child process
would automatically be safe, too?

That would make copying the KLP transition state from parent to
child safe on a forward transition, too.

Am I overlooking anything?

However, we have only observed this warning on reverse transitions
for some reason.

> > Meanwhile, fork will copy over the TIF_PATCH_PENDING flag from the
> > parent to the child early on, in dup_task_struct -> setup_thread_stack.
> > 
> > Much later, klp_copy_process will set child->patch_state to match
> > that of the parent.
> > 
> > However, the parent's patch_state may have been changed by KLP loading
> > or unloading since it was initially copied over into the child.
> > 
> > This results in the KLP code occasionally hitting this warning in
> > klp_complete_transition:
> > 
> >         for_each_process_thread(g, task) {
> >                 WARN_ON_ONCE(test_tsk_thread_flag(task, TIF_PATCH_PENDING));
> >                 task->patch_state = KLP_UNDEFINED;
> >         }
> > 
> > This patch will set, or clear, the TIF_PATCH_PENDING flag in the child
> > process depending on whether or not it is needed at the time
> > klp_copy_process is called, at a point in copy_process where the
> > tasklist_lock is held exclusively, preventing races with the KLP
> > code.
> 
> Use imperative language, i.e. no "This patch".  See
> Documentation/process/submitting-patches.rst
> > 

Will do. I'll send a v4 tomorrow.

> > The KLP code does have a few places where the state is changed
> > without the tasklist_lock held, but those should not cause
> > problems: klp_update_patch_state(current) cannot be called
> > while the current task is in the middle of fork;
> > klp_check_and_switch_task() is called under the pi_lock,
> > which prevents rescheduling; and idle tasks, whose patch
> > state is also manipulated without the lock, do not fork.
> > 
> > This should prevent this warning from triggering again in the
> > future.
> > 
> 
> Fixes: d83a7cb375ee ("livepatch: change to a per-task consistency model")
> 
> > Signed-off-by: Rik van Riel <riel@surriel.com>
> > Reported-by: Breno Leitao <leitao@debian.org>
> > Reviewed-by: Petr Mladek <pmladek@suse.com>
> 
> With the above minor things fixed:
> 
> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
> 

-- 
All Rights Reversed.



* [PATCH v4] livepatch: fix race between fork and KLP transition
  2022-07-27  0:10         ` Josh Poimboeuf
  2022-07-27  0:26           ` Rik van Riel
@ 2022-07-27 14:24           ` Rik van Riel
  2022-07-28 15:37             ` Petr Mladek
  1 sibling, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2022-07-27 14:24 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Petr Mladek, linux-kernel, live-patching, kernel-team,
	Jiri Kosina, Miroslav Benes, Joe Lawrence, Breno Leitao

v4: address changelog comments by Josh (thank you)

---8<---
When a KLP fails to apply, klp_reverse_transition will clear the
TIF_PATCH_PENDING flag on all tasks, except for newly created tasks
which are not on the task list yet. A similar race is possible
for normal (forward) transitions, where TIF_PATCH_PENDING gets
copied to the child, then later cleared in the parent.

Meanwhile, fork will copy over the TIF_PATCH_PENDING flag from the
parent to the child early on, in dup_task_struct -> setup_thread_stack.

Much later, klp_copy_process will set child->patch_state to match
that of the parent.

However, the parent's patch_state may have been changed by KLP loading
or unloading since it was initially copied over into the child.

This results in the KLP code occasionally hitting this warning in
klp_complete_transition:

        for_each_process_thread(g, task) {
                WARN_ON_ONCE(test_tsk_thread_flag(task, TIF_PATCH_PENDING));
                task->patch_state = KLP_UNDEFINED;
        }

Set, or clear, the TIF_PATCH_PENDING flag in the child task
depending on whether or not it is needed at the time
klp_copy_process is called, at a point in copy_process where the
tasklist_lock is held exclusively, preventing races with the KLP
code.

The KLP code does have a few places where the state is changed
without the tasklist_lock held, but those should not cause
problems: klp_update_patch_state(current) cannot be called
while the current task is in the middle of fork;
klp_check_and_switch_task() is called under the pi_lock,
which prevents rescheduling; and idle tasks, whose patch
state is also manipulated without the lock, do not fork.

This should prevent this warning from triggering again in the
future, and close the race for both normal and reverse transitions.

Signed-off-by: Rik van Riel <riel@surriel.com>
Reported-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Fixes: d83a7cb375ee ("livepatch: change to a per-task consistency model")
Cc: stable@kernel.org
---
 kernel/livepatch/transition.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 5d03a2ad1066..30187b1d8275 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -610,9 +610,23 @@ void klp_reverse_transition(void)
 /* Called from copy_process() during fork */
 void klp_copy_process(struct task_struct *child)
 {
-	child->patch_state = current->patch_state;
 
-	/* TIF_PATCH_PENDING gets copied in setup_thread_stack() */
+	/*
+	 * The parent process may have gone through a KLP transition since
+	 * the thread flag was copied in setup_thread_stack earlier. Bring
+	 * the task flag up to date with the parent here.
+	 *
+	 * The operation is serialized against all klp_*_transition()
+	 * operations by the tasklist_lock. The only exception is
+	 * klp_update_patch_state(current), but we cannot race with
+	 * that because we are current.
+	 */
+	if (test_tsk_thread_flag(current, TIF_PATCH_PENDING))
+		set_tsk_thread_flag(child, TIF_PATCH_PENDING);
+	else
+		clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
+
+	child->patch_state = current->patch_state;
 }
 
 /*
-- 
2.35.1
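
The forward-transition case mentioned in the changelog above can be
modelled the same way. The following standalone C sketch is only an
illustration: struct task, consistent() and
child_consistent_after_forward_race() are hypothetical names, the enum
values are assumptions, and the "flag is set exactly when the state
differs from the target" invariant is a simplification of what the
transition code expects while a transition is in progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed state values; only their inequality matters in this sketch. */
enum { KLP_UNPATCHED = 0, KLP_PATCHED = 1 };

/* Hypothetical stand-in for the two fields discussed in this thread. */
struct task {
	int patch_state;	/* models task->patch_state */
	bool patch_pending;	/* models TIF_PATCH_PENDING */
};

/* Simplified invariant: the flag is set iff the task still has to switch. */
static bool consistent(const struct task *t, int target)
{
	return t->patch_pending == (t->patch_state != target);
}

/*
 * Returns true if the child is consistent when it would be added to the
 * task list. resync_flag selects whether the late step also copies the
 * flag from the parent, as the patch above does.
 */
bool child_consistent_after_forward_race(bool resync_flag)
{
	const int target = KLP_PATCHED;
	struct task parent = {
		.patch_state = KLP_UNPATCHED,
		.patch_pending = true,
	};

	/* Early in fork: the child starts as a copy, flag included. */
	struct task child = parent;

	/* The parent is switched to the target while fork is in progress. */
	parent.patch_state = target;
	parent.patch_pending = false;
	assert(consistent(&parent, target));

	/* Late in fork: the klp_copy_process() step. */
	if (resync_flag)
		child.patch_pending = parent.patch_pending;
	child.patch_state = parent.patch_state;

	return consistent(&child, target);
}
```

Without the resync the child ends up in the target state with a stale
flag still set; with it, state and flag agree.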



* Re: [PATCH v3] livepatch: fix race between fork and klp_reverse_transition
  2022-07-27  0:26           ` Rik van Riel
@ 2022-07-28 15:20             ` Petr Mladek
  0 siblings, 0 replies; 13+ messages in thread
From: Petr Mladek @ 2022-07-28 15:20 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Josh Poimboeuf, linux-kernel, live-patching, kernel-team,
	Jiri Kosina, Miroslav Benes, Joe Lawrence, Breno Leitao

On Tue 2022-07-26 20:26:41, Rik van Riel wrote:
> On Tue, 2022-07-26 at 17:10 -0700, Josh Poimboeuf wrote:
> > On Mon, Jul 25, 2022 at 09:49:19AM -0400, Rik van Riel wrote:
> > > When a KLP fails to apply, klp_reverse_transition will clear the
> > > TIF_PATCH_PENDING flag on all tasks, except for newly created tasks
> > > which are not on the task list yet.
> > 
> > This paragraph and $SUBJECT both talk about a reverse transition. 
> > Isn't
> > it also possible to race on a normal (forward) transition?
> 
> I don't know whether the race is also possible on a forward
> transition.  If the parent task has transitioned, will
> the child have, as well, by the time we reach the end of fork?

I think that the race should also be possible with the forward
transition. I do not see what would prevent it.


> I suppose the only way the parent task can transition while
> inside fork would be if none of the functions in its stack
> need to be transitioned, and at that point the child process
> would automatically be safe, too?

IMHO, these races might be dangerous only when fork() calls
a function on the way out that is livepatched but was not
on the stack when the process was copied.

Anyway, the patch should make sure that task->patch_state and
TIF_PATCH_PENDING are always consistent when the child is added
to the global task list. So, we should always be on the safe side.


> However, we have only observed this warning on reverse transitions
> for some reason.

IMHO, it is because the race during the forward transition is
kind of "self-healing":

parent:				worker:

  fork()
    #copy set TIF_PATCH_PENDING
    # schedule


				  klp_try_complete_transition()
				     clear_bit(parent, TIF_PATCH_PENDING);
				     parent->patch_state = klp_target_state;

   # running again
   # copy already migrated parent->patch_state



later:

     clear_bit(child, TIF_PATCH_PENDING);
     child->patch_state = klp_target_state;

As a result, child->patch_state will be updated twice
to klp_target_state.
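
For illustration only, the forward-transition sequence above can be
replayed in a small user-space model. This is a sketch under stated
assumptions, not kernel code: struct toy_task, the TOY_* constants and
toy_forward_race() are made-up names, a plain bool stands in for
TIF_PATCH_PENDING, and the interleaving is fixed by hand rather than
produced by real concurrency.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's patch states. */
enum toy_state { TOY_UNPATCHED = 0, TOY_PATCHED = 1 };

struct toy_task {
	bool patch_pending;         /* models TIF_PATCH_PENDING */
	enum toy_state patch_state; /* models task->patch_state */
};

struct toy_forward_result {
	bool inconsistent_in_window; /* flag set although state == target */
	bool consistent_at_end;      /* flag clear and state == target */
};

/* Replays the forward-transition interleaving from the diagram. */
struct toy_forward_result toy_forward_race(void)
{
	const enum toy_state target = TOY_PATCHED;
	struct toy_task parent = { true, TOY_UNPATCHED };
	struct toy_task child = { false, TOY_UNPATCHED };
	struct toy_forward_result res;

	/* fork(): the thread flags are copied early, flag is still set. */
	child.patch_pending = parent.patch_pending;

	/* Worker migrates the parent while it is scheduled out in fork(). */
	parent.patch_pending = false;
	parent.patch_state = target;

	/* Old klp_copy_process(): only patch_state is copied. */
	child.patch_state = parent.patch_state;

	/* The child is now in the mismatched state the thread discusses. */
	res.inconsistent_in_window =
		child.patch_pending && child.patch_state == target;

	/* Later: the flag is still set, so the child gets migrated again. */
	if (child.patch_pending) {
		child.patch_pending = false;
		child.patch_state = target;
	}

	res.consistent_at_end =
		!child.patch_pending && child.patch_state == target;
	return res;
}
```

In this model the child is briefly inconsistent but converges on its
own, which matches the "self-healing" description.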



The problematic situation during revert:


parent:					another process:

   # migrate parent
   clear_bit(parent, TIF_PATCH_PENDING);
   parent->patch_state = klp_target_state;

   fork()
     #copy cleared TIF_PATCH_PENDING

					klp_revert_patch()
					    # invert @klp_target_state
					    set_bit(parent, TIF_PATCH_PENDING)

     # copy parent->patch_state that
       needs migration once again

   # migrated once again after revert
   clear_bit(parent, TIF_PATCH_PENDING);
   parent->patch_state = klp_target_state;

WARNING: child will never get migrated because it copied the cleared
	 TIF_PATCH_PENDING before @klp_target_state was inverted
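
The revert sequence above can be replayed the same way. Again this is
only a toy user-space model: struct toy_task, toy_migrate_if_pending()
and toy_revert_race() are hypothetical names, a bool stands in for
TIF_PATCH_PENDING, and the sync_flag parameter is an assumption used to
switch between the old klp_copy_process() behaviour and the one from
the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's patch states. */
enum toy_state { TOY_UNPATCHED = 0, TOY_PATCHED = 1 };

struct toy_task {
	bool patch_pending;         /* models TIF_PATCH_PENDING */
	enum toy_state patch_state; /* models task->patch_state */
};

/* A task is only migrated when its pending flag is set. */
void toy_migrate_if_pending(struct toy_task *t, enum toy_state target)
{
	if (!t->patch_pending)
		return;
	t->patch_pending = false;
	t->patch_state = target;
}

/*
 * Replays the revert interleaving from the diagram.
 * Returns true if the child ends up in the (reverted) target state.
 */
bool toy_revert_race(bool sync_flag)
{
	enum toy_state target = TOY_PATCHED;
	struct toy_task parent = { true, TOY_UNPATCHED };
	struct toy_task child = { false, TOY_UNPATCHED };

	/* Parent is migrated before it forks. */
	toy_migrate_if_pending(&parent, target);

	/* fork(): early copy of the (now cleared) flag. */
	child.patch_pending = parent.patch_pending;

	/*
	 * Revert: the target is inverted and the flag is set again on
	 * every task in the task list. The child is not listed yet.
	 */
	target = TOY_UNPATCHED;
	parent.patch_pending = true;

	/* klp_copy_process(): with the fix, the flag is re-synced too. */
	if (sync_flag)
		child.patch_pending = parent.patch_pending;
	child.patch_state = parent.patch_state;

	/* Child is now listed; later passes migrate flagged tasks. */
	toy_migrate_if_pending(&parent, target);
	toy_migrate_if_pending(&child, target);

	return child.patch_state == target;
}
```

In this model toy_revert_race(false) leaves the child stuck in the old
state, while toy_revert_race(true) lets it reach the reverted state.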

Summary:

It is great that the race was found and fixed.

Best Regards,
Petr


* Re: [PATCH v4] livepatch: fix race between fork and KLP transition
  2022-07-27 14:24           ` [PATCH v4] livepatch: fix race between fork and KLP transition Rik van Riel
@ 2022-07-28 15:37             ` Petr Mladek
  2022-08-02 20:07               ` Rik van Riel
  0 siblings, 1 reply; 13+ messages in thread
From: Petr Mladek @ 2022-07-28 15:37 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Josh Poimboeuf, linux-kernel, live-patching, kernel-team,
	Jiri Kosina, Miroslav Benes, Joe Lawrence, Breno Leitao

On Wed 2022-07-27 10:24:37, Rik van Riel wrote:
> v4: address changelog comments by Josh (thank you)
> 
> ---8<---
> When a KLP fails to apply, klp_reverse_transition will clear the
> TIF_PATCH_PENDING flag on all tasks, except for newly created tasks
> which are not on the task list yet.

That is actually not true. klp_reverse_transition() clears
TIF_PATCH_PENDING only temporarily while it waits until all processes
leave the ftrace handler. It then sets TIF_PATCH_PENDING again for all
tasks by calling klp_start_transition().

The difference is important. The WARN_ON_ONCE() in
klp_complete_transition() will be printed when fork() copied
TIF_PATCH_PENDING before it was set again.

Anyway, the important thing is that TIF_PATCH_PENDING and
task->patch_state might be inconsistent because fork() copies them
at different times.

klp_copy_process() must make sure that they are in sync. And
it must be done under tasklist_lock when the child is added
to the global task list.

Best Regards,
Petr


* Re: [PATCH v4] livepatch: fix race between fork and KLP transition
  2022-07-28 15:37             ` Petr Mladek
@ 2022-08-02 20:07               ` Rik van Riel
  2022-08-04 13:06                 ` Petr Mladek
  0 siblings, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2022-08-02 20:07 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Josh Poimboeuf, linux-kernel, live-patching, kernel-team,
	Jiri Kosina, Miroslav Benes, Joe Lawrence, Breno Leitao


On Thu, 2022-07-28 at 17:37 +0200, Petr Mladek wrote:
> On Wed 2022-07-27 10:24:37, Rik van Riel wrote:
> > v4: address changelog comments by Josh (thank you)
> > 
> > ---8<---
> > When a KLP fails to apply, klp_reverse_transition will clear the
> > TIF_PATCH_PENDING flag on all tasks, except for newly created tasks
> > which are not on the task list yet.
> 
> That is actually not true. klp_reverse_transition() clears
> TIF_PATCH_PENDING only temporarily while it waits until all
> processes leave the ftrace handler. It then sets TIF_PATCH_PENDING
> again for all tasks by calling klp_start_transition().
> 
> The difference is important. The WARN_ON_ONCE() in
> klp_complete_transition() will be printed when fork() copied
> TIF_PATCH_PENDING before it was set again.
> 
> Anyway, the important thing is that TIF_PATCH_PENDING and
> task->patch_state might be inconsistent because fork() copies them
> at different times.
> 
> klp_copy_process() must make sure that they are in sync. And
> it must be done under tasklist_lock when the child is added
> to the global task list.

Hmmm, how should this be addressed in the changelog?

Should I just remove most of that paragraph and leave it
at "there can be a race"?

-- 
All Rights Reversed.



* Re: [PATCH v4] livepatch: fix race between fork and KLP transition
  2022-08-02 20:07               ` Rik van Riel
@ 2022-08-04 13:06                 ` Petr Mladek
  0 siblings, 0 replies; 13+ messages in thread
From: Petr Mladek @ 2022-08-04 13:06 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Josh Poimboeuf, linux-kernel, live-patching, kernel-team,
	Jiri Kosina, Miroslav Benes, Joe Lawrence, Breno Leitao

On Tue 2022-08-02 16:07:08, Rik van Riel wrote:
> On Thu, 2022-07-28 at 17:37 +0200, Petr Mladek wrote:
> > On Wed 2022-07-27 10:24:37, Rik van Riel wrote:
> > > v4: address changelog comments by Josh (thank you)
> > > 
> > > ---8<---
> > > When a KLP fails to apply, klp_reverse_transition will clear the
> > > TIF_PATCH_PENDING flag on all tasks, except for newly created tasks
> > > which are not on the task list yet.
> > 
> > That is actually not true. klp_reverse_transition() clears
> > TIF_PATCH_PENDING only temporarily while it waits until all
> > processes leave the ftrace handler. It then sets TIF_PATCH_PENDING
> > again for all tasks by calling klp_start_transition().
> > 
> > The difference is important. The WARN_ON_ONCE() in
> > klp_complete_transition() will be printed when fork() copied
> > TIF_PATCH_PENDING before it was set again.
> > 
> > Anyway, the important thing is that TIF_PATCH_PENDING and
> > task->patch_state might be inconsistent because fork() copies them
> > at different times.
> > 
> > klp_copy_process() must make sure that they are in sync. And
> > it must be done under tasklist_lock when the child is added
> > to the global task list.
> 
> Hmmm, how should this be addressed in the changelog?
> 
> Should I just remove most of that paragraph and leave it
> at "there can be a race"?

It would be nice to somehow summarize what I wrote. I mean to explain
why the problem is easier to see with revert and not with forward
transition.

It is because TIF_PATCH_PENDING might stay cleared in the child even
when it was set again in the parent by klp_reverse_transition().
As a result, the child will never get transitioned back to the
reverted state.

The problem is hard to hit during the forward transition because
the child might still have TIF_PATCH_PENDING set even though it
later copies an already migrated task->patch_state, when the parent
gets migrated in the race window. In this case, TIF_PATCH_PENDING
will get cleared when the child returns from fork and all will be
good.

In any case, the inconsistent state is there even during
the forward transition. But it would be caught only when
the entire transition is finished during the rather small
race window.
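
To make the "caught only when the entire transition is finished"
point concrete, here is a simplified user-space model of the
completion check. It is a sketch only: struct toy_task and
toy_try_complete() are hypothetical names, and the real
klp_try_complete_transition() and klp_complete_transition() do
considerably more than this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's patch states. */
enum toy_state { TOY_UNPATCHED = 0, TOY_PATCHED = 1 };

struct toy_task {
	bool patch_pending;         /* models TIF_PATCH_PENDING */
	enum toy_state patch_state; /* models task->patch_state */
};

/*
 * Returns -1 if some task has not reached the target state yet, so
 * the transition cannot complete. Otherwise returns the number of
 * tasks whose pending flag is still set; in this model each of them
 * would trip the WARN_ON_ONCE() quoted in the changelog.
 */
int toy_try_complete(const struct toy_task *tasks, size_t n,
		     enum toy_state target)
{
	size_t i;
	int stale = 0;

	for (i = 0; i < n; i++) {
		if (tasks[i].patch_state != target)
			return -1;
	}

	for (i = 0; i < n; i++) {
		if (tasks[i].patch_pending)
			stale++;
	}

	return stale;
}
```

A child that copied an already migrated patch_state but still carries
the flag passes the first loop and is counted by the second one, which
is the small window described above.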

The patch should fix the race in any direction.

I could provide an even better description after I am back
from vacation on Aug 22.

Best Regards,
Petr


Thread overview: 13+ messages
2022-07-20 16:10 [PATCH,RFC] livepatch: fix race between fork and klp_reverse_transition Rik van Riel
2022-07-21 23:23 ` Song Liu
2022-07-22 15:30 ` Petr Mladek
2022-07-22 19:01   ` [PATCH v2] " Rik van Riel
2022-07-25 13:32     ` Petr Mladek
2022-07-25 13:49       ` [PATCH v3] " Rik van Riel
2022-07-27  0:10         ` Josh Poimboeuf
2022-07-27  0:26           ` Rik van Riel
2022-07-28 15:20             ` Petr Mladek
2022-07-27 14:24           ` [PATCH v4] livepatch: fix race between fork and KLP transition Rik van Riel
2022-07-28 15:37             ` Petr Mladek
2022-08-02 20:07               ` Rik van Riel
2022-08-04 13:06                 ` Petr Mladek
