From: Petr Mladek <pmladek@suse.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: gor@linux.ibm.com, jpoimboe@redhat.com, jikos@kernel.org,
	mbenes@suse.cz, mingo@kernel.org, linux-kernel@vger.kernel.org,
	joe.lawrence@redhat.com, fweisbec@gmail.com, tglx@linutronix.de,
	hca@linux.ibm.com, svens@linux.ibm.com, sumanthk@linux.ibm.com,
	live-patching@vger.kernel.org, paulmck@kernel.org,
	rostedt@goodmis.org, x86@kernel.org
Subject: Re: [RFC][PATCH v2 09/11] context_tracking,livepatch: Dont disturb NOHZ_FULL
Date: Wed, 6 Oct 2021 10:12:17 +0200	[thread overview]
Message-ID: <YV1aYaHEynjSAUuI@alley> (raw)
In-Reply-To: <20210929152429.067060646@infradead.org>

On Wed 2021-09-29 17:17:32, Peter Zijlstra wrote:
> Using the new context_tracking infrastructure, avoid disturbing
> userspace tasks when context tracking is enabled.
> 
> When context_tracking_set_cpu_work() returns true, we have the
> guarantee that klp_update_patch_state() is called from noinstr code
> before it runs normal kernel code. This covers
> syscall/exceptions/interrupts and NMI entry.

This patch touches the most tricky (lockless) parts of the livepatch code.
I always have to refresh my memory about all the dependencies.

Sigh, I guess that the livepatch code looks overcomplicated to you.

The main problem is that we want to migrate tasks only when they
are not inside any livepatched function. This allows us to make
semantic changes, which are needed by some critical security fixes.
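
For illustration, here is a hedged sketch (the struct and function
names are made up, not from any real patch) of the kind of semantic
change that needs this guarantee. If the fix changes a locking
convention, a task still inside a caller of the old function must
keep seeing the old world until it leaves it:

	#include <linux/spinlock.h>

	struct foo {
		spinlock_t lock;
		int refcount;
	};

	/* Old, buggy convention: the caller must hold f->lock. */
	static int foo_get(struct foo *f)
	{
		return f->refcount++;
	}

	/* Livepatched replacement: takes the lock itself. A task that
	 * is still inside a caller written for the old convention must
	 * not be switched to the new world. Otherwise f->lock would be
	 * taken twice (deadlock) or not at all (race).
	 */
	static int livepatch_foo_get(struct foo *f)
	{
		int ret;

		spin_lock(&f->lock);
		ret = f->refcount++;
		spin_unlock(&f->lock);

		return ret;
	}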


> --- a/kernel/context_tracking.c
> +++ b/kernel/context_tracking.c
> @@ -55,15 +56,13 @@ static noinstr void ct_exit_user_work(struct
>  {
>  	unsigned int work = arch_atomic_read(&ct->work);
>  
> -#if 0
> -	if (work & CT_WORK_n) {
> +	if (work & CT_WORK_KLP) {
>  		/* NMI happens here and must still do/finish CT_WORK_n */
> -		do_work_n();
> +		__klp_update_patch_state(current);
>  
>  		smp_mb__before_atomic();
> -		arch_atomic_andnot(CT_WORK_n, &ct->work);
> +		arch_atomic_andnot(CT_WORK_KLP, &ct->work);
>  	}
> -#endif
>  
>  	smp_mb__before_atomic();
>  	arch_atomic_andnot(CT_SEQ_WORK, &ct->seq);
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -153,6 +154,11 @@ void klp_cancel_transition(void)
>  	klp_complete_transition();
>  }
>  
> +noinstr void __klp_update_patch_state(struct task_struct *task)
> +{
> +	task->patch_state = READ_ONCE(klp_target_state);
> +}
> +
>  /*
>   * Switch the patched state of the task to the set of functions in the target
>   * patch state.
> @@ -180,8 +186,10 @@ void klp_update_patch_state(struct task_
>  	 *    of func->transition, if klp_ftrace_handler() is called later on
>  	 *    the same CPU.  See __klp_disable_patch().
>  	 */
> -	if (test_and_clear_tsk_thread_flag(task, TIF_PATCH_PENDING))
> +	if (test_tsk_thread_flag(task, TIF_PATCH_PENDING)) {

This would require an smp_rmb() here. It would make sure that we read
the correct @klp_target_state when TIF_PATCH_PENDING is set.

Here, @klp_target_state is set in klp_init_transition()
and TIF_PATCH_PENDING is set in klp_start_transition().

There are actually two related smp_wmb() barriers between these two
assignments:

	1st in klp_init_transition()
	2nd in __klp_enable_patch()

One would be enough for klp_update_patch_state(). But we need
both for klp_ftrace_handler(), see the smp_rmb() there.
In particular, they synchronize:

   + ops->func_stack vs.
   + func->transition vs.
   + current->patch_state
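
To make the pairing explicit, here is a condensed sketch of the
ordering (heavily simplified; the real code does more in between
these steps):

	/* Writer side: */

	/* klp_init_transition() */
	klp_target_state = state;			/* A */
	smp_wmb();	/* pairs with the smp_rmb() on the reader side */

	/* klp_start_transition(), called later on the same path */
	set_tsk_thread_flag(task, TIF_PATCH_PENDING);	/* B */

	/* Reader side, what klp_update_patch_state() would need: */
	if (test_tsk_thread_flag(task, TIF_PATCH_PENDING)) {	/* sees B */
		smp_rmb();	/* make sure A is visible before the read */
		task->patch_state = READ_ONCE(klp_target_state);
	}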


>  		task->patch_state = READ_ONCE(klp_target_state);

Note that smp_wmb() is not needed here because
klp_complete_transition() calls klp_synchronize_transition()
aka synchronize_rcu() before clearing klp_target_state.
This is why the original code worked.
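
Roughly, condensed from klp_complete_transition():

	klp_synchronize_transition();	/* synchronize_rcu(): waits until
					 * all preempt-disabled sections,
					 * including a racing
					 * klp_update_patch_state(),
					 * have finished */
	klp_target_state = KLP_UNDEFINED;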


> +		clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
> +	}
>  
>  	preempt_enable_notrace();
>  }
> @@ -270,15 +278,30 @@ static int klp_check_and_switch_task(str
>  {
>  	int ret;
>  
> -	if (task_curr(task))
> +	if (task_curr(task)) {
> +		/*
> +		 * This only succeeds when the task is in NOHZ_FULL user
> +		 * mode, the true return value guarantees any kernel entry
> +		 * will call klp_update_patch_state().
> +		 *
> +		 * XXX: ideally we'd simply return 0 here and leave
> +		 * TIF_PATCH_PENDING alone, to be fixed up by
> +		 * klp_update_patch_state(), except livepatching goes wobbly
> +		 * with 'pending' TIF bits on.
> +		 */
> +		if (context_tracking_set_cpu_work(task_cpu(task), CT_WORK_KLP))
> +			goto clear;

If I understand it correctly, this will clear TIF_PATCH_PENDING
immediately, but task->patch_state = READ_ONCE(klp_target_state) will
be done later by ct_exit_user_work().

This is a bit problematic:

  1. The global @klp_target_state is set to KLP_UNDEFINED once
     TIF_PATCH_PENDING has been cleared for all processes. This is
     actually still fine because func->transition is cleared as well.
     As a result, current->patch_state is ignored in
     klp_ftrace_handler().

  2. The real problem happens when another livepatch is enabled.
     The global @klp_target_state is set to a new value and
     func->transition is set again. In this case, the delayed
     ct_exit_user_work() might assign a wrong value that might
     really be used by klp_ftrace_handler(). See the sketch below.
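
A possible bad interleaving, as a sketch (step numbering mine,
assuming the deferred work only runs at the next kernel entry):

	1. Transition to patch P1: context_tracking_set_cpu_work()
	   succeeds and TIF_PATCH_PENDING is cleared for task T.
	   T stays in user space, so the deferred ct_exit_user_work()
	   does not run yet.

	2. The P1 transition completes; klp_target_state becomes
	   KLP_UNDEFINED. Still fine: func->transition is false.

	3. A transition to patch P2 starts: klp_target_state is set
	   to the new value and func->transition is set again.

	4. T finally enters the kernel. ct_exit_user_work() now does
	   T->patch_state = READ_ONCE(klp_target_state), i.e. it
	   assigns P2's target state even though T's stack was never
	   checked for P2. klp_ftrace_handler() might then pick the
	   wrong function.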


IMHO, the original solution from v1 was better. We only needed to
be careful when updating task->patch_state and clearing
TIF_PATCH_PENDING to avoid the race.

The following might work:

static int klp_check_and_switch_task(struct task_struct *task, void *arg)
{
	int ret;

	/*
	 * Stack is reliable only when the task is not running on any CPU,
	 * except for the task running this code.
	 */
	if (task_curr(task) && task != current) {
		/*
		 * This only succeeds when the task is in NOHZ_FULL user
		 * mode. Such a task might be migrated immediately. We
		 * only need to be careful to set task->patch_state before
		 * clearing TIF_PATCH_PENDING so that the task migrates
	 * itself when entering the kernel in the meantime.
		 */
		if (is_ct_user(task)) {
			klp_update_patch_state(task);
			return 0;
		}

		return -EBUSY;
	}

	ret = klp_check_stack(task, arg);
	if (ret)
		return ret;

	/*
	 * The task is neither running on any CPU nor can it start
	 * running. As a result, the ordering is not important and
	 * no barrier is needed.
	 */
	task->patch_state = klp_target_state;
	clear_tsk_thread_flag(task, TIF_PATCH_PENDING);

	return 0;
}

, where is_ct_user(task) would return true when the task is running in
CONTEXT_USER. If I get the context_tracking API correctly, it might be
implemented the following way:


#ifdef CONFIG_CONTEXT_TRACKING

/*
 * XXX: Whether the value is reliable depends on the context where this
 * is called. At least migration between CPUs should be prevented.
 */
static __always_inline bool is_ct_user(struct task_struct *task)
{
	int seq;

	if (!context_tracking_enabled())
		return false;

	seq = __context_tracking_cpu_seq(task_cpu(task));
	return __context_tracking_seq_in_user(seq);
}

#else

static __always_inline bool is_ct_user(struct task_struct *task)
{
	return false;
}

#endif /* CONFIG_CONTEXT_TRACKING */
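
For completeness, a purely hypothetical sketch of the missing helper.
The real encoding of the state in ct->seq is defined by patch 07, so
the mask and value names below are placeholders, not the real API:

	/* Hypothetical: assumes the low bits of ct->seq encode the
	 * context state, with CT_SEQ_USER marking user mode. Both
	 * CT_SEQ_STATE_MASK and CT_SEQ_USER are made-up names here.
	 */
	static __always_inline bool __context_tracking_seq_in_user(int seq)
	{
		return (seq & CT_SEQ_STATE_MASK) == CT_SEQ_USER;
	}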

Best Regards,
Petr

>  		return -EBUSY;
> +	}
>  
>  	ret = klp_check_stack(task, arg);
>  	if (ret)
>  		return ret;
>  
> -	clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
>  	task->patch_state = klp_target_state;
> +clear:
> +	clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
>  	return 0;
>  }
