linux-kernel.vger.kernel.org archive mirror
From: Frederic Weisbecker <frederic@kernel.org>
To: Valentin Schneider <vschneid@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kvm@vger.kernel.org,
	linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org,
	rcu@vger.kernel.org, linux-kselftest@vger.kernel.org,
	"Nicolas Saenz Julienne" <nsaenzju@redhat.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Masami Hiramatsu" <mhiramat@kernel.org>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Wanpeng Li" <wanpengli@tencent.com>,
	"Vitaly Kuznetsov" <vkuznets@redhat.com>,
	"Andy Lutomirski" <luto@kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	"Neeraj Upadhyay" <quic_neeraju@quicinc.com>,
	"Joel Fernandes" <joel@joelfernandes.org>,
	"Josh Triplett" <josh@joshtriplett.org>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	"Lai Jiangshan" <jiangshanlai@gmail.com>,
	Zqiang <qiang.zhang1211@gmail.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Uladzislau Rezki" <urezki@gmail.com>,
	"Christoph Hellwig" <hch@infradead.org>,
	"Lorenzo Stoakes" <lstoakes@gmail.com>,
	"Josh Poimboeuf" <jpoimboe@kernel.org>,
	"Jason Baron" <jbaron@akamai.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Sami Tolvanen" <samitolvanen@google.com>,
	"Ard Biesheuvel" <ardb@kernel.org>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Juerg Haefliger" <juerg.haefliger@canonical.com>,
	"Nicolas Saenz Julienne" <nsaenz@kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Nadav Amit" <namit@vmware.com>,
	"Dan Carpenter" <error27@gmail.com>,
	"Chuang Wang" <nashuiliang@gmail.com>,
	"Yang Jihong" <yangjihong1@huawei.com>,
	"Petr Mladek" <pmladek@suse.com>,
	"Jason A. Donenfeld" <Jason@zx2c4.com>,
	"Song Liu" <song@kernel.org>,
	"Julian Pidancet" <julian.pidancet@oracle.com>,
	"Tom Lendacky" <thomas.lendacky@amd.com>,
	"Dionna Glaze" <dionnaglaze@google.com>,
	"Thomas Weißschuh" <linux@weissschuh.net>,
	"Juri Lelli" <juri.lelli@redhat.com>,
	"Daniel Bristot de Oliveira" <bristot@redhat.com>,
	"Marcelo Tosatti" <mtosatti@redhat.com>,
	"Yair Podemsky" <ypodemsk@redhat.com>
Subject: Re: [RFC PATCH v2 15/20] context-tracking: Introduce work deferral infrastructure
Date: Mon, 24 Jul 2023 16:52:19 +0200	[thread overview]
Message-ID: <ZL6QI4mV-NKlh4Ox@localhost.localdomain> (raw)
In-Reply-To: <20230720163056.2564824-16-vschneid@redhat.com>

On Thu, Jul 20, 2023 at 05:30:51PM +0100, Valentin Schneider wrote:
> +enum ctx_state {
> +	/* Following are values */
> +	CONTEXT_DISABLED	= -1,	/* returned by ct_state() if unknown */
> +	CONTEXT_KERNEL		= 0,
> +	CONTEXT_IDLE		= 1,
> +	CONTEXT_USER		= 2,
> +	CONTEXT_GUEST		= 3,
> +	CONTEXT_MAX             = 4,
> +};
> +
> +/*
> + * We cram three different things within the same atomic variable:
> + *
> + *                CONTEXT_STATE_END                        RCU_DYNTICKS_END
> + *                         |       CONTEXT_WORK_END                |
> + *                         |               |                       |
> + *                         v               v                       v
> + *         [ context_state ][ context work ][ RCU dynticks counter ]
> + *         ^                ^               ^
> + *         |                |               |
> + *         |        CONTEXT_WORK_START      |
> + * CONTEXT_STATE_START              RCU_DYNTICKS_START

Should the layout be displayed in reverse? At least, I always picture
bitmaps in reverse (most significant bit on the left), probably because of
the direction of the shift operators. Not sure what the usual way to picture
it is, though...
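For illustration, a userspace sketch (not kernel code) that renders the
state word MSB-first, so the picture reads in the same direction as the
<< and >> operators used to access the fields. The helper ct_layout_str()
is hypothetical, CT_WORK_START = 2 is an assumption (its value isn't shown
in this hunk), and the dynticks field is assumed to start at bit 16, as
implied by RCU_DYNTICKS_BITS = 16 with CONFIG_CONTEXT_TRACKING_WORK.

```c
#include <stdint.h>

/* Assumed field boundaries for a 32-bit state word (hypothetical values). */
#define CT_STATE_START        0
#define CT_WORK_START         2   /* assumption: work bits follow the 2 state bits */
#define CT_RCU_DYNTICKS_START 16  /* 32 - RCU_DYNTICKS_BITS (16) */

/*
 * Render @val MSB-first, with '|' between fields:
 *   [ RCU dynticks counter ]|[ context work ]|[ context_state ]
 *   bit 31                                               bit 0
 * @buf must hold at least 32 bits + 2 separators + NUL = 35 bytes.
 */
static void ct_layout_str(uint32_t val, char *buf)
{
	int pos = 0;

	for (int bit = 31; bit >= 0; bit--) {
		buf[pos++] = (val >> bit) & 1 ? '1' : '0';
		/* Close a field after emitting its lowest bit. */
		if (bit == CT_RCU_DYNTICKS_START || bit == CT_WORK_START)
			buf[pos++] = '|';
	}
	buf[pos] = '\0';
}
```

For example, a CPU in CONTEXT_USER (2) with work bit 0 pending and a
dynticks count of 1 (0x10006) renders as
0000000000000001|00000000000001|10.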

> + */
> +
> +#define CT_STATE_SIZE (sizeof(((struct context_tracking *)0)->state) * BITS_PER_BYTE)
> +
> +#define CONTEXT_STATE_START 0
> +#define CONTEXT_STATE_END   (bits_per(CONTEXT_MAX - 1) - 1)

Since you have non-overlapping *_START symbols, perhaps the *_END ones
are superfluous?
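A minimal sketch of the START-only approach, assuming the fields are
contiguous: each field's mask is derived from its own START and the next
field's START, so no *_END symbol is needed. The CT_FIELD_MASK() helper and
the bit positions below are hypothetical; in-kernel, GENMASK() from
<linux/bits.h> would express the same thing, e.g.
GENMASK(CT_WORK_START - 1, CT_STATE_START).

```c
#include <stdint.h>

#define CT_STATE_SIZE          32
#define CT_STATE_START         0
#define CT_WORK_START          2   /* hypothetical */
#define CT_RCU_DYNTICKS_START  16  /* hypothetical: 16-bit counter */

/*
 * Mask of bits [start, next_start). The intermediate is computed in 64 bits
 * so a field spanning all 32 bits doesn't trigger an undefined 1 << 32.
 */
#define CT_FIELD_MASK(start, next_start) \
	((uint32_t)((((uint64_t)1 << ((next_start) - (start))) - 1) << (start)))

#define CT_STATE_MASK        CT_FIELD_MASK(CT_STATE_START, CT_WORK_START)
#define CT_WORK_MASK         CT_FIELD_MASK(CT_WORK_START, CT_RCU_DYNTICKS_START)
#define CT_RCU_DYNTICKS_MASK CT_FIELD_MASK(CT_RCU_DYNTICKS_START, CT_STATE_SIZE)
```

With these values the masks come out as 0x3, 0xfffc and 0xffff0000, which
are disjoint and together cover the whole word.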

> +
> +#define RCU_DYNTICKS_BITS  (IS_ENABLED(CONFIG_CONTEXT_TRACKING_WORK) ? 16 : 31)
> +#define RCU_DYNTICKS_START (CT_STATE_SIZE - RCU_DYNTICKS_BITS)
> +#define RCU_DYNTICKS_END   (CT_STATE_SIZE - 1)
> +#define RCU_DYNTICKS_IDX   BIT(RCU_DYNTICKS_START)

Might be the right time to standardize and fix our naming:

CT_STATE_START,
CT_STATE_KERNEL,
CT_STATE_USER,
...
CT_WORK_START,
CT_WORK_*,
...
CT_RCU_DYNTICKS_START,
CT_RCU_DYNTICKS_IDX
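As a rough sketch of what that renaming could look like (all names below
are hypothetical, following the list above), together with a compile-time
check that every non-negative context state fits in the bits below
CT_WORK_START. The bit positions are assumptions for illustration.

```c
/* Hypothetical CT_-prefixed names for the existing context states. */
enum ctx_state {
	CT_STATE_DISABLED = -1,	/* returned by ct_state() if unknown */
	CT_STATE_KERNEL   = 0,
	CT_STATE_IDLE     = 1,
	CT_STATE_USER     = 2,
	CT_STATE_GUEST    = 3,
	CT_STATE_MAX      = 4,
};

/* Bit positions share the same CT_ prefix (assumed values). */
#define CT_STATE_START         0
#define CT_WORK_START          2
#define CT_RCU_DYNTICKS_START  16
#define CT_RCU_DYNTICKS_IDX    (1u << CT_RCU_DYNTICKS_START)

/* The largest state value must not spill into the work bits. */
_Static_assert((CT_STATE_MAX - 1) < (1 << (CT_WORK_START - CT_STATE_START)),
	       "context states overflow into the work bits");
```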

> +bool ct_set_cpu_work(unsigned int cpu, unsigned int work)
> +{
> +	struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu);
> +	unsigned int old;
> +	bool ret = false;
> +
> +	preempt_disable();
> +
> +	old = atomic_read(&ct->state);
> +	/*
> +	 * Try setting the work until either
> +	 * - the target CPU no longer accepts any more deferred work
> +	 * - the work has been set
> +	 *
> +	 * NOTE: CONTEXT_GUEST intersects with CONTEXT_USER and CONTEXT_IDLE
> +	 * as they are regular integers rather than bits, but that doesn't
> +	 * matter here: if any of the context state bit is set, the CPU isn't
> +	 * in kernel context.
> +	 */
> +	while ((old & (CONTEXT_GUEST | CONTEXT_USER | CONTEXT_IDLE)) && !ret)

That may still miss a recent entry to userspace because of the initial plain
read, ending up with an undesired interrupt.

You need at least one cmpxchg(). Of course, this stays racy by nature: between
cmpxchg() returning CONTEXT_KERNEL and the IPI actually being raised and
received, the remote CPU may already have gone to userspace. Still, it narrows
the window a little.
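One possible way to guarantee at least one cmpxchg(), sketched as a
userspace model with C11 <stdatomic.h> rather than kernel atomics: keep the
plain read only to seed the counter and work bits, forge the context bits to
"user", and let the first cmpxchg() verify that assumption. A false return
then always comes from a failed cmpxchg() that observed kernel context, not
from a possibly stale plain read. The function name and bit layout are
hypothetical, and this does not close the window between cmpxchg() and IPI
delivery described above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical layout: 2 context-state bits, work bits starting at bit 2. */
#define CT_STATE_MASK    0x3u
#define CT_STATE_KERNEL  0u
#define CT_STATE_USER    2u
#define CT_WORK_START    2

/* Set @work in @state only if the target CPU is not in kernel context. */
static bool model_set_cpu_work(atomic_uint *state, unsigned int work)
{
	unsigned int old = atomic_load(state);
	bool ret;

	/* Don't trust the plain read for the context bits: assume "user". */
	old = (old & ~CT_STATE_MASK) | CT_STATE_USER;
	do {
		/* On failure, @old is refreshed with the current value. */
		ret = atomic_compare_exchange_strong(state, &old,
						     old | (work << CT_WORK_START));
	} while (!ret && (old & CT_STATE_MASK) != CT_STATE_KERNEL);

	return ret;
}
```

Comparing the masked context against CT_STATE_KERNEL also sidesteps the
GUEST/USER/IDLE overlap mentioned in the patch comment, since the states
are treated as values rather than bits.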

Thanks.

> +		ret = atomic_try_cmpxchg(&ct->state, &old, old | (work << CONTEXT_WORK_START));
> +
> +	preempt_enable();
> +	return ret;
> +}
> +#else
> +static __always_inline void ct_work_flush(unsigned long work) { }
> +static __always_inline void ct_work_clear(struct context_tracking *ct) { }
> +#endif
> +
>  /*
>   * Record entry into an extended quiescent state.  This is only to be
>   * called when not already in an extended quiescent state, that is,
> @@ -88,7 +133,8 @@ static noinstr void ct_kernel_exit_state(int offset)
>  	 * next idle sojourn.
>  	 */
>  	rcu_dynticks_task_trace_enter();  // Before ->dynticks update!
> -	seq = ct_state_inc(offset);
> +	seq = ct_state_inc_clear_work(offset);
> +
>  	// RCU is no longer watching.  Better be in extended quiescent state!
>  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && (seq & RCU_DYNTICKS_IDX));
>  }
> @@ -100,7 +146,7 @@ static noinstr void ct_kernel_exit_state(int offset)
>   */
>  static noinstr void ct_kernel_enter_state(int offset)
>  {
> -	int seq;
> +	unsigned long seq;
>  
>  	/*
>  	 * CPUs seeing atomic_add_return() must see prior idle sojourns,
> @@ -108,6 +154,7 @@ static noinstr void ct_kernel_enter_state(int offset)
>  	 * critical section.
>  	 */
>  	seq = ct_state_inc(offset);
> +	ct_work_flush(seq);
>  	// RCU is now watching.  Better not be in an extended quiescent state!
>  	rcu_dynticks_task_trace_exit();  // After ->dynticks update!
>  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !(seq & RCU_DYNTICKS_IDX));
> diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
> index bae8f11070bef..fdb266f2d774b 100644
> --- a/kernel/time/Kconfig
> +++ b/kernel/time/Kconfig
> @@ -181,6 +181,11 @@ config CONTEXT_TRACKING_USER_FORCE
>  	  Say N otherwise, this option brings an overhead that you
>  	  don't want in production.
>  
> +config CONTEXT_TRACKING_WORK
> +	bool
> +	depends on HAVE_CONTEXT_TRACKING_WORK && CONTEXT_TRACKING_USER
> +	default y
> +
>  config NO_HZ
>  	bool "Old Idle dynticks config"
>  	help
> -- 
> 2.31.1
> 


Thread overview: 76+ messages
2023-07-20 16:30 [RFC PATCH v2 00/20] context_tracking,x86: Defer some IPIs until a user->kernel transition Valentin Schneider
2023-07-20 16:30 ` [RFC PATCH v2 01/20] tracing/filters: Dynamically allocate filter_pred.regex Valentin Schneider
2023-07-20 16:30 ` [RFC PATCH v2 02/20] tracing/filters: Enable filtering a cpumask field by another cpumask Valentin Schneider
2023-07-26 19:41   ` Josh Poimboeuf
2023-07-27  9:46     ` Valentin Schneider
2023-07-29 19:09     ` Steven Rostedt
2023-07-31 11:19       ` Valentin Schneider
2023-07-31 15:48         ` Steven Rostedt
2023-07-20 16:30 ` [RFC PATCH v2 03/20] tracing/filters: Enable filtering a scalar field by a cpumask Valentin Schneider
2023-07-20 16:30 ` [RFC PATCH v2 04/20] tracing/filters: Enable filtering the CPU common " Valentin Schneider
2023-07-20 16:30 ` [RFC PATCH v2 05/20] tracing/filters: Optimise cpumask vs cpumask filtering when user mask is a single CPU Valentin Schneider
2023-07-29 19:34   ` Steven Rostedt
2023-07-31 11:20     ` Valentin Schneider
2023-07-20 16:30 ` [RFC PATCH v2 06/20] tracing/filters: Optimise scalar vs cpumask filtering when the " Valentin Schneider
2023-07-29 19:55   ` Steven Rostedt
2023-07-31 11:20     ` Valentin Schneider
2023-07-31 12:07     ` Dan Carpenter
2023-07-31 15:54       ` Steven Rostedt
2023-07-31 16:03         ` Dan Carpenter
2023-07-31 17:20           ` Valentin Schneider
2023-07-31 18:16           ` Steven Rostedt
2023-07-20 16:30 ` [RFC PATCH v2 07/20] tracing/filters: Optimise CPU " Valentin Schneider
2023-07-20 16:30 ` [RFC PATCH v2 08/20] tracing/filters: Further optimise scalar vs cpumask comparison Valentin Schneider
2023-07-20 16:30 ` [RFC PATCH v2 09/20] tracing/filters: Document cpumask filtering Valentin Schneider
2023-07-20 16:30 ` [RFC PATCH v2 10/20] jump_label,module: Don't alloc static_key_mod for __ro_after_init keys Valentin Schneider
2023-07-28 22:04   ` Peter Zijlstra
2023-07-20 16:30 ` [RFC PATCH v2 11/20] objtool: Flesh out warning related to pv_ops[] calls Valentin Schneider
2023-07-28 15:33   ` Josh Poimboeuf
2023-07-31 11:16     ` Valentin Schneider
2023-07-31 21:36       ` Josh Poimboeuf
2023-07-31 21:46         ` Peter Zijlstra
2023-08-01 16:06           ` Josh Poimboeuf
2023-08-01 18:12             ` Peter Zijlstra
2023-07-20 16:30 ` [RFC PATCH v2 12/20] objtool: Warn about non __ro_after_init static key usage in .noinstr Valentin Schneider
2023-07-28 15:35   ` Josh Poimboeuf
2023-07-31 11:18     ` Valentin Schneider
2023-07-28 16:02   ` Josh Poimboeuf
2023-07-31 11:18     ` Valentin Schneider
2023-07-20 16:30 ` [RFC PATCH v2 13/20] context_tracking: Make context_tracking_key __ro_after_init Valentin Schneider
2023-07-28 16:00   ` Josh Poimboeuf
2023-07-31 11:16     ` Valentin Schneider
2023-07-20 16:30 ` [RFC PATCH v2 14/20] x86/kvm: Make kvm_async_pf_enabled __ro_after_init Valentin Schneider
2023-10-09 16:40   ` Maxim Levitsky
2023-07-20 16:30 ` [RFC PATCH v2 15/20] context-tracking: Introduce work deferral infrastructure Valentin Schneider
2023-07-24 14:52   ` Frederic Weisbecker [this message]
2023-07-24 16:55     ` Valentin Schneider
2023-07-24 19:18       ` Frederic Weisbecker
2023-07-25 10:10         ` Valentin Schneider
2023-07-25 11:22           ` Frederic Weisbecker
2023-07-25 13:05             ` Valentin Schneider
2023-07-20 16:30 ` [RFC PATCH v2 16/20] rcu: Make RCU dynticks counter size configurable Valentin Schneider
2023-07-21  8:17   ` Valentin Schneider
2023-07-21 14:10     ` Paul E. McKenney
2023-07-21 15:08       ` Valentin Schneider
2023-07-21 16:09         ` Paul E. McKenney
2023-07-20 16:30 ` [RFC PATCH v2 17/20] rcutorture: Add a test config to torture test low RCU_DYNTICKS width Valentin Schneider
2023-07-20 19:53   ` Paul E. McKenney
2023-07-21  4:00     ` Paul E. McKenney
2023-07-21  7:58       ` Valentin Schneider
2023-07-21 14:07         ` Paul E. McKenney
2023-07-21 15:08           ` Valentin Schneider
2023-07-20 16:30 ` [RFC PATCH v2 18/20] context_tracking,x86: Defer kernel text patching IPIs Valentin Schneider
2023-07-25 10:49   ` Joel Fernandes
2023-07-25 13:36     ` Valentin Schneider
2023-07-25 17:41       ` Joel Fernandes
2023-07-25 13:39     ` Peter Zijlstra
2023-07-25 17:47       ` Joel Fernandes
2023-07-20 16:30 ` [RFC PATCH v2 19/20] context_tracking,x86: Add infrastructure to defer kernel TLBI Valentin Schneider
2023-07-20 16:30 ` [RFC PATCH v2 20/20] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs Valentin Schneider
2023-07-21 18:15   ` Nadav Amit
2023-07-24 11:32     ` Valentin Schneider
2023-07-24 17:40       ` Dave Hansen
2023-07-25 13:21         ` Peter Zijlstra
2023-07-25 14:03           ` Valentin Schneider
2023-07-25 16:37         ` Marcelo Tosatti
2023-07-25 17:12           ` Dave Hansen