All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: "Valentin Schneider" <vschneid@redhat.com>,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kvm@vger.kernel.org,
	linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org,
	"Nicolas Saenz Julienne" <nsaenzju@redhat.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Masami Hiramatsu" <mhiramat@kernel.org>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Wanpeng Li" <wanpengli@tencent.com>,
	"Vitaly Kuznetsov" <vkuznets@redhat.com>,
	"Andy Lutomirski" <luto@kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Uladzislau Rezki" <urezki@gmail.com>,
	"Christoph Hellwig" <hch@infradead.org>,
	"Lorenzo Stoakes" <lstoakes@gmail.com>,
	"Josh Poimboeuf" <jpoimboe@kernel.org>,
	"Kees Cook" <keescook@chromium.org>,
	"Sami Tolvanen" <samitolvanen@google.com>,
	"Ard Biesheuvel" <ardb@kernel.org>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Juerg Haefliger" <juerg.haefliger@canonical.com>,
	"Nicolas Saenz Julienne" <nsaenz@kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Nadav Amit" <namit@vmware.com>,
	"Dan Carpenter" <error27@gmail.com>,
	"Chuang Wang" <nashuiliang@gmail.com>,
	"Yang Jihong" <yangjihong1@huawei.com>,
	"Petr Mladek" <pmladek@suse.com>,
	"Jason A. Donenfeld" <Jason@zx2c4.com>,
	"Song Liu" <song@kernel.org>,
	"Julian Pidancet" <julian.pidancet@oracle.com>,
	"Tom Lendacky" <thomas.lendacky@amd.com>,
	"Dionna Glaze" <dionnaglaze@google.com>,
	"Thomas Weißschuh" <linux@weissschuh.net>,
	"Juri Lelli" <juri.lelli@redhat.com>,
	"Daniel Bristot de Oliveira" <bristot@redhat.com>,
	"Marcelo Tosatti" <mtosatti@redhat.com>,
	"Yair Podemsky" <ypodemsk@redhat.com>
Subject: Re: [RFC PATCH 11/14] context-tracking: Introduce work deferral infrastructure
Date: Thu, 6 Jul 2023 09:39:16 -0700	[thread overview]
Message-ID: <4c2cb573-168f-4806-b1d9-164e8276e66a@paulmck-laptop> (raw)
In-Reply-To: <ZKaoHrm0Fejb7kAl@lothringen>

On Thu, Jul 06, 2023 at 01:40:14PM +0200, Frederic Weisbecker wrote:
> On Thu, Jul 06, 2023 at 12:30:46PM +0100, Valentin Schneider wrote:
> > >> +		ret = atomic_try_cmpxchg(&ct->work, &old_work, old_work | work);
> > >> +
> > >> +	preempt_enable();
> > >> +	return ret;
> > >> +}
> > > [...]
> > >> @@ -100,14 +158,19 @@ static noinstr void ct_kernel_exit_state(int offset)
> > >>   */
> > >>  static noinstr void ct_kernel_enter_state(int offset)
> > >>  {
> > >> +	struct context_tracking *ct = this_cpu_ptr(&context_tracking);
> > >>      int seq;
> > >> +	unsigned int work;
> > >>
> > >> +	work = ct_work_fetch(ct);
> > >
> > > So this adds another fully ordered operation on user <-> kernel transition.
> > > How many such IPIs can we expect?
> > >
> > 
> > Despite having spent quite a lot of time on that question, I think I still
> > only have a hunch.
> > 
> > Poking around RHEL systems, I'd say 99% of the problematic IPIs are
> > instruction patching and TLB flushes.
> > 
> > Staring at the code, there's quite a lot of smp_calls for which it's hard
> > to say whether the target CPUs can actually be isolated or not (e.g. the
> > CPU comes from a cpumask shoved in a struct that was built using data from
> > another struct of uncertain origins), but then again some of them don't
> > need to hook into context_tracking.
> > 
> > Long story short: I /think/ we can consider that number to be fairly small,
> > but there could be more lurking in the shadows.
> 
> I guess it will still be time to reconsider the design if we ever reach such size.
> 
> > > If this is just about a dozen, can we stuff them in the state like in the
> > > following? We can potentially add more of them especially on 64 bits we could
> > > afford 30 different works, this is just shrinking the RCU extended quiescent
> > > state counter space. Worst case that can happen is that RCU misses 65535
> > > idle/user <-> kernel transitions and delays a grace period...
> > >
> > 
> > I'm trying to grok how this impacts RCU, IIUC most of RCU mostly cares about the
> > even/odd-ness of the thing, and rcu_gp_fqs() cares about the actual value
> > but only to check if it has changed over time (rcu_dynticks_in_eqs_since()
> > only does a !=).
> > 
> > I'm rephrasing here to make sure I get it - is it then that the worst case
> > here is 2^(dynticks_counter_size) transitions happen between saving the
> > dynticks snapshot and checking it again, so RCU waits some more?
> 
> That's my understanding as well but I have to defer on Paul to make sure I'm
> not overlooking something.

That does look plausible to me.

And yes, RCU really cares about whether its part of this counter has
been a multiple of two during a given interval of time, because this
indicates that the CPU has no pre-existing RCU readers still active.
One way that this can happen is for that value to be a multiple of two
at some point in time.  The other way that this can happen is for the
value to have changed.  No matter what the start and end values, if they
are different, the counter must necessarily have at least passed through
multiple of two in the meantime, again guaranteeing that any RCU readers
that around when the count was first fetched have now finished.

But we should take the machine's opinions much more seriously than we
take any of our own opinions.  Why not adjust RCU_DYNTICKS_IDX so as
to crank RCU's portion of this counter down to (say) two or three bits
and let rcutorture have at it on TREE04 or TREE07, both of which have
nohz_full CPUs?

Maybe also adjust mkinitrd.sh to make the user/kernel transitions more
frequent?

Please note that I do -not- recommend production use of a three-bit
(let alone a two-bit) RCU portion because this has a high probability
of excessively extending grace periods.  But it might be good to keep
a tiny counter as a debug option so that we regularly rcutorture it.

							Thanx, Paul

  reply	other threads:[~2023-07-06 16:39 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-05 18:12 [RFC PATCH 00/14] context_tracking,x86: Defer some IPIs until a user->kernel transition Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 01/14] tracing/filters: Dynamically allocate filter_pred.regex Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 02/14] tracing/filters: Enable filtering a cpumask field by another cpumask Valentin Schneider
2023-07-05 18:54   ` Steven Rostedt
2023-07-05 18:12 ` [RFC PATCH 03/14] tracing/filters: Enable filtering a scalar field by a cpumask Valentin Schneider
2023-07-05 19:00   ` Steven Rostedt
2023-07-05 18:12 ` [RFC PATCH 04/14] tracing/filters: Enable filtering the CPU common " Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 05/14] tracing/filters: Document cpumask filtering Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 06/14] objtool: Flesh out warning related to pv_ops[] calls Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 07/14] objtool: Warn about non __ro_after_init static key usage in .noinstr Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 08/14] BROKEN: context_tracking: Make context_tracking_key __ro_after_init Valentin Schneider
2023-07-05 20:41   ` Peter Zijlstra
2023-07-05 18:12 ` [RFC PATCH 09/14] x86/kvm: Make kvm_async_pf_enabled __ro_after_init Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 10/14] x86/sev-es: Make sev_es_enable_key __ro_after_init Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 11/14] context-tracking: Introduce work deferral infrastructure Valentin Schneider
2023-07-05 22:23   ` Frederic Weisbecker
2023-07-05 22:41     ` Peter Zijlstra
2023-07-06  9:53       ` Frederic Weisbecker
2023-07-06 10:01     ` Frederic Weisbecker
2023-07-06 11:30     ` Valentin Schneider
2023-07-06 11:40       ` Frederic Weisbecker
2023-07-06 16:39         ` Paul E. McKenney [this message]
2023-07-06 18:05           ` Valentin Schneider
2023-07-06 11:55     ` Frederic Weisbecker
2023-07-05 22:39   ` Peter Zijlstra
2023-07-06 11:31     ` Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 12/14] context_tracking,x86: Defer kernel text patching IPIs Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 13/14] context_tracking,x86: Add infrastructure to defer kernel TLBI Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 14/14] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs Valentin Schneider
2023-07-05 18:48 ` [RFC PATCH 00/14] context_tracking,x86: Defer some IPIs until a user->kernel transition Nadav Amit
2023-07-06 11:29   ` Valentin Schneider
2023-07-05 19:03 ` Steven Rostedt
2023-07-06 11:30   ` Valentin Schneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4c2cb573-168f-4806-b1d9-164e8276e66a@paulmck-laptop \
    --to=paulmck@kernel.org \
    --cc=Jason@zx2c4.com \
    --cc=akpm@linux-foundation.org \
    --cc=ardb@kernel.org \
    --cc=bp@alien8.de \
    --cc=bpf@vger.kernel.org \
    --cc=bristot@redhat.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=dionnaglaze@google.com \
    --cc=error27@gmail.com \
    --cc=frederic@kernel.org \
    --cc=hch@infradead.org \
    --cc=hpa@zytor.com \
    --cc=jpoimboe@kernel.org \
    --cc=juerg.haefliger@canonical.com \
    --cc=julian.pidancet@oracle.com \
    --cc=juri.lelli@redhat.com \
    --cc=keescook@chromium.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=linux@weissschuh.net \
    --cc=lstoakes@gmail.com \
    --cc=luto@kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=mingo@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=namit@vmware.com \
    --cc=nashuiliang@gmail.com \
    --cc=npiggin@gmail.com \
    --cc=nsaenz@kernel.org \
    --cc=nsaenzju@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=samitolvanen@google.com \
    --cc=song@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=urezki@gmail.com \
    --cc=vkuznets@redhat.com \
    --cc=vschneid@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    --cc=yangjihong1@huawei.com \
    --cc=ypodemsk@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.