All of lore.kernel.org
 help / color / mirror / Atom feed
From: Leonardo Bras <leobras@redhat.com>
To: Leonardo Bras <leobras@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Frederic Weisbecker <frederic@kernel.org>,
	Neeraj Upadhyay <quic_neeraju@quicinc.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Zqiang <qiang.zhang1211@gmail.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	rcu@vger.kernel.org
Subject: Re: [RFC PATCH v1 0/2] Avoid rcu_core() if CPU just left guest vcpu
Date: Fri,  3 May 2024 16:09:11 -0300	[thread overview]
Message-ID: <ZjU2VxZe3A9_Y7Yf@LeoBras> (raw)
In-Reply-To: <ZjUwHvyvkM3lj80Q@LeoBras>

On Fri, May 03, 2024 at 03:42:38PM -0300, Leonardo Bras wrote:
> Hello Sean, Marcelo and Paul,
> 
> Thank you for your comments on this thread!
> I will try to reply some of the questions below:
> 
> (Sorry for the delay, I was OOO for a while.)
> 
> 
> On Mon, Apr 01, 2024 at 01:21:25PM -0700, Sean Christopherson wrote:
> > On Thu, Mar 28, 2024, Leonardo Bras wrote:
> > > I am dealing with a latency issue inside a KVM guest, which is caused by
> > > a sched_switch to rcuc[1].
> > > 
> > > During guest entry, kernel code will signal to RCU that current CPU was on
> > > a quiescent state, making sure no other CPU is waiting for this one.
> > > 
> > > If a vcpu just stopped running (guest_exit), and a syncronize_rcu() was
> > > issued somewhere since guest entry, there is a chance a timer interrupt
> > > will happen in that CPU, which will cause rcu_sched_clock_irq() to run.
> > > 
> > > rcu_sched_clock_irq() will check rcu_pending() which will return true,
> > > and cause invoke_rcu_core() to be called, which will (in current config)
> > > cause rcuc/N to be scheduled into the current cpu.
> > > 
> > > On rcu_pending(), I noticed we can avoid returning true (and thus invoking
> > > rcu_core()) if the current cpu is nohz_full, and the cpu came from either
> > > idle or userspace, since both are considered quiescent states.
> > > 
> > > Since this is also true to guest context, my idea to solve this latency
> > > issue by avoiding rcu_core() invocation if it was running a guest vcpu.
> > > 
> > > On the other hand, I could not find a way of reliably saying the current
> > > cpu was running a guest vcpu, so patch #1 implements a per-cpu variable
> > > for keeping the time (jiffies) of the last guest exit.
> > > 
> > > In patch #2 I compare current time to that time, and if less than a second
> > > has past, we just skip rcu_core() invocation, since there is a high chance
> > > it will just go back to the guest in a moment.
> > 
> > What's the downside if there's a false positive?
> 
> False positive being guest_exit without going back in this CPU, right?
> If so in WSC, supposing no qs happens and there is a pending request, RCU 
> will take a whole second to run again, possibly making other CPUs wait 
> this long for a synchronize_rcu.

Just to make sure it's clear:
It will wait at most 1 second, if the grace period was requested just 
before the last_guest_exit update. It will never make the grace period 
be longer than the already defined 1 second. 

That's because in the timer interrupt we have:

	if (rcu_pending())
		invoke_rcu_core();

and on rcu_pending():

	if ((user || rcu_is_cpu_rrupt_from_idle() || rcu_recent_guest_exit()) &&
	    rcu_nohz_full_cpu())
		return 0;

Meaning that even if we allow 5 seconds after recent_guest_exit, it will 
only make rcu_nohz_full_cpu() run, and it will check if the grace period is 
younger than 1 second before skipping the rcu_core() invocation.



> 
> This value (1 second) could defined in .config or as a parameter if needed, 
> but does not seem a big deal, 
> 
> > 
> > > What I know it's weird with this patch:
> > > 1 - Not sure if this is the best way of finding out if the cpu was
> > >     running a guest recently.
> > > 
> > > 2 - This per-cpu variable needs to get set at each guest_exit(), so it's
> > >     overhead, even though it's supposed to be in local cache. If that's
> > >     an issue, I would suggest having this part compiled out on 
> > >     !CONFIG_NO_HZ_FULL, but further checking each cpu for being nohz_full
> > >     enabled seems more expensive than just setting this out.
> > 
> > A per-CPU write isn't problematic, but I suspect reading jiffies will be quite
> > imprecise, e.g. it'll be a full tick "behind" on many exits.
> 
> That would not be a problem, as it would mean 1 tick less waiting in the 
> false positive WSC, and the 1s amount is plenty.

s/less/more/

> 
> > 
> > > 3 - It checks if the guest exit happened over than 1 second ago. This 1
> > >     second value was copied from rcu_nohz_full_cpu() which checks if the
> > >     grace period started over than a second ago. If this value is bad,
> > >     I have no issue changing it.
> > 
> > IMO, checking if a CPU "recently" ran a KVM vCPU is a suboptimal heuristic regardless
> > of what magic time threshold is used.  IIUC, what you want is a way to detect if
> > a CPU is likely to _run_ a KVM vCPU in the near future.
> 
> That's correct!
> 
> >  KVM can provide that
> > information with much better precision, e.g. KVM knows when when it's in the core
> > vCPU run loop.
> 
> That would not be enough.
> I need to present the application/problem to make a point:
> 
> - There is multiple  isolated physical CPU (nohz_full) on which we want to 
>   run KVM_RT vcpus, which will be running a real-time (low latency) task.
> - This task should not miss deadlines (RT), so we test the VM to make sure 
>   the maximum latency on a long run does not exceed the latency requirement
> - This vcpu will run on SCHED_FIFO, but has to run on lower priority than
>   rcuc, so we can avoid stalling other cpus.
> - There may be some scenarios where the vcpu will go back to userspace
>   (from KVM_RUN ioctl), and that does not mean it's good to interrupt the 
>   this to run other stuff (like rcuc).
> 
> Now, I understand it will cover most of our issues if we have a context 
> tracking around the vcpu_run loop, since we can use that to decide not to 
> run rcuc on the cpu if the interruption hapenned inside the loop.
> 
> But IIUC we can have a thread that "just got out of the loop" getting 
> interrupted by the timer, and asked to run rcu_core which will be bad for 
> latency.
> 
> I understand that the chance may be statistically low, but happening once 
> may be enough to crush the latency numbers.
> 
> Now, I can't think on a place to put this context trackers in kvm code that 
> would avoid the chance of rcuc running improperly, that's why the suggested 
> timeout, even though its ugly.
> 
> About the false-positive, IIUC we could reduce it if we reset the per-cpu 
> last_guest_exit on kvm_put.
> 
> > 
> > > 4 - Even though I could detect no issue, I included linux/kvm_host.h into 
> > >     rcu/tree_plugin.h, which is the first time it's getting included
> > >     outside of kvm or arch code, and can be weird.
> > 
> > Heh, kvm_host.h isn't included outside of KVM because several architectures can
> > build KVM as a module, which means referencing global KVM varibles from the kernel
> > proper won't work.
> > 
> > >     An alternative would be to create a new header for providing data for
> > >     non-kvm code.
> > 
> > I doubt a new .h or .c file is needed just for this, there's gotta be a decent
> > landing spot for a one-off variable.
> 
> You are probably right
> 
> >  E.g. I wouldn't be at all surprised if there
> > is additional usefulness in knowing if a CPU is in KVM's core run loop and thus
> > likely to do a VM-Enter in the near future, at which point you could probably make
> > a good argument for adding a flag in "struct context_tracking".  Even without a
> > separate use case, there's a good argument for adding that info to context_tracking.
> 
> For the tracking solution, makes sense :)
> Not sure if the 'timeout' alternative will be that useful outside rcu.
> 
> Thanks!
> Leo


  reply	other threads:[~2024-05-03 19:09 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-28 17:19 [RFC PATCH v1 0/2] Avoid rcu_core() if CPU just left guest vcpu Leonardo Bras
2024-03-28 17:19 ` [RFC PATCH v1 1/2] kvm: Implement guest_exit_last_time() Leonardo Bras
2024-03-28 17:19 ` [RFC PATCH v1 2/2] rcu: Ignore RCU in nohz_full cpus if it was running a guest recently Leonardo Bras
2024-04-01 15:52   ` Paul E. McKenney
2024-04-01 20:21 ` [RFC PATCH v1 0/2] Avoid rcu_core() if CPU just left guest vcpu Sean Christopherson
2024-04-05 13:45   ` Marcelo Tosatti
2024-04-05 14:42     ` Sean Christopherson
2024-04-06  0:03       ` Paul E. McKenney
2024-04-08 17:16         ` Sean Christopherson
2024-04-08 18:42           ` Paul E. McKenney
2024-04-08 20:06             ` Sean Christopherson
2024-04-08 21:02               ` Paul E. McKenney
2024-04-08 21:56                 ` Sean Christopherson
2024-04-08 22:35                   ` Paul E. McKenney
2024-04-08 23:06                     ` Sean Christopherson
2024-04-08 23:20                       ` Paul E. McKenney
2024-04-10  2:39           ` Marcelo Tosatti
2024-04-15 19:47           ` Marcelo Tosatti
2024-04-15 21:29             ` Sean Christopherson
2024-04-16 12:36               ` Marcelo Tosatti
2024-04-16 14:07                 ` Sean Christopherson
2024-04-17 16:14                   ` Marcelo Tosatti
2024-04-17 17:22                     ` Sean Christopherson
2024-05-03 20:44                       ` Leonardo Bras
2024-05-06 18:47                         ` Marcelo Tosatti
2024-05-07 18:05                           ` Sean Christopherson
2024-05-07 22:36                             ` Leonardo Bras
2024-05-03 18:42   ` Leonardo Bras
2024-05-03 19:09     ` Leonardo Bras [this message]
2024-05-03 21:29     ` Sean Christopherson
2024-05-03 22:00       ` Leonardo Bras
2024-05-03 22:00       ` Paul E. McKenney
2024-05-07 17:55         ` Sean Christopherson
2024-05-07 19:15           ` Paul E. McKenney
2024-05-07 21:00             ` Sean Christopherson
2024-05-07 21:37               ` Paul E. McKenney
2024-05-07 23:47                 ` Sean Christopherson
2024-05-08  0:08                   ` Sean Christopherson
2024-05-08  2:51                     ` Leonardo Bras
2024-05-08  3:22                       ` Paul E. McKenney
2024-05-08  6:19                         ` Leonardo Bras
2024-05-08 14:01                           ` Sean Christopherson
2024-05-09  3:32                             ` Paul E. McKenney
2024-05-09  8:16                               ` Leonardo Bras
2024-05-09 10:14                                 ` Leonardo Bras
2024-05-09 23:45                                   ` Paul E. McKenney
2024-05-10 16:06                                     ` Leonardo Bras
2024-05-10 16:21                                       ` Paul E. McKenney
2024-05-10 17:12                                         ` Leonardo Bras
2024-05-10 17:41                                           ` Paul E. McKenney
2024-05-10 19:50                                             ` Leonardo Bras
2024-05-10 21:15                                               ` Leonardo Bras
2024-05-10 21:38                                                 ` Paul E. McKenney
2024-05-09 22:41                                 ` Paul E. McKenney
2024-05-09 23:07                                   ` Leonardo Bras Soares Passos
2024-05-11  2:08                             ` Leonardo Bras
2024-05-08  3:20                     ` Paul E. McKenney
2024-05-08  4:04                       ` Paul E. McKenney
2024-05-08 14:36                         ` Paul E. McKenney
2024-05-08 15:35                       ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZjU2VxZe3A9_Y7Yf@LeoBras \
    --to=leobras@redhat.com \
    --cc=boqun.feng@gmail.com \
    --cc=frederic@kernel.org \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mtosatti@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=qiang.zhang1211@gmail.com \
    --cc=quic_neeraju@quicinc.com \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=seanjc@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.