From: Haiwei Li <lihaiwei.kernel@gmail.com>
To: Sean Christopherson <seanjc@google.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Haiwei Li <lihaiwei@tencent.com>
Subject: Re: [PATCH] kvm: lapic: add module parameters for LAPIC_TIMER_ADVANCE_ADJUST_MAX/MIN
Date: Sat, 13 Mar 2021 09:31:17 +0800
Message-ID: <CAB5KdObBa2oiPZpHx_S6V+=TFqb_zet=7tdaqU0y3cVJk2UZuQ@mail.gmail.com>
In-Reply-To: <YEwOM3aTeUjVim/i@google.com>

On Sat, Mar 13, 2021 at 8:58 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Mar 10, 2021, Haiwei Li wrote:
> > On Wed, Mar 10, 2021 at 7:42 AM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Wed, Mar 03, 2021, Haiwei Li wrote:
> > > > On 21/3/3 10:09, lihaiwei.kernel@gmail.com wrote:
> > > > > From: Haiwei Li <lihaiwei@tencent.com>
> > > > >
> > > > > In my test environment, advance_expire_delta is frequently greater than
> > > > > the fixed LAPIC_TIMER_ADVANCE_ADJUST_MAX. And this will hinder the
> > > > > adjustment.
> > > >
> > > > Supplementary details:
> > > >
> > > > I have tried to backport timer related features to our production
> > > > kernel.
> > > >
> > > > After completing the backport, I found that advance_expire_delta is
> > > > frequently greater than the fixed value. It is necessary to turn the
> > > > fixed values into dynamic ones.
> > >
> > > Does this reproduce on an upstream kernel?  If so...
> > >
> > >   1. How much over the 10k cycle limit is the delta?
> > >   2. Any idea what causes the large delta?  E.g. is there something that can
> > >      and/or should be fixed elsewhere?
> > >   3. Is it platform/CPU specific?
> >
> > Hi, Sean
> >
> > I have traced the flow on our production kernel, and it frequently takes more
> > than 10K cycles from sched_out to sched_in.
> > So I tested two scenarios on a Cascade Lake server (96 pCPUs) with a v5.11
> > kernel.
> >
> > 1. Only cyclictest in the guest (88 vCPUs bound to isolated pCPUs, mwait not
> > exposed, adaptive LAPIC timer advancement at its default of -1). The ratio of
> > occurrences:
> >
> > greater_than_10k/total: 29/2060, 1.41%
> >
> > 2. cyclictest in the guest (88 vCPUs, not bound, mwait not exposed, adaptive
> > LAPIC timer advancement at its default of -1) plus stress on the host (no
> > isolation). The ratio of occurrences:
> >
> > greater_than_10k/total: 122381/1017363, 12.03%
>
> Hmm, I'm inclined to say this is working as intended.  If the vCPU isn't affined
> and/or it's getting preempted, then large spikes are expected, and not adjusting
> in reaction to those spikes is desirable.  E.g. adjusting by 20k cycles because
> the timer happened to expire while a vCPU was preempted will cause KVM to busy
> wait for quite a long time if the next timer runs without interference, and then
> KVM will thrash the advancement.
>
> And I don't really see the point in pushing the max adjustment beyond 10k.  The
> max _advancement_ is 5000ns, which means that even with a blazing fast 5.0GHz
> system, a max adjustment of 1250 (10k / 8, the step divisor) should get KVM to
> the 25000 cycle advancement limit relatively quickly.  Since KVM resets to the
> initial 1000ns advancement when it would exceed the 5000ns max, I suspect that
> raising the max adjustment much beyond 10k cycles would quickly push a vCPU to
> the max, cause it to reset, and rinse and repeat.
>
> Note, we definitely don't want to raise the 5000ns max, as waiting with IRQs
> disabled for any longer than that will likely cause system instability.

I see. Thanks for your explanation.
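
For anyone following along, the arithmetic in the explanation above can be
checked with a small userspace sketch. It is only a model built from the
numbers quoted in this thread (10k cycle max delta, step divisor of 8, 1000ns
initial and 5000ns max advancement, a 5.0GHz TSC). The function and constant
names are hypothetical, and it is not the kernel code itself.

```python
# Userspace model of the adjustment described in the thread.
# All names here are hypothetical; the values are the ones quoted above.
ADJUST_MAX_CYCLES = 10_000  # deltas larger than this are ignored as spikes
ADJUST_STEP = 8             # step divisor applied to each adjustment
ADVANCE_NS_INIT = 1_000     # initial advancement, in ns
ADVANCE_NS_MAX = 5_000      # max advancement, in ns


def cycles_to_ns(cycles, tsc_khz):
    """Convert TSC cycles to nanoseconds for a TSC running at tsc_khz."""
    return cycles * 1_000_000 // tsc_khz


def ns_to_cycles(ns, tsc_khz):
    """Convert nanoseconds to TSC cycles for a TSC running at tsc_khz."""
    return ns * tsc_khz // 1_000_000


def adjust(advance_ns, delta_cycles, tsc_khz):
    """Return the new advancement after observing one expiry delta.

    A positive delta means the timer fired late, so the advancement grows;
    a negative delta means it fired early, so the advancement shrinks.
    """
    if abs(delta_cycles) > ADJUST_MAX_CYCLES:
        return advance_ns  # large spike (e.g. preemption): do not react
    step_ns = cycles_to_ns(abs(delta_cycles), tsc_khz) // ADJUST_STEP
    advance_ns += step_ns if delta_cycles > 0 else -step_ns
    if advance_ns > ADVANCE_NS_MAX:
        return ADVANCE_NS_INIT  # exceeded the max: reset to the initial value
    return advance_ns


if __name__ == "__main__":
    tsc_khz = 5_000_000  # 5.0GHz
    print("max advancement in cycles:", ns_to_cycles(ADVANCE_NS_MAX, tsc_khz))
    print("max adjustment in cycles:", ADJUST_MAX_CYCLES // ADJUST_STEP)
    advance = ADVANCE_NS_INIT
    for i in range(1, 18):
        advance = adjust(advance, ADJUST_MAX_CYCLES, tsc_khz)
        print(f"after adjustment {i}: {advance}ns")
```

In this model a 10k cycle delta at 5.0GHz moves the advancement by 250ns
(1250 cycles), so 16 consecutive maximum adjustments reach the 5000ns
(25000 cycle) limit and the 17th resets to 1000ns. A 20k cycle delta is
ignored entirely, which matches the "do not react to preemption spikes"
behavior described above.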

--
Haiwei Li

