From: Andrew Jones <drjones@redhat.com>
To: Avi Kivity <avi@redhat.com>
Cc: habanero@linux.vnet.ibm.com,
Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
Peter Zijlstra <peterz@infradead.org>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
Ingo Molnar <mingo@redhat.com>, Rik van Riel <riel@redhat.com>,
KVM <kvm@vger.kernel.org>, chegu vinod <chegu_vinod@hp.com>,
LKML <linux-kernel@vger.kernel.org>, X86 <x86@kernel.org>,
Gleb Natapov <gleb@redhat.com>,
Srivatsa Vaddagiri <srivatsa.vaddagiri@gmail.com>
Subject: Re: [RFC][PATCH] Improving directed yield scalability for PLE handler
Date: Mon, 17 Sep 2012 10:10:43 +0200
Message-ID: <20120917081043.GB2104@turtle.usersys.redhat.com>
In-Reply-To: <50559400.8030203@redhat.com>

On Sun, Sep 16, 2012 at 11:55:28AM +0300, Avi Kivity wrote:
> On 09/14/2012 12:30 AM, Andrew Theurer wrote:
>
> > The concern I have is that even though we have gone through changes to
> > help reduce the candidate vcpus we yield to, we still have a very poor
> > idea of which vcpu really needs to run. The result is high cpu usage in
> > the get_pid_task and still some contention in the double runqueue lock.
> > To make this scalable, we either need to significantly reduce the
> > occurrence of the lock-holder preemption, or do a much better job of
> > knowing which vcpu needs to run (and not unnecessarily yielding to vcpus
> > which do not need to run).
> >
> > On reducing the occurrence: The worst case for lock-holder preemption
> > is having vcpus of same VM on the same runqueue. This guarantees the
> > situation of 1 vcpu running while another [of the same VM] is not. To
> > prove the point, I ran the same test, but with vcpus restricted to a
> > range of host cpus, such that any single VM's vcpus can never be on the
> > same runqueue. In this case, all 10 VMs' vcpu-0's are on host cpus 0-4,
> > vcpu-1's are on host cpus 5-9, and so on. Here is the result:
> >
> > kvm_cpu_spin, and all
> > yield_to changes, plus
> > restricted vcpu placement: 8823 +/- 3.20% much, much better
> >
> > On picking a better vcpu to yield to: I really hesitate to rely on
> > paravirt hint [telling us which vcpu is holding a lock], but I am not
> > sure how else to reduce the candidate vcpus to yield to. I suspect we
> > are yielding to way more vcpus than are preempted lock-holders, and that
> > IMO is just work accomplishing nothing. Trying to think of a way to
> > further reduce candidate vcpus....
>
> I wouldn't say that yielding to the "wrong" vcpu accomplishes nothing.
> That other vcpu gets work done (unless it is in pause loop itself) and
> the yielding vcpu gets put to sleep for a while, so it doesn't spend
> cycles spinning. While we haven't fixed the problem at least the guest
> is accomplishing work, and meanwhile the real lock holder may get
> naturally scheduled and clear the lock.
>
> The main problem with this theory is that the experiments don't seem to
> bear it out. So maybe one of the assumptions is wrong - the yielding
> vcpu gets scheduled early. That could be the case if the two vcpus are
> on different runqueues - you could be changing the relative priority of
> vcpus on the target runqueue, but still remain on top yourself. Is this
> possible with the current code?
>
> Maybe we should prefer vcpus on the same runqueue as yield_to targets,
> and only fall back to remote vcpus when we see it didn't help.

I thought about this a bit recently too, but didn't pursue it, because I
figured it would actually increase the get_pid_task and double_rq_lock
contention time if we have to hunt too long for a vcpu that matches a
stricter criterion. But I guess if we can implement a special "reschedule"
that runs on the current cpu and prioritizes runnable/non-running vcpus,
then looking through the runqueue first should be just as fast as, or
faster than, looking through all the vcpus first.
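
To make the search order concrete, here is a rough userspace sketch of
"local runqueue first, then all vcpus". Everything in it is hypothetical:
struct task, is_target(), scan() and pick_target() are toy stand-ins I made
up for illustration, not kvm or scheduler code, and it assumes a target is
simply another vcpu of the same VM that is queued but not running.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a schedulable entity; not a kernel structure. */
struct task {
    int  vm;       /* owning VM id, or -1 for a non-vcpu task */
    int  vcpu;     /* vcpu index within that VM */
    int  cpu;      /* host cpu whose runqueue holds the task */
    bool running;  /* true if executing now, false if only queued */
};

/* Assumed rule: a target is another vcpu of the spinner's VM that is
 * queued but not currently running. */
static bool is_target(const struct task *t, const struct task *me)
{
    bool same_vm = t->vm == me->vm;
    bool is_me = same_vm && t->vcpu == me->vcpu;

    return same_vm && !is_me && !t->running;
}

/* Return the first target in list, or NULL if there is none. */
static const struct task *scan(const struct task *list, size_t n,
                               const struct task *me)
{
    for (size_t i = 0; i < n; i++)
        if (is_target(&list[i], me))
            return &list[i];
    return NULL;
}

/* Look through the spinner's own runqueue first; only if that finds
 * nothing, fall back to walking every vcpu of the VM. */
static const struct task *pick_target(const struct task *rq, size_t nrq,
                                      const struct task *vcpus, size_t nvcpus,
                                      const struct task *me)
{
    const struct task *t = scan(rq, nrq, me);

    return t ? t : scan(vcpus, nvcpus, me);
}
```

A local hit in this model never touches a second runqueue, which is the
case where I'd expect the double_rq_lock cost to go away; the fallback
walk is the same one we pay for today.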

Drew

>
> Let's examine a few cases:
>
> 1. spinner on cpu 0, lock holder on cpu 0
>
> win!
>
> 2. spinner on cpu 0, random vcpu(s) (or normal processes) on cpu 0
>
> Spinner gets put to sleep, random vcpus get to work, low lock contention
> (no double_rq_lock), by the time spinner gets scheduled we might have won
>
> 3. spinner on cpu 0, another spinner on cpu 0
>
> Worst case, we'll just spin some more. Need to detect this case and
> migrate something in.
>
> 4. spinner on cpu 0, alone
>
> Similar
>
>
> It seems we need to tie in to the load balancer.
>
> Would changing the priority of the task while it is spinning help the
> load balancer?
>
> --
> error compiling committee.c: too many arguments to function