* Regarding improving ple handler (vcpu_on_spin)
@ 2012-06-19 20:20 Raghavendra K T
  2012-06-19 20:51 ` [PATCH] kvm: handle last_boosted_vcpu = 0 case Rik van Riel
  0 siblings, 1 reply; 21+ messages in thread
From: Raghavendra K T @ 2012-06-19 20:20 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti, Rik van Riel
  Cc: Srikar, Srivatsa Vaddagiri, Peter Zijlstra, Nikunj A. Dadhania,
	KVM, Raghavendra K T, Ingo Molnar, LKML

In the PLE handler code, the last_boosted_vcpu (lbv) variable serves as
the reference point at which to start when we enter:

    lbv = kvm->lbv;
    for each vcpu i of kvm
        if i is eligible
            if yield_to(i) succeeds
                lbv = i

Currently this variable is per VM, and it is set after we do
yield_to(target). Unfortunately, on a successful yield it may take a
little longer than we expect to come back and set the value (depending
on the yielding task's lag in the rb-tree). So when several PLE handler
entries happen before it is set, all of them start from the same place
(and the overall round-robin is also slower). The statistical analysis
below also shows that lbv is not very well distributed with the current
approach.

The natural first approach is to move the lbv update before yield_to,
without bothering about the failure case, to make the round-robin
faster (this was in V4 of Rik's vcpu_on_spin patch series). But when I
did the performance analysis, in the no-overcommit scenario I saw
violent/cascaded directed yields happening, leading to more CPU wasted
in spinning (a huge degradation in 1x and an improvement in 3x; I
assume this is why the update was moved after yield_to in V5 of the
vcpu_on_spin series).

The second approach I tried was to (1) get rid of the per-KVM lbv
variable, and (2) have everybody who enters the handler start from a
random vcpu as the reference point.

This gave a good distribution of starting points (and a performance
improvement in the 32-vcpu guest I tested), and IMO it also scales well
for larger VMs.

Analysis
=============
Four 32-vcpu guests running, with one of them running kernbench.
PLE handler yield stat is the statistics for the successfully-yielded
case (for 32 vcpus).
PLE handler start stat is the statistics for the frequency of each vcpu
index as the starting point (for 32 vcpus).

snapshot1
=============
PLE handler yield stat :
274391 33088 32554 46688 46653 48742 48055 37491 38839 31799 28974 30303 31466 45936 36208 51580 32754 53441 28956 30738 37940 37693 26183 40022 31725 41879 23443 35826 40985 30447 37352 35445

PLE handler start stat :
433590 383318 204835 169981 193508 203954 175960 139373 153835 125245 118532 140092 135732 134903 119349 149467 109871 160404 117140 120554 144715 125099 108527 125051 111416 141385 94815 138387 154710 116270 123130 173795

snapshot2
============
PLE handler yield stat :
1957091 59383 67866 65474 100335 77683 80958 64073 53783 44620 80131 81058 66493 56677 74222 74974 42398 132762 48982 70230 78318 65198 54446 104793 59937 57974 73367 96436 79922 59476 58835 63547

PLE handler start stat :
2555089 611546 461121 346769 435889 452398 407495 314403 354277 298006 364202 461158 344783 288263 342165 357270 270887 451660 300020 332120 378403 317848 307969 414282 351443 328501 352840 426094 375050 330016 347540 371819

So the questions I have in mind are:

1. Do you think going for a randomized last_boosted_vcpu and getting
   rid of the per-VM variable is better?

2. Can we have, or do we have, a mechanism by which we can decide not
   to yield to a vcpu that is doing frequent PLE exits (possibly
   because it is doing unnecessary busy-waits), or to yield_to a better
   candidate instead?

On a side note: with the pv patches I have tried doing yield_to to a
kicked VCPU in the vcpu_block path, and it gives some performance
improvement.

Please let me know if you have any comments/suggestions.

^ permalink raw reply	[flat|nested] 21+ messages in thread
* [PATCH] kvm: handle last_boosted_vcpu = 0 case
  2012-06-19 20:20 Regarding improving ple handler (vcpu_on_spin) Raghavendra K T
@ 2012-06-19 20:51 ` Rik van Riel
  2012-06-20 20:12   ` Raghavendra K T
  ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Rik van Riel @ 2012-06-19 20:51 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Avi Kivity, Marcelo Tosatti, Srikar, Srivatsa Vaddagiri,
	Peter Zijlstra, Nikunj A. Dadhania, KVM, Ingo Molnar, LKML

On Wed, 20 Jun 2012 01:50:50 +0530
Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> wrote:

> In ple handler code, last_boosted_vcpu (lbv) variable is
> serving as reference point to start when we enter.

> Also statistical analysis (below) is showing lbv is not very well
> distributed with current approach.

You are the second person to spot this bug today (yes, today).

Due to time zones, the first person has not had a chance yet to
test the patch below, which might fix the issue...

Please let me know how it goes.

====8<====

If last_boosted_vcpu == 0, then we fall through all test cases and
may end up with all VCPUs pouncing on vcpu 0.  With a large enough
guest, this can result in enormous runqueue lock contention, which
can prevent vcpu0 from running, leading to a livelock.

Changing < to <= makes sure we properly handle that case.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 virt/kvm/kvm_main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7e14068..1da542b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1586,7 +1586,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 	 */
 	for (pass = 0; pass < 2 && !yielded; pass++) {
 		kvm_for_each_vcpu(i, vcpu, kvm) {
-			if (!pass && i < last_boosted_vcpu) {
+			if (!pass && i <= last_boosted_vcpu) {
 				i = last_boosted_vcpu;
 				continue;
 			} else if (pass && i > last_boosted_vcpu)

^ permalink raw reply related	[flat|nested] 21+ messages in thread
* Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case
  2012-06-19 20:51 ` [PATCH] kvm: handle last_boosted_vcpu = 0 case Rik van Riel
@ 2012-06-20 20:12   ` Raghavendra K T
  2012-06-21  2:11     ` Rik van Riel
  2012-06-21 11:26     ` Raghavendra K T
  2012-06-21  6:43 ` Gleb Natapov
  2012-07-06 17:11 ` Marcelo Tosatti
  2 siblings, 2 replies; 21+ messages in thread
From: Raghavendra K T @ 2012-06-20 20:12 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Avi Kivity, Marcelo Tosatti, Srikar, Srivatsa Vaddagiri,
	Peter Zijlstra, Nikunj A. Dadhania, KVM, Ingo Molnar, LKML

On 06/20/2012 02:21 AM, Rik van Riel wrote:
> On Wed, 20 Jun 2012 01:50:50 +0530
> Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> wrote:
>
>> In ple handler code, last_boosted_vcpu (lbv) variable is
>> serving as reference point to start when we enter.
>
>> Also statistical analysis (below) is showing lbv is not very well
>> distributed with current approach.
>
> You are the second person to spot this bug today (yes, today).

Oh! Really interesting.

> Due to time zones, the first person has not had a chance yet to
> test the patch below, which might fix the issue...

Maybe his timezone also falls near mine. I am also pretty late now. :)

> Please let me know how it goes.

Yes, I got the results today; too tired to summarize now, but I got a
better performance result too. I will come back again tomorrow morning.
I also have to post the randomized-start-point patch I discussed, to
get opinions on it.

> ====8<====
>
> If last_boosted_vcpu == 0, then we fall through all test cases and
> may end up with all VCPUs pouncing on vcpu 0.  With a large enough
> guest, this can result in enormous runqueue lock contention, which
> can prevent vcpu0 from running, leading to a livelock.
>
> Changing < to <= makes sure we properly handle that case.

Analysis shows the distribution is flatter now than before.

Here are the snapshots:

snapshot1
PLE handler yield stat :
66447 132222 75510 65875 121298 92543 111267 79523 118134 105366 116441 114195 107493 66666 86779 87733 84415 105778 94210 73197 55626 93036 112959 92035 95742 78558 72190 101719 94667 108593 63832 81580

PLE handler start stat :
334301 687807 384077 344917 504917 343988 439810 371389 466908 415509 394304 484276 376510 292821 370478 363727 366989 423441 392949 309706 292115 437900 413763 346135 364181 323031 348405 399593 336714 373995 302301 347383

snapshot2
PLE handler yield stat :
320547 267528 264316 164213 249246 182014 246468 225386 277179 310659 349767 310281 238680 187645 225791 266290 216202 316974 231077 216586 151679 356863 266031 213047 306229 182629 229334 241204 275975 265086 282218 242207

PLE handler start stat :
1335370 1378184 1252001 925414 1196973 951298 1219835 1108788 1265427 1290362 1308553 1271066 1107575 980036 1077210 1278611 1110779 1365130 1151200 1049859 937159 1577830 1209099 993391 1173766 987307 1144775 1102960 1100082 1177134 1207862 1119551

> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
>  virt/kvm/kvm_main.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 7e14068..1da542b 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1586,7 +1586,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>  	 */
>  	for (pass = 0; pass < 2 && !yielded; pass++) {
>  		kvm_for_each_vcpu(i, vcpu, kvm) {
> -			if (!pass && i < last_boosted_vcpu) {
> +			if (!pass && i <= last_boosted_vcpu) {

Hmmm, true. Great catch — it was partial towards zero earlier.

>  				i = last_boosted_vcpu;
>  				continue;
>  			} else if (pass && i > last_boosted_vcpu)

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case
  2012-06-20 20:12 ` Raghavendra K T
@ 2012-06-21  2:11   ` Rik van Riel
  2012-06-21 11:26   ` Raghavendra K T
  1 sibling, 0 replies; 21+ messages in thread
From: Rik van Riel @ 2012-06-21 2:11 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Avi Kivity, Marcelo Tosatti, Srikar, Srivatsa Vaddagiri,
	Peter Zijlstra, Nikunj A. Dadhania, KVM, Ingo Molnar, LKML

On 06/20/2012 04:12 PM, Raghavendra K T wrote:
> On 06/20/2012 02:21 AM, Rik van Riel wrote:
>> Please let me know how it goes.
>
> Yes, have got result today, too tired to summarize. got better
> performance result too. will come back again tomorrow morning.
> have to post, randomized start point patch also, which I discussed to
> know the opinion.

The other person's problem has also gone away with this patch.

Avi, could I convince you to apply this obvious bugfix to kvm.git? :)

>> ====8<====
>>
>> If last_boosted_vcpu == 0, then we fall through all test cases and
>> may end up with all VCPUs pouncing on vcpu 0.  With a large enough
>> guest, this can result in enormous runqueue lock contention, which
>> can prevent vcpu0 from running, leading to a livelock.
>>
>> Changing < to <= makes sure we properly handle that case.
>>
>> Signed-off-by: Rik van Riel <riel@redhat.com>
>> ---
>>  virt/kvm/kvm_main.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 7e14068..1da542b 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -1586,7 +1586,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>>  	 */
>>  	for (pass = 0; pass < 2 && !yielded; pass++) {
>>  		kvm_for_each_vcpu(i, vcpu, kvm) {
>> -			if (!pass && i < last_boosted_vcpu) {
>> +			if (!pass && i <= last_boosted_vcpu) {
>>  				i = last_boosted_vcpu;
>>  				continue;
>>  			} else if (pass && i > last_boosted_vcpu)

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case
  2012-06-20 20:12 ` Raghavendra K T
  2012-06-21  2:11   ` Rik van Riel
@ 2012-06-21 11:26   ` Raghavendra K T
  2012-06-22 15:11     ` Andrew Jones
  1 sibling, 1 reply; 21+ messages in thread
From: Raghavendra K T @ 2012-06-21 11:26 UTC (permalink / raw)
  To: Rik van Riel, Avi Kivity
  Cc: Marcelo Tosatti, Srikar, Srivatsa Vaddagiri, Peter Zijlstra,
	Nikunj A. Dadhania, KVM, Ingo Molnar, LKML, Gleb Natapov, chegu_vinod

[-- Attachment #1: Type: text/plain, Size: 4582 bytes --]

On 06/21/2012 01:42 AM, Raghavendra K T wrote:
> On 06/20/2012 02:21 AM, Rik van Riel wrote:
>> On Wed, 20 Jun 2012 01:50:50 +0530
>> Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> wrote:
[...]
>> Please let me know how it goes.
>
> Yes, have got result today, too tired to summarize. got better
> performance result too. will come back again tomorrow morning.
> have to post, randomized start point patch also, which I discussed to
> know the opinion.

Here are the results from kernbench.

PS: I think the only takeaway should be that both patches perform
better than base, rather than reading into the actual numbers, since I
am seeing more variance, especially in 3x. Maybe I can test with a more
stable benchmark if somebody points one out.

+----------+-------------+------------+------------+-----------+
|   base   |  Rik patch  | % improve  |Random patch| %improve  |
+----------+-------------+------------+------------+-----------+
|  49.98   |  49.935     | 0.0901172  | 49.924286  | 0.111597  |
| 106.0051 |  89.25806   | 18.7625    | 88.122217  | 20.2933   |
| 189.82067|  175.58783  | 8.10582    | 166.99989  | 13.6651   |
+----------+-------------+------------+------------+-----------+

I have also posted the result of the randomized-starting-point patch.
I agree that Rik's fix should ideally go into git ASAP. When the above
patches go into git, feel free to add:

Tested-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>

But I still see some questions unanswered.
1) Why can't we move the setting of last_boosted_vcpu up? It gives more
randomness. (As I said earlier, it gave a degradation in the 1x case
because of violent yields, but a performance benefit in the 3x case.
The degradation comes from most of them yielding back to the same
spinning vcpu, increasing busy-waiting; but it gives a huge benefit
with ple_window set to higher values such as 32k/64k. That is a
different issue altogether, though.)

2) Having the update of last_boosted_vcpu after yield_to does not seem
entirely correct, and having a common variable as the starting point
may not be that good either. The round-robin is also a little slower.
Suppose we have a 64-vcpu guest and 4 vcpus enter the PLE handler: all
of them jumping on the same vcpu to yield may not be good. Rather, I
personally feel each of them starting at a different point would be a
good idea.

But this alone will not help; we need more filtering of eligible VCPUs.
For example, in the first pass, don't choose a VCPU that has recently
done a PL exit. (Thanks Vatsa for brainstorming this.)

Maybe Peter/Avi/Rik/Vatsa can give more ideas in this area (I mean, how
can we identify that a vcpu has done a PL exit, or has exited from
spinlock context, etc.). Another idea may be something like identifying
the next eligible lock holder (which is already possible with the PV
patches) and doing yield_to to him.

Here is the stat from the randomized-starting-point patch. We can see
that the patch has amazing fairness w.r.t. the starting point. IMO,
this would be great only after we add more eligibility criteria for the
target vcpus (of yield_to).
Randomizing start index
===========================

snapshot1
PLE handler yield stat :
218416 176802 164554 141184 148495 154709 159871 145157 135476 158025 139997 247638 152498 133338 122774 248228 158469 121825 138542 113351 164988 120432 136391 129855 172764 214015 158710 133049 83485 112134 81651 190878

PLE handler start stat :
547772 547725 547545 547931 547836 548656 548272 547849 548879 549012 547285 548185 548700 547132 548310 547286 547236 547307 548328 548059 547842 549152 547870 548340 548170 546996 546678 547842 547716 548096 547918 547546

snapshot2
==============
PLE handler yield stat :
310690 222992 275829 156876 187354 185373 187584 155534 151578 205994 223731 320894 194995 167011 153415 286910 181290 143653 173988 181413 194505 170330 194455 181617 251108 226577 192070 143843 137878 166393 131405 250657

PLE handler start stat :
781335 782388 781837 782942 782025 781357 781950 781695 783183 783312 782004 782804 783766 780825 783232 781013 781587 781228 781642 781595 781665 783530 781546 781950 782268 781443 781327 781666 781907 781593 782105 781073

Sorry for attaching the patch inline; I am using a dumb client. I will
post it separately if needed.

====8<====

Currently the PLE handler uses a per-VM variable as the starting point.
Get rid of the variable and use a randomized starting point instead.
Thanks Vatsa for the scheduler-related clarifications.
Suggested-by: Srikar <srikar@linux.vnet.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---

[-- Attachment #2: randomize_starting_vcpu.patch --]
[-- Type: text/x-patch, Size: 1943 bytes --]

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c446435..9799cab 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -275,7 +275,6 @@ struct kvm {
 #endif
 	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
 	atomic_t online_vcpus;
-	int last_boosted_vcpu;
 	struct list_head vm_list;
 	struct mutex lock;
 	struct kvm_io_bus *buses[KVM_NR_BUSES];
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7e14068..6bab9f7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -49,6 +49,7 @@
 #include <linux/slab.h>
 #include <linux/sort.h>
 #include <linux/bsearch.h>
+#include <linux/random.h>

 #include <asm/processor.h>
 #include <asm/io.h>
@@ -1572,31 +1573,32 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 {
 	struct kvm *kvm = me->kvm;
 	struct kvm_vcpu *vcpu;
-	int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
+	int vcpu_to_boost;
 	int yielded = 0;
 	int pass;
 	int i;
+	int num_vcpus = atomic_read(&kvm->online_vcpus);

+	vcpu_to_boost = (random32() % num_vcpus);
 	/*
 	 * We boost the priority of a VCPU that is runnable but not
 	 * currently running, because it got preempted by something
 	 * else and called schedule in __vcpu_run.  Hopefully that
 	 * VCPU is holding the lock that we need and will release it.
-	 * We approximate round-robin by starting at the last boosted VCPU.
+	 * We approximate round-robin by starting at a random VCPU.
 	 */
 	for (pass = 0; pass < 2 && !yielded; pass++) {
 		kvm_for_each_vcpu(i, vcpu, kvm) {
-			if (!pass && i < last_boosted_vcpu) {
-				i = last_boosted_vcpu;
+			if (!pass && i < vcpu_to_boost) {
+				i = vcpu_to_boost;
 				continue;
-			} else if (pass && i > last_boosted_vcpu)
+			} else if (pass && i > vcpu_to_boost)
 				break;
 			if (vcpu == me)
 				continue;
 			if (waitqueue_active(&vcpu->wq))
 				continue;
 			if (kvm_vcpu_yield_to(vcpu)) {
-				kvm->last_boosted_vcpu = i;
 				yielded = 1;
 				break;
 			}

^ permalink raw reply related	[flat|nested] 21+ messages in thread
* Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case
  2012-06-21 11:26 ` Raghavendra K T
@ 2012-06-22 15:11   ` Andrew Jones
  2012-06-22 21:00     ` Raghavendra K T
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Jones @ 2012-06-22 15:11 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Rik van Riel, Avi Kivity, Marcelo Tosatti, Srikar,
	Srivatsa Vaddagiri, Peter Zijlstra, Nikunj A. Dadhania, KVM,
	Ingo Molnar, LKML, Gleb Natapov, chegu_vinod

On Thu, Jun 21, 2012 at 04:56:08PM +0530, Raghavendra K T wrote:
> Here are the results from kernbench.
>
> PS: I think we have to only take that, both the patches perform better,
> than reading into actual numbers since I am seeing more variance in
> especially 3x. may be I can test with some more stable benchmark if
> somebody points
>

Hi Raghu,

I wonder if we should back up and try to determine the best
benchmark/test environment first. I think kernbench is good, but I
wonder about how to simulate the overcommit, and to what degree (1x,
3x, ??). What are you currently running to simulate overcommit now?
Originally we were running kernbench in one VM and cpu hogs (bash
infinite loops) in other VMs. Then we added vcpus and infinite loops
to get up to the desired overcommit. I saw later that you've
experimented with running kernbench in the other VMs as well, rather
than cpu hogs. Is that still the case?

I started playing with benchmarking these proposals myself, but so far
have stuck to the cpu hog, since I wanted to keep variability limited.
However, when targeting a reasonable host loadavg with a bunch of cpu
hog vcpus, it limits the overcommit too much. I certainly haven't tried
3x this way. So I'm inclined to throw out the cpu hog approach as well.
The question is, what to replace it with?
It appears that the performance of the PLE and pvticketlock proposals
is quite dependent on the level of overcommit, so we should choose a
target overcommit level and also a constraint on the host loadavg
first, then determine how to set up a test environment that fits it and
yields results with low variance.

Here are results from my 1.125x overcommit test environment using cpu
hogs.

kcbench (a.k.a kernbench) results; 'mean-time (stddev)'
base-noPLE:           235.730 (25.932)
base-PLE:             238.820 (11.199)
rand_start-PLE:       283.193 (23.262)
pvticketlocks-noPLE:  244.987 (7.562)
pvticketlocks-PLE:    247.597 (17.200)

base kernel:          3.5.0-rc3 + Rik's new last_boosted patch
rand_start kernel:    3.5.0-rc3 + Raghu's proposed random start patch
pvticketlocks kernel: 3.5.0-rc3 + Rik's new last_boosted patch
                      + Raghu's pvticketlock series

The relative standard deviations are as high as 11%, so I'm not really
pleased with the results, and they show degradation everywhere. Below
are the details of the benchmarking. Everything is there except the
kernel config, but our benchmarking should be reproducible with nearly
random configs anyway.
Drew

= host =
- Intel(R) Xeon(R) CPU X7560 @ 2.27GHz
- 64 cpus, 4 nodes, 64G mem
- Fedora 17 with test kernels (see tests)

= benchmark =
- one cpu hog F17 VM
  - 64 vcpus, 8G mem
  - all vcpus run a bash infinite loop
  - kernel: 3.5.0-rc3
- one kcbench (a.k.a kernbench) F17 VM
  - 8 vcpus, 8G mem
  - 'kcbench -d /mnt/ram', /mnt/ram is 1G ramfs
  - kcbench-0.3-8.1.noarch, kcbench-data-2.6.38-0.1-9.fc17.noarch,
    kcbench-data-0.1-9.fc17.noarch
  - gcc (GCC) 4.7.0 20120507 (Red Hat 4.7.0-5)
  - kernel: same test kernel as host

= test 1: base, PLE disabled (ple_gap=0) =
- kernel: 3.5.0-rc3 + Rik's last_boosted patch

Run 1 (-j 16): 4211 (e:237.43 P:637% U:697.98 S:815.46 F:0)
Run 2 (-j 16): 3834 (e:260.77 P:631% U:729.69 S:917.56 F:0)
Run 3 (-j 16): 4784 (e:208.99 P:644% U:638.17 S:708.63 F:0)

mean: 235.730 stddev: 25.932

= test 2: base, PLE enabled =
- kernel: 3.5.0-rc3 + Rik's last_boosted patch

Run 1 (-j 16): 4335 (e:230.67 P:639% U:657.74 S:818.28 F:0)
Run 2 (-j 16): 4269 (e:234.20 P:647% U:743.43 S:772.52 F:0)
Run 3 (-j 16): 3974 (e:251.59 P:639% U:724.29 S:884.21 F:0)

mean: 238.820 stddev: 11.199

= test 3: rand_start, PLE enabled =
- kernel: 3.5.0-rc3 + Raghu's random start patch

Run 1 (-j 16): 3898 (e:256.52 P:639% U:756.14 S:884.63 F:0)
Run 2 (-j 16): 3341 (e:299.27 P:633% U:857.49 S:1039.62 F:0)
Run 3 (-j 16): 3403 (e:293.79 P:635% U:857.21 S:1008.83 F:0)

mean: 283.193 stddev: 23.262

= test 4: pvticketlocks, PLE disabled (ple_gap=0) =
- kernel: 3.5.0-rc3 + Rik's last_boosted patch + Raghu's pvticketlock series
  + PARAVIRT_SPINLOCKS=y config change

Run 1 (-j 16): 3963 (e:252.29 P:647% U:736.43 S:897.16 F:0)
Run 2 (-j 16): 4216 (e:237.19 P:650% U:706.68 S:837.42 F:0)
Run 3 (-j 16): 4073 (e:245.48 P:649% U:709.46 S:884.68 F:0)

mean: 244.987 stddev: 7.562

= test 5: pvticketlocks, PLE enabled =
- kernel: 3.5.0-rc3 + Rik's last_boosted patch + Raghu's pvticketlock series
  + PARAVIRT_SPINLOCKS=y config change

Run 1 (-j 16): 3978 (e:251.32 P:629% U:758.86 S:824.29 F:0)
Run 2 (-j 16): 4369 (e:228.84 P:634% U:708.32 S:743.71 F:0)
Run 3 (-j 16): 3807 (e:262.63 P:626% U:767.03 S:877.96 F:0)

mean: 247.597 stddev: 17.200

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case
  2012-06-22 15:11 ` Andrew Jones
@ 2012-06-22 21:00   ` Raghavendra K T
  2012-06-23 18:34     ` Raghavendra K T
  0 siblings, 1 reply; 21+ messages in thread
From: Raghavendra K T @ 2012-06-22 21:00 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Rik van Riel, Avi Kivity, Marcelo Tosatti, Srikar,
	Srivatsa Vaddagiri, Peter Zijlstra, Nikunj A. Dadhania, KVM,
	Ingo Molnar, LKML, Gleb Natapov, chegu_vinod, Jeremy Fitzhardinge

On 06/22/2012 08:41 PM, Andrew Jones wrote:
> On Thu, Jun 21, 2012 at 04:56:08PM +0530, Raghavendra K T wrote:
>> Here are the results from kernbench.
>>
>> PS: I think we have to only take that, both the patches perform better,
>> than reading into actual numbers since I am seeing more variance in
>> especially 3x. may be I can test with some more stable benchmark if
>> somebody points
>>
>
> Hi Raghu,

First of all, thank you for your tests and for raising valid points.
They also opened the avenue for discussing all the different
experiments done over the past month (apart from tuning/benchmarking),
which may bring more feedback and precious ideas from the community to
optimize the performance further. I shall discuss that in a separate
reply to this mail.

> I wonder if we should back up and try to determine the best
> benchmark/test environment first.

I agree; we have to be able to produce similar results independently.
So far sysbench (and even pgbench) has been consistent. I am currently
checking whether other benchmarks like hackbench (with a modified
#loops), ebizzy, and dbench have low variance [but they too are
dependent on #clients/threads etc.].

> I think kernbench is good, but

Yes, kernbench at least helped me to tune SPIN_THRESHOLD to a good
extent. But Jeremy has also pointed out that kernbench is a little
inconsistent.

> I wonder about how to simulate the overcommit, and to what degree
> (1x, 3x, ??). What are you currently running to simulate overcommit
> now? Originally we were running kernbench in one VM and cpu hogs
> (bash infinite loops) in other VMs. Then we added vcpus and infinite
> loops to get up to the desired overcommit. I saw later that you've
> experimented with running kernbench in the other VMs as well, rather
> than cpu hogs. Is that still the case?

Yes, I am now running the same benchmark on all the guests. On non-PLE
machines, `while 1` cpu hogs played a good role in simulating
lock-holder preemption, but on PLE machines that did not seem to be the
case.

> I started playing with benchmarking these proposals myself, but so
> far have stuck to the cpu hog, since I wanted to keep variability
> limited. However, when targeting a reasonable host loadavg with a
> bunch of cpu hog vcpus, it limits the overcommit too much. I certainly
> haven't tried 3x this way. So I'm inclined to throw out the cpu hog
> approach as well. The question is, what to replace it with? It appears
> that the performance of the PLE and pvticketlock proposals are quite
> dependant on the level of overcommit, so we should choose a target
> overcommit level and also a constraint on the host loadavg first,
> then determine how to setup a test environment that fits it and yields
> results with low variance.
>
> Here are results from my 1.125x overcommit test environment using
> cpu hogs.

At first the results seemed backward, but after looking at the
individual runs and variations, it seems that, except for rand_start,
all the results should converge to zero difference; if we ran the same
tests again we might get completely different results. IMO, running
-j16 on a 64-vcpu guest may not represent a 1x load, so I believe this
has resulted in more of an under-commit / nearly-1x result.

Maybe we should try at least #threads = #vcpu or 2*#vcpu.

> kcbench (a.k.a kernbench) results; 'mean-time (stddev)'
> base-noPLE: 235.730 (25.932)
> base-PLE: 238.820 (11.199)
> rand_start-PLE: 283.193 (23.262)

The problem currently, as we know, is that in the PLE handler we may
end up choosing the same VCPU that was in the spin loop, which would
unfortunately result in more CPU burning; and by randomizing start_vcpu
we make that probability higher. We need logic to not choose a vcpu
that has recently PL-exited, since it cannot be a lock holder; the next
eligible lock holder can be picked up easily with the PV patches.

> pvticketlocks-noPLE: 244.987 (7.562)
> pvticketlocks-PLE: 247.597 (17.200)
>
> base kernel: 3.5.0-rc3 + Rik's new last_boosted patch
> rand_start kernel: 3.5.0-rc3 + Raghu's proposed random start patch
> pvticketlocks kernel: 3.5.0-rc3 + Rik's new last_boosted patch
> + Raghu's pvticketlock series

OK. I believe SPIN_THRESHOLD was 2k, right? What I have observed is
that with a 2k threshold we see halt-exit overheads; currently I am
trying mostly with 4k.

> The relative standard deviations are as high as 11%. So I'm not
> real pleased with the results, and they show degradation everywhere.
> Below are the details of the benchmarking. Everything is there except
> the kernel config, but our benchmarking should be reproducible with
> nearly random configs anyway.
>
> Drew
>
> = host =
> - Intel(R) Xeon(R) CPU X7560 @ 2.27GHz
> - 64 cpus, 4 nodes, 64G mem
> - Fedora 17 with test kernels (see tests)
>
> = benchmark =
> - one cpu hog F17 VM
>   - 64 vcpus, 8G mem
>   - all vcpus run a bash infinite loop
>   - kernel: 3.5.0-rc3
> - one kcbench (a.k.a kernbench) F17 VM
>   - 8 vcpus, 8G mem
>   - 'kcbench -d /mnt/ram', /mnt/ram is 1G ramfs

Maybe we have to check whether 1GB RAM is OK when we have 128 threads;
not sure.

>   - kcbench-0.3-8.1.noarch, kcbench-data-2.6.38-0.1-9.fc17.noarch,
>     kcbench-data-0.1-9.fc17.noarch
>   - gcc (GCC) 4.7.0 20120507 (Red Hat 4.7.0-5)
>   - kernel: same test kernel as host
>
> = test 1: base, PLE disabled (ple_gap=0) =
> - kernel: 3.5.0-rc3 + Rik's last_boosted patch
>
> Run 1 (-j 16): 4211 (e:237.43 P:637% U:697.98 S:815.46 F:0)
> Run 2 (-j 16): 3834 (e:260.77 P:631% U:729.69 S:917.56 F:0)
> Run 3 (-j 16): 4784 (e:208.99 P:644% U:638.17 S:708.63 F:0)
>
> mean: 235.730 stddev: 25.932
>
> = test 2: base, PLE enabled =
> - kernel: 3.5.0-rc3 + Rik's last_boosted patch
>
> Run 1 (-j 16): 4335 (e:230.67 P:639% U:657.74 S:818.28 F:0)
> Run 2 (-j 16): 4269 (e:234.20 P:647% U:743.43 S:772.52 F:0)
> Run 3 (-j 16): 3974 (e:251.59 P:639% U:724.29 S:884.21 F:0)
>
> mean: 238.820 stddev: 11.199
>
> = test 3: rand_start, PLE enabled =
> - kernel: 3.5.0-rc3 + Raghu's random start patch
>
> Run 1 (-j 16): 3898 (e:256.52 P:639% U:756.14 S:884.63 F:0)
> Run 2 (-j 16): 3341 (e:299.27 P:633% U:857.49 S:1039.62 F:0)
> Run 3 (-j 16): 3403 (e:293.79 P:635% U:857.21 S:1008.83 F:0)
>
> mean: 283.193 stddev: 23.262
>
> = test 4: pvticketlocks, PLE disabled (ple_gap=0) =
> - kernel: 3.5.0-rc3 + Rik's last_boosted patch + Raghu's pvticketlock series
>   + PARAVIRT_SPINLOCKS=y config change
>
> Run 1 (-j 16): 3963 (e:252.29 P:647% U:736.43 S:897.16 F:0)
> Run 2 (-j 16): 4216 (e:237.19 P:650% U:706.68 S:837.42 F:0)
> Run 3 (-j 16): 4073 (e:245.48 P:649% U:709.46 S:884.68 F:0)
>
> mean: 244.987 stddev: 7.562
>
> = test 5: pvticketlocks, PLE enabled =
> - kernel: 3.5.0-rc3 + Rik's last_boosted patch + Raghu's pvticketlock series
>   + PARAVIRT_SPINLOCKS=y config change
>
> Run 1 (-j 16): 3978 (e:251.32 P:629% U:758.86 S:824.29 F:0)
> Run 2 (-j 16): 4369 (e:228.84 P:634% U:708.32 S:743.71 F:0)
> Run 3 (-j 16): 3807 (e:262.63 P:626% U:767.03 S:877.96 F:0)
>
> mean: 247.597 stddev: 17.200
>

OK, in summary, can we agree on something like the following for
kernbench:
1x   = -j (2*#vcpu) in 1 VM,
1.5x = -j (2*#vcpu) in 1 VM and -j (#vcpu) in another, and so on,
and also a SPIN_THRESHOLD of 4k?

Any ideas on benchmarks are welcome from all.

- Raghu

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case
  2012-06-22 21:00 ` Raghavendra K T
@ 2012-06-23 18:34   ` Raghavendra K T
  2012-06-27 20:27     ` Raghavendra K T
  0 siblings, 1 reply; 21+ messages in thread
From: Raghavendra K T @ 2012-06-23 18:34 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Rik van Riel, Avi Kivity, Marcelo Tosatti, Srikar,
	Srivatsa Vaddagiri, Peter Zijlstra, Nikunj A. Dadhania, KVM,
	Ingo Molnar, LKML, Gleb Natapov, chegu_vinod, Jeremy Fitzhardinge

On 06/23/2012 02:30 AM, Raghavendra K T wrote:
> On 06/22/2012 08:41 PM, Andrew Jones wrote:
>> On Thu, Jun 21, 2012 at 04:56:08PM +0530, Raghavendra K T wrote:
>>> Here are the results from kernbench.
>>>
>>> PS: I think we have to only take that, both the patches perform better,
>>> than reading into actual numbers since I am seeing more variance in
>>> especially 3x. may be I can test with some more stable benchmark if
>>> somebody points
>>>
[...]
> can we agree like, for kernbench 1x= -j (2*#vcpu) in 1 vm.
> 1.5x = -j (2*#vcpu) in 1 vm and -j (#vcpu) in other.. and so on.
> also a SPIN_THRESHOLD of 4k?

Please forget about the 1.5x definition above; I am not too sure about
it.

> Any ideas on benchmarks is welcome from all.

My runs of the other benchmarks did not have Rik's patches, so I am
re-spinning everything with them now.

Here is the detailed info on the environment and the benchmarks I am
currently trying. Let me know if you have any comments.

=======
kernel 3.5.0-rc1 with Rik's PLE handler fix as base

Machine : Intel(R) Xeon(R) CPU X7560 @ 2.27GHz, 4 numa nodes, 256GB
RAM, 32 core machine

Host: enterprise linux, gcc version 4.4.6 20120305 (Red Hat 4.4.6-4)
(GCC), with test kernels
Guest: fedora 16 with different built-in kernels from the same source
tree. 32 vcpus, 8GB memory. (configs not changed with the patches
except for CONFIG_PARAVIRT_SPINLOCK)

Note: for the PV patches, SPIN_THRESHOLD is set to 4k

Benchmarks:
1) kernbench: kernbench-0.50
cmd:
echo "3" > /proc/sys/vm/drop_caches
ccache -C
kernbench -f -H -M -o 2*vcpu
The very first run in kernbench is omitted.

2) dbench: dbench version 4.00
cmd: dbench --warmup=30 -t 120 2*vcpu

3) hackbench:
https://build.opensuse.org/package/files?package=hackbench&project=benchmark
hackbench.c modified with loops=10000
used hackbench with num-threads = 2*vcpu

4) Specjbb: specjbb2000-1.02
Input Properties:
ramp_up_seconds = 30
measurement_seconds = 120
forcegc = true
starting_number_warehouses = 1
increment_number_warehouses = 1
ending_number_warehouses = 8

5) sysbench: 0.4.12
sysbench --test=oltp --db-driver=pgsql prepare
sysbench --num-threads=2*vcpu --max-requests=100000 --test=oltp \
	--oltp-table-size=500000 --db-driver=pgsql --oltp-read-only run
Note that the driver for this is pgsql.

6) ebizzy: release 0.3
cmd: ebizzy -S 120

- specjbb ran for 1x and 2x; the others mostly for 1x, 2x, 3x
  overcommit.
- an overcommit of 2x means the same benchmark running on 2 guests.
- the sample size for each overcommit level is mostly 8.

Note: I ran kernbench with the old kernbench-0.50; maybe I can try
kcbench with ramfs if necessary.

Will come back soon with detailed results.

> - Raghu

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case 2012-06-23 18:34 ` Raghavendra K T @ 2012-06-27 20:27 ` Raghavendra K T 2012-06-27 20:29 ` [PATCH] kvm: handle last_boosted_vcpu = 0 case with benchmark detail attachment Raghavendra K T 2012-06-28 16:00 ` [PATCH] kvm: handle last_boosted_vcpu = 0 case Andrew Jones 0 siblings, 2 replies; 21+ messages in thread From: Raghavendra K T @ 2012-06-27 20:27 UTC (permalink / raw) To: Andrew Jones, Avi Kivity, Ingo Molnar Cc: Rik van Riel, Marcelo Tosatti, Srikar, Srivatsa Vaddagiri, Peter Zijlstra, Nikunj A. Dadhania, KVM, LKML, Gleb Natapov, chegu_vinod, Jeremy Fitzhardinge On 06/24/2012 12:04 AM, Raghavendra K T wrote: > On 06/23/2012 02:30 AM, Raghavendra K T wrote: >> On 06/22/2012 08:41 PM, Andrew Jones wrote: [...] > My run for other benchmarks did not have Rik's patches, so re-spinning > everything with that now. > > Here is the detailed info on env and benchmark I am currently trying. > Let me know if you have any comments > > ======= > kernel 3.5.0-rc1 with Rik's Ple handler fix as base > > Machine : Intel(R) Xeon(R) CPU X7560 @ 2.27GHz, 4 numa node, 256GB RAM, > 32 core machine > > Host: enterprise linux gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) > (GCC) with test kernels > Guest: fedora 16 with different built-in kernel from same source tree. > 32 vcpus 8GB memory. (configs not changed with patches except for > CONFIG_PARAVIRT_SPINLOCK) > > Note: for Pv patches, SPIN_THRESHOLD is set to 4k > > Benchmarks: > 1) kernbench: kernbench-0.50 > > cmd: > echo "3" > /proc/sys/vm/drop_caches > ccache -C > kernbench -f -H -M -o 2*vcpu > > Very first run in kernbench is omitted. 
> > 2) dbench: dbench version 4.00 > cmd: dbench --warmup=30 -t 120 2*vcpu > > 3) hackbench: >https://build.opensuse.org/package/files?package=hackbench&project=benchmark > > hackbench.c modified with loops=10000 > used hackbench with num-threads = 2* vcpu > > 4) Specjbb: specjbb2000-1.02 > Input Properties: > ramp_up_seconds = 30 > measurement_seconds = 120 > forcegc = true > starting_number_warehouses = 1 > increment_number_warehouses = 1 > ending_number_warehouses = 8 > > > 5) sysbench: 0.4.12 > sysbench --test=oltp --db-driver=pgsql prepare > sysbench --num-threads=2*vcpu --max-requests=100000 --test=oltp > --oltp-table-size=500000 --db-driver=pgsql --oltp-read-only run > Note that driver for this pgsql. > > > 6) ebizzy: release 0.3 > cmd: ebizzy -S 120 > > - specjbb ran for 1x and 2x others mostly for 1x, 2x, 3x overcommit. > - overcommit of 2x means same benchmark running on 2 guests. > - sample for each overcommit is mostly 8 > > Note: I ran kernbench with old kernbench0.50, may be I can try kcbench > with ramfs if necessary > > will soon come with detailed results With the above env, Here is the result I have for 4k SPIN_THRESHOLD. Lower is better for following benchmarks: kernbench: (time in sec) hackbench: (time in sec) sysbench : (time in sec) Higher is better for following benchmarks: specjbb: score (Throughput) dbench : Throughput in MB/sec ebizzy : records/sec In summary, current PV has huge benefit on non-PLE machine. On PLE machine, the results become very sensitive to load, type of workload and SPIN_THRESHOLD. Also PLE interference has significant effect on them. But still it has slight edge over non PV. Overall, specjbb, sysbench, kernbench seem to do well with PV. dbench has been little unreliable (same reason I have not published 2x, 3x result but experimental values are included in tarball) but seem to be on par with PV hackbench non-overcommit case is better and ebizzy overcommit case is better. 
[ebizzy seems to be very sensitive w.r.t. SPIN_THRESHOLD]. I have still not experimented with SPIN_THRESHOLD of 2k/8k and w/, w/o PLE after having Rik's fix.

+-----------+-----------+-----------+------------+---------+
                        specjbb
+-----------+-----------+-----------+------------+---------+
|   value   |   stdev   |   value   |   stdev    | %improve|
+-----------+-----------+-----------+------------+---------+
|114232.2500|21774.0660 |122591.0000| 18239.0900 | 7.31733 |
|112154.5000|19696.6860 |113386.2500| 22262.5890 | 1.09826 |
+-----------+-----------+-----------+------------+---------+

+-----------+-----------+-----------+------------+---------+
                        kernbench
+-----------+-----------+-----------+------------+---------+
|   value   |   stdev   |   value   |   stdev    | %improve|
+-----------+-----------+-----------+------------+---------+
|  48.9150  |   0.8608  |  48.5550  |   0.7372   | 0.74143 |
|  96.3691  |   7.9724  |  96.6367  |   1.6938   |-0.27691 |
| 192.6972  |   9.1881  | 188.3195  |   8.1267   | 2.32461 |
| 320.6500  |  29.6892  | 302.1225  |  16.0515   | 6.13245 |
+-----------+-----------+-----------+------------+---------+

+-----------+-----------+-----------+------------+---------+
                        sysbench
+-----------+-----------+-----------+------------+---------+
|   value   |   stdev   |   value   |   stdev    | %improve|
+-----------+-----------+-----------+------------+---------+
|  12.4082  |   0.2370  |  12.2797  |   0.1037   | 1.04644 |
|  14.1705  |   0.4272  |  14.0300  |   1.1478   | 1.00143 |
|  19.3769  |   1.0833  |  18.9745  |   0.0560   | 2.12074 |
|  24.5373  |   1.3237  |  22.3078  |   0.8999   | 9.99426 |
+-----------+-----------+-----------+------------+---------+

+-----------+-----------+-----------+------------+---------+
                        hackbench
+-----------+-----------+-----------+------------+---------+
|   value   |   stdev   |   value   |   stdev    | %improve|
+-----------+-----------+-----------+------------+---------+
|  73.2627  |  11.2413  |  67.5125  |   2.5722   |  8.51724|
| 134.4294  |   1.9688  | 153.6160  |   5.2033   |-12.48998|
| 215.4521  |   3.8672  | 238.8965  |   3.0035   | -9.81362|
| 303.8553  |   5.0427  | 310.3569  |   6.1463   | -2.09488|
+-----------+-----------+-----------+------------+---------+

+-----------+-----------+-----------+------------+---------+
                        ebizzy
+-----------+-----------+-----------+------------+---------+
|   value   |   stdev   |   value   |   stdev    | %improve|
+-----------+-----------+-----------+------------+---------+
| 1108.6250 |  19.3090  | 1088.2500 |  11.0809   |-1.83786 |
| 1662.6250 | 150.5466  | 1064.0000 |   2.8284   |-36.00481|
| 1394.0000 |  85.0867  | 1073.2857 |  10.3877   |-23.00676|
| 1172.1250 |  20.3501  | 1245.8750 |  25.3852   | 6.29199 |
+-----------+-----------+-----------+------------+---------+

+-----------+-----------+-----------+------------+---------+
                        dbench
+-----------+-----------+-----------+------------+---------+
|   value   |   stdev   |   value   |   stdev    | %improve|
+-----------+-----------+-----------+------------+---------+
|  29.0378  |   1.1625  |  28.8466  |   1.1132   |-0.65845 |
+-----------+-----------+-----------+------------+---------+

(benchmark values will be attached in reply to this mail)

Planning to post patches rebased to 3.5-rc. Avi, Ingo.. Please let me know.

^ permalink raw reply	[flat|nested] 21+ messages in thread
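For reading the tables above: assuming the left value/stdev pair is the base run and the right pair the PV run (an assumption, but it matches the sign of %improve and the "PV has slight edge" summary), the %improve column works out as a relative change, taken against the base value for the throughput benchmarks and against the patched value for the time-based ones. A sketch of that derivation:

```python
# %improve as a relative change. For throughput benchmarks (specjbb,
# dbench, ebizzy) higher is better and the change is taken against the
# base value; for time-based benchmarks (kernbench, sysbench,
# hackbench) lower is better and it is taken against the patched value.
def pct_improve_throughput(base, patched):
    return (patched - base) / base * 100.0

def pct_improve_time(base, patched):
    return (base - patched) / patched * 100.0

# First rows of the specjbb and kernbench tables above:
jbb = pct_improve_throughput(114232.25, 122591.00)   # ~7.31733
kb = pct_improve_time(48.9150, 48.5550)              # ~0.74143
```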
* Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case with benchmark detail attachment 2012-06-27 20:27 ` Raghavendra K T @ 2012-06-27 20:29 ` Raghavendra K T 2012-06-28 16:00 ` [PATCH] kvm: handle last_boosted_vcpu = 0 case Andrew Jones 1 sibling, 0 replies; 21+ messages in thread From: Raghavendra K T @ 2012-06-27 20:29 UTC (permalink / raw) To: Andrew Jones, Avi Kivity, Ingo Molnar Cc: Rik van Riel, Marcelo Tosatti, Srikar, Srivatsa Vaddagiri, Peter Zijlstra, Nikunj A. Dadhania, KVM, LKML, Gleb Natapov, chegu_vinod, Jeremy Fitzhardinge [-- Attachment #1: Type: text/plain, Size: 262 bytes --] On 06/28/2012 01:57 AM, Raghavendra K T wrote: > On 06/24/2012 12:04 AM, Raghavendra K T wrote: >> On 06/23/2012 02:30 AM, Raghavendra K T wrote: >>> On 06/22/2012 08:41 PM, Andrew Jones wrote: [...] > > (benchmark values will be attached in reply to this mail) [-- Attachment #2: pv_benchmark_summary.bz2 --] [-- Type: application/x-bzip, Size: 7068 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case 2012-06-27 20:27 ` Raghavendra K T 2012-06-27 20:29 ` [PATCH] kvm: handle last_boosted_vcpu = 0 case with benchmark detail attachment Raghavendra K T @ 2012-06-28 16:00 ` Andrew Jones 2012-06-28 16:22 ` Raghavendra K T 1 sibling, 1 reply; 21+ messages in thread From: Andrew Jones @ 2012-06-28 16:00 UTC (permalink / raw) To: Raghavendra K T Cc: Rik van Riel, Marcelo Tosatti, Srikar, Srivatsa Vaddagiri, Peter Zijlstra, Nikunj A. Dadhania, KVM, LKML, Gleb Natapov, chegu vinod, Jeremy Fitzhardinge, Avi Kivity, Ingo Molnar ----- Original Message ----- > In summary, current PV has huge benefit on non-PLE machine. > > On PLE machine, the results become very sensitive to load, type of > workload and SPIN_THRESHOLD. Also PLE interference has significant > effect on them. But still it has slight edge over non PV. > Hi Raghu, sorry for my slow response. I'm on vacation right now (until the 9th of July) and I have limited access to mail. Also, thanks for continuing the benchmarking. Question, when you compare PLE vs. non-PLE, are you using different machines (one with and one without), or are you disabling its use by loading the kvm module with the ple_gap=0 modparam as I did? Drew ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case 2012-06-28 16:00 ` [PATCH] kvm: handle last_boosted_vcpu = 0 case Andrew Jones @ 2012-06-28 16:22 ` Raghavendra K T 2012-06-28 22:55 ` Vinod, Chegu 0 siblings, 1 reply; 21+ messages in thread From: Raghavendra K T @ 2012-06-28 16:22 UTC (permalink / raw) To: Andrew Jones Cc: Rik van Riel, Marcelo Tosatti, Srikar, Srivatsa Vaddagiri, Peter Zijlstra, Nikunj A. Dadhania, KVM, LKML, Gleb Natapov, chegu vinod, Jeremy Fitzhardinge, Avi Kivity, Ingo Molnar On 06/28/2012 09:30 PM, Andrew Jones wrote: > > > ----- Original Message ----- >> In summary, current PV has huge benefit on non-PLE machine. >> >> On PLE machine, the results become very sensitive to load, type of >> workload and SPIN_THRESHOLD. Also PLE interference has significant >> effect on them. But still it has slight edge over non PV. >> > > Hi Raghu, > > sorry for my slow response. I'm on vacation right now (until the > 9th of July) and I have limited access to mail. Ok. Happy Vacation :) Also, thanks for > continuing the benchmarking. Question, when you compare PLE vs. > non-PLE, are you using different machines (one with and one > without), or are you disabling its use by loading the kvm module > with the ple_gap=0 modparam as I did? Yes, I am doing the same when I say with PLE disabled and comparing the benchmarks (i.e loading kvm module with ple_gap=0). But older non-PLE results were on a different machine altogether. (I had limited access to PLE machine). ^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH] kvm: handle last_boosted_vcpu = 0 case 2012-06-28 16:22 ` Raghavendra K T @ 2012-06-28 22:55 ` Vinod, Chegu 0 siblings, 0 replies; 21+ messages in thread From: Vinod, Chegu @ 2012-06-28 22:55 UTC (permalink / raw) To: Raghavendra K T, Andrew Jones Cc: Rik van Riel, Marcelo Tosatti, Srikar, Srivatsa Vaddagiri, Peter Zijlstra, Nikunj A. Dadhania, KVM, LKML, Gleb Natapov, Jeremy Fitzhardinge, Avi Kivity, Ingo Molnar [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2371 bytes --] Hello, I am just catching up on this email thread... Perhaps one of you may be able to help answer this query.. preferably along with some data. [BTW, I do understand the basic intent behind PLE in a typical [sweet spot] use case where there is over subscription etc. and the need to optimize the PLE handler in the host etc. ] In a use case where the host has fewer but much larger guests (say 40VCPUs and higher) and there is no over subscription (i.e. # of vcpus across guests <= physical cpus in the host and perhaps each guest has their vcpu's pinned to specific physical cpus for other reasons), I would like to understand if/how the PLE really helps ? For these use cases would it be ok to turn PLE off (ple_gap=0) since is no real need to take an exit and find some other VCPU to yield to ? Thanks Vinod -----Original Message----- From: Raghavendra K T [mailto:raghavendra.kt@linux.vnet.ibm.com] Sent: Thursday, June 28, 2012 9:22 AM To: Andrew Jones Cc: Rik van Riel; Marcelo Tosatti; Srikar; Srivatsa Vaddagiri; Peter Zijlstra; Nikunj A. Dadhania; KVM; LKML; Gleb Natapov; Vinod, Chegu; Jeremy Fitzhardinge; Avi Kivity; Ingo Molnar Subject: Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case On 06/28/2012 09:30 PM, Andrew Jones wrote: > > > ----- Original Message ----- >> In summary, current PV has huge benefit on non-PLE machine. 
>> >> On PLE machine, the results become very sensitive to load, type of >> workload and SPIN_THRESHOLD. Also PLE interference has significant >> effect on them. But still it has slight edge over non PV. >> > > Hi Raghu, > > sorry for my slow response. I'm on vacation right now (until the 9th > of July) and I have limited access to mail. Ok. Happy Vacation :) Also, thanks for > continuing the benchmarking. Question, when you compare PLE vs. > non-PLE, are you using different machines (one with and one without), > or are you disabling its use by loading the kvm module with the > ple_gap=0 modparam as I did? Yes, I am doing the same when I say with PLE disabled and comparing the benchmarks (i.e loading kvm module with ple_gap=0). But older non-PLE results were on a different machine altogether. (I had limited access to PLE machine). ^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case 2012-06-28 22:55 ` Vinod, Chegu (?) @ 2012-07-02 14:49 ` Rik van Riel 2012-07-03 3:30 ` Raghavendra K T 2012-07-05 14:45 ` Andrew Theurer -1 siblings, 2 replies; 21+ messages in thread From: Rik van Riel @ 2012-07-02 14:49 UTC (permalink / raw) To: Vinod, Chegu Cc: Raghavendra K T, Andrew Jones, Marcelo Tosatti, Srikar, Srivatsa Vaddagiri, Peter Zijlstra, Nikunj A. Dadhania, KVM, LKML, Gleb Natapov, Jeremy Fitzhardinge, Avi Kivity, Ingo Molnar On 06/28/2012 06:55 PM, Vinod, Chegu wrote: > Hello, > > I am just catching up on this email thread... > > Perhaps one of you may be able to help answer this query.. preferably along with some data. [BTW, I do understand the basic intent behind PLE in a typical [sweet spot] use case where there is over subscription etc. and the need to optimize the PLE handler in the host etc. ] > > In a use case where the host has fewer but much larger guests (say 40VCPUs and higher) and there is no over subscription (i.e. # of vcpus across guests<= physical cpus in the host and perhaps each guest has their vcpu's pinned to specific physical cpus for other reasons), I would like to understand if/how the PLE really helps ? For these use cases would it be ok to turn PLE off (ple_gap=0) since is no real need to take an exit and find some other VCPU to yield to ? Yes, that should be ok. On a related note, I wonder if we should increase the ple_gap significantly. After all, 4096 cycles of spinning is not that much, when you consider how much time is spent doing the subsequent vmexit, scanning the other VCPU's status (200 cycles per cache miss), deciding what to do, maybe poking another CPU, and eventually a vmenter. A factor 4 increase in ple_gap might be what it takes to get the amount of time spent spinning equal to the amount of time spent on the host side doing KVM stuff... -- All rights reversed ^ permalink raw reply [flat|nested] 21+ messages in thread
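To put Rik's cycle counts in time units, here is a back-of-envelope sketch, assuming the 2.27 GHz X7560 host mentioned earlier in the thread (Rik does not name a clock rate, so this is only indicative):

```python
# Cycle counts from the discussion above converted to wall time.
CLOCK_HZ = 2.27e9  # assumed clock of the X7560 test box from this thread

def cycles_to_us(cycles, hz=CLOCK_HZ):
    return cycles / hz * 1e6

spin = cycles_to_us(4096)       # ~1.8 us spent spinning before the PLE exit
spin4 = cycles_to_us(4 * 4096)  # ~7.2 us with the proposed 4x larger window
```

Even a few cache misses at ~200 cycles each during the handler's vcpu scan are a sizable fraction of that 1.8 us budget, which is Rik's point.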
* Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case 2012-07-02 14:49 ` Rik van Riel @ 2012-07-03 3:30 ` Raghavendra K T 2012-07-05 14:45 ` Andrew Theurer 1 sibling, 0 replies; 21+ messages in thread From: Raghavendra K T @ 2012-07-03 3:30 UTC (permalink / raw) To: Rik van Riel, Vinod, Chegu Cc: Andrew Jones, Marcelo Tosatti, Srikar, Srivatsa Vaddagiri, Peter Zijlstra, Nikunj A. Dadhania, KVM, LKML, Gleb Natapov, Jeremy Fitzhardinge, Avi Kivity, Ingo Molnar On 07/02/2012 08:19 PM, Rik van Riel wrote: > On 06/28/2012 06:55 PM, Vinod, Chegu wrote: >> Hello, >> >> I am just catching up on this email thread... >> >> Perhaps one of you may be able to help answer this query.. preferably >> along with some data. [BTW, I do understand the basic intent behind >> PLE in a typical [sweet spot] use case where there is over >> subscription etc. and the need to optimize the PLE handler in the host >> etc. ] >> >> In a use case where the host has fewer but much larger guests (say >> 40VCPUs and higher) and there is no over subscription (i.e. # of vcpus >> across guests<= physical cpus in the host and perhaps each guest has >> their vcpu's pinned to specific physical cpus for other reasons), I >> would like to understand if/how the PLE really helps ? For these use >> cases would it be ok to turn PLE off (ple_gap=0) since is no real need >> to take an exit and find some other VCPU to yield to ? > > Yes, that should be ok. I think this should be true when we have ple_window tuned to correct value for guest. (same what you raised) But otherwise, IMO, it is a very tricky question to answer. PLE is currently benefiting even flush_tlb_ipi etc apart from spinlock. Having a properly tuned value for all types of workload, (+load) is really complicated. Coming back to ple_handler, IMHO, if we have slight increase in run_queue length, having directed yield may worsen the scenario. 
(In the case Vinod explained, even-though we will succeed in setting other vcpu task as next_buddy, caller itself gets scheduled out, so ganging effect reduces. on top of this we always have a question, have we chosen right guy OR a really bad guy for yielding.) > > On a related note, I wonder if we should increase the ple_gap > significantly. Did you mean ple_window? > > After all, 4096 cycles of spinning is not that much, when you > consider how much time is spent doing the subsequent vmexit, > scanning the other VCPU's status (200 cycles per cache miss), > deciding what to do, maybe poking another CPU, and eventually > a vmenter. > > A factor 4 increase in ple_gap might be what it takes to > get the amount of time spent spinning equal to the amount of > time spent on the host side doing KVM stuff... > I agree, I am experimenting with all these things left and right, along with several optimization ideas I have. Hope to comeback on the experiments soon. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case 2012-07-02 14:49 ` Rik van Riel 2012-07-03 3:30 ` Raghavendra K T @ 2012-07-05 14:45 ` Andrew Theurer 1 sibling, 0 replies; 21+ messages in thread From: Andrew Theurer @ 2012-07-05 14:45 UTC (permalink / raw) To: Rik van Riel Cc: Vinod, Chegu, Raghavendra K T, Andrew Jones, Marcelo Tosatti, Srikar, Srivatsa Vaddagiri, Peter Zijlstra, Nikunj A. Dadhania, KVM, LKML, Gleb Natapov, Jeremy Fitzhardinge, Avi Kivity, Ingo Molnar On Mon, 2012-07-02 at 10:49 -0400, Rik van Riel wrote: > On 06/28/2012 06:55 PM, Vinod, Chegu wrote: > > Hello, > > > > I am just catching up on this email thread... > > > > Perhaps one of you may be able to help answer this query.. preferably along with some data. [BTW, I do understand the basic intent behind PLE in a typical [sweet spot] use case where there is over subscription etc. and the need to optimize the PLE handler in the host etc. ] > > > > In a use case where the host has fewer but much larger guests (say 40VCPUs and higher) and there is no over subscription (i.e. # of vcpus across guests<= physical cpus in the host and perhaps each guest has their vcpu's pinned to specific physical cpus for other reasons), I would like to understand if/how the PLE really helps ? For these use cases would it be ok to turn PLE off (ple_gap=0) since is no real need to take an exit and find some other VCPU to yield to ? > > Yes, that should be ok. > > On a related note, I wonder if we should increase the ple_gap > significantly. > > After all, 4096 cycles of spinning is not that much, when you > consider how much time is spent doing the subsequent vmexit, > scanning the other VCPU's status (200 cycles per cache miss), > deciding what to do, maybe poking another CPU, and eventually > a vmenter. > > A factor 4 increase in ple_gap might be what it takes to > get the amount of time spent spinning equal to the amount of > time spent on the host side doing KVM stuff... 
I was recently thinking the same thing as I have observed over 180,000 exits/sec from a 40-way VM on a 80-way host, where there should be no cpu overcommit. Also, the number of directed yields for this was only 1800/sec, so we have a 1% usefulness for our exits. I am wondering if the ple_window should be similar to the host scheduler task switching granularity, and not what we think a typical max cycles should be for holding a lock. BTW, I have a patch to add a couple PLE stats to kvmstat which I will send out shortly. -Andrew ^ permalink raw reply [flat|nested] 21+ messages in thread
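Andrew's "1% usefulness" figure above is the ratio of directed yields to PLE exits:

```python
# Andrew's observation: ~180,000 PLE exits/sec but only ~1,800
# directed yields/sec on a 40-way VM on an 80-way host (no overcommit).
ple_exits_per_sec = 180_000
directed_yields_per_sec = 1_800

usefulness = directed_yields_per_sec / ple_exits_per_sec  # ~1% of exits pay off
```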
* Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case 2012-06-19 20:51 ` [PATCH] kvm: handle last_boosted_vcpu = 0 case Rik van Riel 2012-06-20 20:12 ` Raghavendra K T @ 2012-06-21 6:43 ` Gleb Natapov 2012-06-21 10:23 ` Raghavendra K T 2012-06-28 2:14 ` Raghavendra K T 2012-07-06 17:11 ` Marcelo Tosatti 2 siblings, 2 replies; 21+ messages in thread From: Gleb Natapov @ 2012-06-21 6:43 UTC (permalink / raw) To: Rik van Riel Cc: Raghavendra K T, Avi Kivity, Marcelo Tosatti, Srikar, Srivatsa Vaddagiri, Peter Zijlstra, Nikunj A. Dadhania, KVM, Ingo Molnar, LKML On Tue, Jun 19, 2012 at 04:51:04PM -0400, Rik van Riel wrote: > On Wed, 20 Jun 2012 01:50:50 +0530 > Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> wrote: > > > > > In ple handler code, last_boosted_vcpu (lbv) variable is > > serving as reference point to start when we enter. > > > Also statistical analysis (below) is showing lbv is not very well > > distributed with current approach. > > You are the second person to spot this bug today (yes, today). > > Due to time zones, the first person has not had a chance yet to > test the patch below, which might fix the issue... > > Please let me know how it goes. > > ====8<==== > > If last_boosted_vcpu == 0, then we fall through all test cases and > may end up with all VCPUs pouncing on vcpu 0. With a large enough > guest, this can result in enormous runqueue lock contention, which > can prevent vcpu0 from running, leading to a livelock. > > Changing < to <= makes sure we properly handle that case. 
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
>  virt/kvm/kvm_main.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 7e14068..1da542b 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1586,7 +1586,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>  	 */
>  	for (pass = 0; pass < 2 && !yielded; pass++) {
>  		kvm_for_each_vcpu(i, vcpu, kvm) {
> -			if (!pass && i < last_boosted_vcpu) {
> +			if (!pass && i <= last_boosted_vcpu) {
>  				i = last_boosted_vcpu;
>  				continue;
>  			} else if (pass && i > last_boosted_vcpu)
>
Looks correct. We can simplify this by introducing something like:

#define kvm_for_each_vcpu_from(idx, n, vcpup, kvm) \
	for (n = atomic_read(&kvm->online_vcpus); \
	     n && (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
	     n--, idx = (idx+1) % atomic_read(&kvm->online_vcpus))

--
			Gleb.

^ permalink raw reply	[flat|nested] 21+ messages in thread
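The off-by-one Rik describes can be seen by simulating the two-pass scan, ignoring the vcpu == me and eligibility checks (a sketch of the control flow only, not the kernel code):

```python
def visit_order(nvcpus, lbv, first_pass_skip):
    """Order in which the two-pass loop in kvm_vcpu_on_spin() considers
    vcpus, given the first-pass skip predicate."""
    order = []
    for p in (0, 1):
        i = 0
        while i < nvcpus:
            if not p and first_pass_skip(i, lbv):
                i = lbv + 1  # 'i = last_boosted_vcpu; continue;' plus the loop's i++
                continue
            if p and i > lbv:
                break
            order.append(i)
            i += 1
    return order

# With last_boosted_vcpu == 0 the original '<' never skips, so every
# spinning vcpu tries vcpu 0 first; with '<=', vcpu 0 is tried last.
buggy = visit_order(4, 0, lambda i, lbv: i < lbv)   # [0, 1, 2, 3, 0]
fixed = visit_order(4, 0, lambda i, lbv: i <= lbv)  # [1, 2, 3, 0]
```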
* Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case 2012-06-21 6:43 ` Gleb Natapov @ 2012-06-21 10:23 ` Raghavendra K T 2012-06-28 2:14 ` Raghavendra K T 1 sibling, 0 replies; 21+ messages in thread From: Raghavendra K T @ 2012-06-21 10:23 UTC (permalink / raw) To: Gleb Natapov Cc: Rik van Riel, Avi Kivity, Marcelo Tosatti, Srikar, Srivatsa Vaddagiri, Peter Zijlstra, Nikunj A. Dadhania, KVM, Ingo Molnar, LKML On 06/21/2012 12:13 PM, Gleb Natapov wrote:
> On Tue, Jun 19, 2012 at 04:51:04PM -0400, Rik van Riel wrote:
>> On Wed, 20 Jun 2012 01:50:50 +0530
>> Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> wrote:
>>
>>> In ple handler code, last_boosted_vcpu (lbv) variable is
>>> serving as reference point to start when we enter.
>>
>>> Also statistical analysis (below) is showing lbv is not very well
>>> distributed with current approach.
>>
>> You are the second person to spot this bug today (yes, today).
>>
>> Due to time zones, the first person has not had a chance yet to
>> test the patch below, which might fix the issue...
>>
>> Please let me know how it goes.
>>
>> ====8<====
>>
>> If last_boosted_vcpu == 0, then we fall through all test cases and
>> may end up with all VCPUs pouncing on vcpu 0. With a large enough
>> guest, this can result in enormous runqueue lock contention, which
>> can prevent vcpu0 from running, leading to a livelock.
>>
>> Changing < to <= makes sure we properly handle that case.
>>
>> Signed-off-by: Rik van Riel <riel@redhat.com>
>> ---
>>  virt/kvm/kvm_main.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 7e14068..1da542b 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -1586,7 +1586,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>>  	 */
>>  	for (pass = 0; pass < 2 && !yielded; pass++) {
>>  		kvm_for_each_vcpu(i, vcpu, kvm) {
>> -			if (!pass && i < last_boosted_vcpu) {
>> +			if (!pass && i <= last_boosted_vcpu) {
>>  				i = last_boosted_vcpu;
>>  				continue;
>>  			} else if (pass && i > last_boosted_vcpu)
>>
> Looks correct. We can simplify this by introducing something like:
>
> #define kvm_for_each_vcpu_from(idx, n, vcpup, kvm) \
> 	for (n = atomic_read(&kvm->online_vcpus); \
> 	     n && (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
> 	     n--, idx = (idx+1) % atomic_read(&kvm->online_vcpus))
>
Thumbs up for this simplification. This really helps in all the places where we want to start iterating from the middle.

^ permalink raw reply	[flat|nested] 21+ messages in thread
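Gleb's macro amounts to a wrap-around walk over the online vcpus. A stand-alone sketch of the same iteration shape (illustrative, not the real KVM API):

```python
def each_vcpu_from(vcpus, idx):
    """Yield every online vcpu exactly once, starting at idx and
    wrapping around, like the proposed kvm_for_each_vcpu_from()."""
    n = len(vcpus)  # stands in for atomic_read(&kvm->online_vcpus)
    while n:
        yield vcpus[idx]
        n -= 1
        idx = (idx + 1) % len(vcpus)

# Starting mid-array wraps around and still visits each vcpu once,
# which is exactly what the two-pass loop in the patch achieves.
walk = list(each_vcpu_from(["v0", "v1", "v2", "v3"], 2))  # ['v2', 'v3', 'v0', 'v1']
```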
* Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case 2012-06-21 6:43 ` Gleb Natapov 2012-06-21 10:23 ` Raghavendra K T @ 2012-06-28 2:14 ` Raghavendra K T 1 sibling, 0 replies; 21+ messages in thread From: Raghavendra K T @ 2012-06-28 2:14 UTC (permalink / raw) To: Gleb Natapov Cc: Rik van Riel, Avi Kivity, Marcelo Tosatti, Srikar, Srivatsa Vaddagiri, Peter Zijlstra, Nikunj A. Dadhania, KVM, Ingo Molnar, LKML On 06/21/2012 12:13 PM, Gleb Natapov wrote:
> On Tue, Jun 19, 2012 at 04:51:04PM -0400, Rik van Riel wrote:
>> On Wed, 20 Jun 2012 01:50:50 +0530
>> Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> wrote:
>>
>>> In ple handler code, last_boosted_vcpu (lbv) variable is
>>> serving as reference point to start when we enter.
>>
>>> Also statistical analysis (below) is showing lbv is not very well
>>> distributed with current approach.
>>
>> You are the second person to spot this bug today (yes, today).
>>
>> Due to time zones, the first person has not had a chance yet to
>> test the patch below, which might fix the issue...
>>
>> Please let me know how it goes.
>>
>> ====8<====
>>
>> If last_boosted_vcpu == 0, then we fall through all test cases and
>> may end up with all VCPUs pouncing on vcpu 0. With a large enough
>> guest, this can result in enormous runqueue lock contention, which
>> can prevent vcpu0 from running, leading to a livelock.
>>
>> Changing < to <= makes sure we properly handle that case.
>>
>> Signed-off-by: Rik van Riel <riel@redhat.com>
>> ---
>>  virt/kvm/kvm_main.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 7e14068..1da542b 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -1586,7 +1586,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>>  	 */
>>  	for (pass = 0; pass < 2 && !yielded; pass++) {
>>  		kvm_for_each_vcpu(i, vcpu, kvm) {
>> -			if (!pass && i < last_boosted_vcpu) {
>> +			if (!pass && i <= last_boosted_vcpu) {
>>  				i = last_boosted_vcpu;
>>  				continue;
>>  			} else if (pass && i > last_boosted_vcpu)
>>
> Looks correct. We can simplify this by introducing something like:
>
> #define kvm_for_each_vcpu_from(idx, n, vcpup, kvm) \
> 	for (n = atomic_read(&kvm->online_vcpus); \
> 	     n && (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
> 	     n--, idx = (idx+1) % atomic_read(&kvm->online_vcpus))
>
Gleb, Rik,

Any updates on this or Rik's patch status? I can come up with the above suggested cleanup patch with Gleb's From and Signed-off-by. Please let me know.

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case
  2012-06-19 20:51 ` [PATCH] kvm: handle last_boosted_vcpu = 0 case Rik van Riel
  2012-06-20 20:12   ` Raghavendra K T
  2012-06-21  6:43   ` Gleb Natapov
@ 2012-07-06 17:11 ` Marcelo Tosatti
  2 siblings, 0 replies; 21+ messages in thread
From: Marcelo Tosatti @ 2012-07-06 17:11 UTC (permalink / raw)
To: Rik van Riel
Cc: Raghavendra K T, Avi Kivity, Srikar, Srivatsa Vaddagiri, Peter Zijlstra,
    Nikunj A. Dadhania, KVM, Ingo Molnar, LKML

On Tue, Jun 19, 2012 at 04:51:04PM -0400, Rik van Riel wrote:
> On Wed, 20 Jun 2012 01:50:50 +0530
> Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> wrote:
>
>> In ple handler code, last_boosted_vcpu (lbv) variable is
>> serving as reference point to start when we enter.
>
>> Also statistical analysis (below) is showing lbv is not very well
>> distributed with current approach.
>
> You are the second person to spot this bug today (yes, today).
>
> Due to time zones, the first person has not had a chance yet to
> test the patch below, which might fix the issue...
>
> Please let me know how it goes.
>
> ====8<====
>
> If last_boosted_vcpu == 0, then we fall through all test cases and
> may end up with all VCPUs pouncing on vcpu 0. With a large enough
> guest, this can result in enormous runqueue lock contention, which
> can prevent vcpu0 from running, leading to a livelock.
>
> Changing < to <= makes sure we properly handle that case.
>
> Signed-off-by: Rik van Riel <riel@redhat.com>

Applied, thanks.
end of thread, other threads:[~2012-07-06 18:12 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-19 20:20 Regarding improving ple handler (vcpu_on_spin) Raghavendra K T
2012-06-19 20:51 ` [PATCH] kvm: handle last_boosted_vcpu = 0 case Rik van Riel
2012-06-20 20:12   ` Raghavendra K T
2012-06-21  2:11     ` Rik van Riel
2012-06-21 11:26       ` Raghavendra K T
2012-06-22 15:11         ` Andrew Jones
2012-06-22 21:00           ` Raghavendra K T
2012-06-23 18:34             ` Raghavendra K T
2012-06-27 20:27               ` Raghavendra K T
2012-06-27 20:29                 ` [PATCH] kvm: handle last_boosted_vcpu = 0 case with benchmark detail attachment Raghavendra K T
2012-06-28 16:00                   ` [PATCH] kvm: handle last_boosted_vcpu = 0 case Andrew Jones
2012-06-28 16:22                     ` Raghavendra K T
2012-06-28 22:55                       ` Vinod, Chegu
2012-06-28 22:55                         ` Vinod, Chegu
2012-07-02 14:49                           ` Rik van Riel
2012-07-03  3:30                             ` Raghavendra K T
2012-07-05 14:45                               ` Andrew Theurer
2012-06-21  6:43   ` Gleb Natapov
2012-06-21 10:23     ` Raghavendra K T
2012-06-28  2:14       ` Raghavendra K T
2012-07-06 17:11   ` Marcelo Tosatti