From mboxrd@z Thu Jan 1 00:00:00 1970
From: George Dunlap
Subject: Re: Hypervisor crash(!) on xl cpupool-numa-split
Date: Mon, 14 Feb 2011 17:57:09 +0000
References: <4D41FD3A.5090506@amd.com> <201102021539.06664.stephan.diestelhorst@amd.com>
 <4D4974D1.1080503@ts.fujitsu.com> <201102021701.05665.stephan.diestelhorst@amd.com>
 <4D4A43B7.5040707@ts.fujitsu.com> <4D4A72D8.3020502@ts.fujitsu.com> <4D4C08B6.30600@amd.com>
 <4D4FE7E2.9070605@amd.com> <4D4FF452.6060508@ts.fujitsu.com> <4D50D80F.9000007@ts.fujitsu.com>
 <4D517051.10402@amd.com> <4D529BD9.5050200@amd.com> <4D52A2CD.9090507@ts.fujitsu.com>
 <4D5388DF.8040900@ts.fujitsu.com> <4D53AF27.7030909@amd.com> <4D53F3BC.4070807@amd.com>
 <4D54D478.9000402@ts.fujitsu.com> <4D54E79E.3000800@amd.com>
In-Reply-To: <4D54E79E.3000800@amd.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary=0016e6da7ae3e5a5e8049c41c690
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Andre Przywara
Cc: Juergen Gross, "xen-devel@lists.xensource.com", "Diestelhorst, Stephan"
List-Id: xen-devel@lists.xenproject.org

--0016e6da7ae3e5a5e8049c41c690
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

The good news is, I've managed to reproduce this on my local test
hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the
attached script.  It's time to go home now, but I should be able to dig
something up tomorrow.

To use the script:
* Rename cpupool0 to "p0", and create an empty second pool, "p1".
* You can override the defaults by adding "arg=val" arguments.
* Arguments are:
 + dryrun={true,false}: Go through the motions, but don't actually
   execute any xl commands. Default false.
 + left: Number of commands to execute. Default 10.
 + maxcpus: Highest numerical value for a cpu. Default 7 (i.e., 0-7 is 8 cpus).
 + verbose={true,false}: Print what you're doing. Default true.

The script sometimes attempts to remove the last cpu from cpupool0; in
that case, libxl will print an error. If the script gets an error under
that condition, it ignores it; under any other condition, it prints
diagnostic information.

What finally crashed it for me was this command:

# ./cpupool-test.sh verbose=false left=1000

 -George
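
To give an idea of what the attached script does without having to decode
the attachment: the loop below is a minimal, simplified sketch of the same
kind of stress test. It is not the attached script itself; the bookkeeping
and names are illustrative, and it assumes pools "p0" and "p1" already
exist.

#!/bin/bash
# Simplified sketch only -- see the attached cpupool-test.sh for the real thing.
maxcpus=7              # highest cpu number, as in the attached script
left=10                # number of successful moves to perform

# Track whether each cpu is currently in pool "p0", pool "p1", or unassigned ("u").
declare -a pool_of
for ((c = 0; c <= maxcpus; c++)); do pool_of[c]="p0"; done

while ((left > 0)); do
    cpu=$((RANDOM % (maxcpus + 1)))
    if [[ ${pool_of[cpu]} = u ]]; then
        dest=$(shuf -n 1 -e p0 p1)          # put it back into a random pool
        cmd="xl cpupool-cpu-add $dest $cpu"
    else
        dest="u"
        cmd="xl cpupool-cpu-remove ${pool_of[cpu]} $cpu"
    fi
    echo "left=$left: $cmd"
    # libxl is expected to refuse to remove the last cpu from cpupool0 ("p0");
    # like the real script, just skip that move and pick again.
    if $cmd; then
        pool_of[cpu]=$dest
        left=$((left - 1))
    fi
done
xl cpupool-list -c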

On Fri, Feb 11, 2011 at 7:39 AM, Andre Przywara wrote:
> Juergen Gross wrote:
>>
>> On 02/10/11 15:18, Andre Przywara wrote:
>>>
>>> Andre Przywara wrote:
>>>>
>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote:
>>>>>
>>>>> On 02/09/11 15:21, Juergen Gross wrote:
>>>>>>
>>>>>> Andre, George,
>>>>>>
>>>>>> What seems to be interesting: I think the problem always occurred
>>>>>> when a new cpupool was created and the first cpu was moved to it.
>>>>>>
>>>>>> I think my previous assumption regarding the master_ticker was not
>>>>>> too bad. I think somehow the master_ticker of the new cpupool is
>>>>>> becoming active before the scheduler is really initialized properly.
>>>>>> This could happen if enough time is spent between alloc_pdata for the
>>>>>> cpu to be moved and the critical section in schedule_cpu_switch().
>>>>>>
>>>>>> The solution should be to activate the timers only if the scheduler
>>>>>> is ready for them.
>>>>>>
>>>>>> George, do you think the master_ticker should be stopped in
>>>>>> suspend_ticker as well? I still see potential problems for entering
>>>>>> deep C-States. I think I'll prepare a patch which will keep the
>>>>>> master_ticker active for the C-State case and migrate it for the
>>>>>> schedule_cpu_switch() case.
>>>>>
>>>>> Okay, here is a patch for this. It ran on my 4-core machine without
>>>>> any problems.
>>>>> Andre, could you give it a try?
>>>>
>>>> Did, but unfortunately it crashed as always. Tried twice and made sure
>>>> I booted the right kernel. Sorry.
>>>> The idea of a race between the timer and the state change sounded very
>>>> appealing; actually, that was suspicious to me from the beginning.
>>>>
>>>> I will add some code to dump the state of all cpupools to the BUG_ON
>>>> to see in which situation we are when the bug triggers.
>>>
>>> OK, here is a first try of this: the patch iterates over all CPU pools
>>> and outputs some data if the BUG_ON
>>> ((sdom->weight * sdom->active_vcpu_count) > weight_left) condition
>>> triggers:
>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: fffffffc003f
>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0
>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000
>>> (XEN) Xen BUG at sched_credit.c:1010
>>> ....
>>> The masks look proper (6 cores per node); the bug triggers when the
>>> first CPU is about to be(?) inserted.
>>
>> Sure? I'm missing the cpu with mask 2000.
>> I'll try to reproduce the problem on a larger machine here (24 cores,
>> 4 numa nodes).
>> Andre, can you give me your xen boot parameters? Which xen changeset
>> are you running, and do you have any additional patches in use?
>
> The grub lines:
> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200
> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0
> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0
>
> All of my experiments use c/s 22858 as a base.
> If you use an AMD Magny-Cours box for your experiments (socket C32 or
> G34), you should add the following patch (removing the line):
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs)
>          __clear_bit(X86_FEATURE_SKINIT % 32, &c);
>          __clear_bit(X86_FEATURE_WDT % 32, &c);
>          __clear_bit(X86_FEATURE_LWP % 32, &c);
> -        __clear_bit(X86_FEATURE_NODEID_MSR % 32, &c);
>          __clear_bit(X86_FEATURE_TOPOEXT % 32, &c);
>          break;
>      case 5: /* MONITOR/MWAIT */
>
> This is not necessary (in fact it reverts my patch c/s 22815), but it
> raises the probability of triggering the bug, probably because it
> increases the pressure on the Dom0 scheduler. If you cannot trigger it
> with Dom0, try to create a guest with many VCPUs and squeeze it into a
> small CPU-pool.
>
> Good luck ;-)
> Andre.
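
For that last suggestion (a guest with many VCPUs squeezed into a small
CPU-pool), a hypothetical xl domain config fragment could look like the
following. This is illustrative only and assumes your xl build's config
format accepts a pool= key naming an existing cpupool:

# fragment only -- not a complete guest config
name   = "cpupool-stress"
memory = 512
vcpus  = 16        # many VCPUs...
pool   = "p1"      # ...squeezed into a small CPU-pool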
> > -- > Andre Przywara > AMD-OSRC (Dresden) > Tel: x29712 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel > --0016e6da7ae3e5a5e8049c41c690 Content-Type: application/x-sh; name="cpupool-test.sh" Content-Disposition: attachment; filename="cpupool-test.sh" Content-Transfer-Encoding: base64 X-Attachment-Id: f_gk5oqgju0 IyEvYmluL2Jhc2gKCiMgRGVmYXVsdHMKbWF4Y3B1cz03CmxlZnQ9MTAKZHJ5cnVuPWZhbHNlCnZl cmJvc2U9dHJ1ZQoKYXJncz0oJEApCgojIFJlYWQgYXJncwp3aGlsZSBbWyAtbiAiJHthcmdzW0Bd fSIgXV0gOyBkbwogICAgYT0ke2FyZ3NbMF19OyAgICAgICAjIFVzZSBmaXJzdCBlbGVtZW50CiAg ICBpZiBbWyBgZXhwciBtYXRjaCAke2F9ICcuKj0nYCAhPSAiMCIgXV0gOyB0aGVuCglhcmdzPSgk e2FyZ3NbQF06MX0pICMgRWxlbWVudCBwcm9jZXNzZWQsIHBvcCBpdCBvZmYKCSNlY2hvIEV2YWx1 YXRpbmcgIiRhIgoJZXZhbCAiJGEiCiAgICBlbHNlCglicmVhawogICAgZmkKZG9uZQoKCgpldmFs ICJwMD0oezAuLiRtYXhjcHVzfSkiCnAxPSgpCnU9KCkKCmVjaG8gIlJ1bm5pbmcgdGVzdCIKd2hp bGUgW1sgJGxlZnQgLWd0IDAgXV0gOyBkbwogICAgJHZlcmJvc2UgJiYgZWNobyAiTGVmdDogJGxl ZnQiCiAgICBzb3VyY2U9JChzaHVmIC1uIDEgLWUgcDAgcDEgdSkKICAgICR2ZXJib3NlICYmIGVj aG8gIiBzb3VyY2U6ICRzb3VyY2UiCgogICAgIyBDb21tb246IENob29zZSBhIHZpY3RpbQogICAg ZXZhbCAiY291bnQ9XCR7IyRzb3VyY2VbQF19IgogICAgaWYgW1sgJGNvdW50ID09ICIwIiBdXSA7 IHRoZW4KCSR2ZXJib3NlICYmIGVjaG8gIiAgc291cmNlIGVtcHR5OyBubyBhY3Rpb24gdGFrZW4i Cgljb250aW51ZQogICAgZmkKICAgIGNvdW50PSQoKCRjb3VudC0xKSkKICAgIGV2YWwgImluZGV4 PVwkKHNodWYgLW4gMSAtZSB7MC4uJGNvdW50fSkiCiAgICAkdmVyYm9zZSAmJiBlY2hvICIgaW5k ZXg6ICRpbmRleCBvZiAkY291bnQiCiAgICBldmFsICJjcHU9XCR7JHNvdXJjZVskaW5kZXhdfSIK ICAgICR2ZXJib3NlICYmIGVjaG8gIiBjcHU6ICRjcHUiCiAgICAKICAgIGNhc2UgJHNvdXJjZSBp bgoJcCopCgkgICAgZGVzdD0idSIKCgkgICAgIyBHZW5lcmF0ZSB0aGUgY29tbWFuZAoJICAgIGNt ZD0ieGwgY3B1cG9vbC1jcHUtcmVtb3ZlICRzb3VyY2UgJGNwdSIKCSAgICA7OwoKCXUpCgkgICAg IyBDaG9vc2UgYSBkZXN0aW5hdGlvbgoJICAgIGRlc3Q9JChzaHVmIC1uIDEgLWUgcDAgcDEpCgkg ICAgJHZlcmJvc2UgJiYgZWNobyAiIGRlc3Q6ICRkZXN0IgoJICAgIAoJICAgICMgR2VuZXJhdGUg dGhlIGNvbW1hbmQKCSAgICBjbWQ9InhsIGNwdXBvb2wtY3B1LWFkZCAkZGVzdCAkY3B1IgoJICAg IDs7CiAgICBlc2FjCgogICAgIyBUcnkgdGhlIGNvbW1hbmQKICAgIHN1Y2Nlc3M9dHJ1ZQogICAg JHZlcmJvc2UgJiYgZWNobyAiIGNtZDogJGNtZCIKICAgIGlmICEgJGRyeXJ1biA7IHRoZW4KCWlm ICAhICRjbWQgOyB0aGVuCgkgICAgc3VjY2Vzcz1mYWxzZQoJICAgICMgVGhpcyBpcyBleHBlY3Rl ZCBpZiB3ZSdyZSByZW1vdmluZyB0aGUgbGFzdCBjcHUgZnJvbSBwMAoJICAgIGlmICEgW1sgJHNv dXJjZT09InAwIiAmJiAkY291bnQ9PSIwIiBdXSA7IHRoZW4KCQllY2hvICJDb21tYW5kICRjbWQg ZmFpbGVkIgoJCXhsIGNwdXBvb2wtbGlzdCAtYwoJCWZvciBpIGluIHAwIHAxIHUgOyBkbwoJCSAg ICBldmFsICJlY2hvIFwiICRpOiBcJHskaVtAXX1cIiIKCQlkb25lCgkgICAgZmkKCWZpCiAgICBm aQoKICAgICMgTW92ZSB0aGUgdmljdGltCiAgICBpZiAkc3VjY2VzcyA7IHRoZW4KCWV2YWwgInVu c2V0ICRzb3VyY2VbJGluZGV4XSIKCWV2YWwgIiR7c291cmNlfT0oXCR7JHNvdXJjZVtAXX0pIgoJ ZXZhbCAiJHtkZXN0fT0oXCR7JGRlc3RbQF19ICRjcHUpIgoJJHZlcmJvc2UgJiYgZXZhbCAiZWNo byBcIiAkc291cmNlOiBcJHskc291cmNlW0BdfVwiIgoJJHZlcmJvc2UgJiYgZXZhbCAiZWNobyBc IiAkZGVzdDogXCR7JGRlc3RbQF19XCIiCgoJbGVmdD0kKCgkbGVmdC0xKSkKICAgIGZpCmRvbmUK CiRkcnlydW4gfHwgeGwgY3B1cG9vbC1saXN0IC1jCgplY2hvICJSZXNldHRpbmciCmZvciBzb3Vy Y2UgaW4gcDEgdTsgZG8KICAgIGV2YWwgImNvdW50PVwkeyMkc291cmNlW0BdfSIKICAgIGlmIFtb ICRjb3VudCA9PSAiMCIgXV0gOyB0aGVuCgkkdmVyYm9zZSAmJiBlY2hvICIgIHNvdXJjZSBlbXB0 eTsgbm8gYWN0aW9uIHRha2VuIgoJY29udGludWU7CiAgICBmaQoKICAgIHdoaWxlIGV2YWwgIltb IC1uIFwiXCR7JHNvdXJjZVtAXX1cIiBdXSIgOyBkbwoJIyBEZXNpZ25hdGUgdmljdGltIAoJaW5k ZXg9MAoJZXZhbCAiY291bnQ9XCR7IyRzb3VyY2VbQF19IgoJJHZlcmJvc2UgJiYgZWNobyAiIGlu ZGV4OiAkaW5kZXggb2YgJGNvdW50IgoJZXZhbCAiY3B1PVwkeyRzb3VyY2VbJGluZGV4XX0iCgkk 
dmVyYm9zZSAmJiBlY2hvICIgY3B1OiAkY3B1IgoKCWNhc2UgJHNvdXJjZSBpbgoJICAgIHAqKQoJ CWRlc3Q9InUiCgkJCgkgICAgIyBHZW5lcmF0ZSB0aGUgY29tbWFuZAoJCWNtZD0ieGwgY3B1cG9v bC1jcHUtcmVtb3ZlICRzb3VyY2UgJGNwdSIKCQk7OwoJICAgIAoJICAgIHUpCgkgICAgIyBDaG9v c2UgYSBkZXN0aW5hdGlvbgoJCWRlc3Q9InAwIgoJCSR2ZXJib3NlICYmIGVjaG8gIiBkZXN0OiAk ZGVzdCIKCQkKCSAgICAjIEdlbmVyYXRlIHRoZSBjb21tYW5kCgkJY21kPSJ4bCBjcHVwb29sLWNw dS1hZGQgJGRlc3QgJGNwdSIKCQk7OwoJZXNhYwoKCWV2YWwgInVuc2V0ICRzb3VyY2VbJGluZGV4 XSIKCWV2YWwgIiR7c291cmNlfT0oXCR7JHNvdXJjZVtAXX0pIgoJZXZhbCAiJHtkZXN0fT0oXCR7 JGRlc3RbQF19ICRjcHUpIgoJJHZlcmJvc2UgJiYgZXZhbCBlY2hvICIgJHNvdXJjZTogXCR7JHNv dXJjZVtAXX0iCgkkdmVyYm9zZSAmJiBldmFsIGVjaG8gIiAkZGVzdDogXCR7JGRlc3RbQF19IgoK CSR2ZXJib3NlICYmIGVjaG8gIiBjbWQ6ICRjbWQiCglpZiAhICRkcnlydW4gOyB0aGVuCgkgICAg aWYgICEgJGNtZCA7IHRoZW4KCQllY2hvICJDb21tYW5kICRjbWQgZmFpbGVkIgoJCXhsIGNwdXBv b2wtbGlzdCAtYwoJCWZvciBpIGluIHAwIHAxIHUgOyBkbwoJCSAgICBldmFsICJlY2hvIFwiICRp OiBcJHskaVtAXX1cIiIKCQlkb25lCgkgICAgZmkKCWZpCiAgICBkb25lCmRvbmUKCiRkcnlydW4g fHwgeGwgY3B1cG9vbC1saXN0IC1jCg== --0016e6da7ae3e5a5e8049c41c690 Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel --0016e6da7ae3e5a5e8049c41c690--