From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juergen Gross
Subject: Re: Hypervisor crash(!) on xl cpupool-numa-split
Date: Tue, 15 Feb 2011 08:22:40 +0100
Message-ID: <4D5A29C0.4050702@ts.fujitsu.com>
References: <4D41FD3A.5090506@amd.com>
 <201102021539.06664.stephan.diestelhorst@amd.com>
 <4D4974D1.1080503@ts.fujitsu.com>
 <201102021701.05665.stephan.diestelhorst@amd.com>
 <4D4A43B7.5040707@ts.fujitsu.com> <4D4A72D8.3020502@ts.fujitsu.com>
 <4D4C08B6.30600@amd.com> <4D4FE7E2.9070605@amd.com>
 <4D4FF452.6060508@ts.fujitsu.com> <4D50D80F.9000007@ts.fujitsu.com>
 <4D517051.10402@amd.com> <4D529BD9.5050200@amd.com>
 <4D52A2CD.9090507@ts.fujitsu.com> <4D5388DF.8040900@ts.fujitsu.com>
 <4D53AF27.7030909@amd.com> <4D53F3BC.4070807@amd.com>
 <4D54D478.9000402@ts.fujitsu.com> <4D54E79E.3000800@amd.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: George Dunlap
Cc: Andre Przywara, "xen-devel@lists.xensource.com",
 "Diestelhorst, Stephan"
List-Id: xen-devel@lists.xenproject.org

On 02/14/11 18:57, George Dunlap wrote:
> The good news is, I've managed to reproduce this on my local test
> hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the
> attached script. It's time to go home now, but I should be able to
> dig something up tomorrow.
>
> To use the script:
> * Rename cpupool0 to "p0", and create an empty second pool, "p1".
> * You can modify elements by adding "arg=val" as arguments.
> * Arguments are:
>   + dryrun={true,false} Do the work, but don't actually execute any xl
>     arguments. Default false.
>   + left: Number of commands to execute. Default 10.
>   + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is
>     8 cpus).
>   + verbose={true,false} Print what you're doing. Default is true.
>
> The script sometimes attempts to remove the last cpu from cpupool0; in
> this case, libxl will print an error. If the script gets an error
> under that condition, it will ignore it; under any other condition, it
> will print diagnostic information.
>
> What finally crashed it for me was this command:
> # ./cpupool-test.sh verbose=false left=1000

Nice! With your script I finally managed to get the error, too.
On my box (2 sockets with 6 cores each) I had to use

./cpupool-test.sh verbose=false left=10000 maxcpus=11

to trigger it.

Looking for more data now...


Juergen

>
> -George
>
> On Fri, Feb 11, 2011 at 7:39 AM, Andre Przywara wrote:
>> Juergen Gross wrote:
>>>
>>> On 02/10/11 15:18, Andre Przywara wrote:
>>>>
>>>> Andre Przywara wrote:
>>>>>
>>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote:
>>>>>>
>>>>>> On 02/09/11 15:21, Juergen Gross wrote:
>>>>>>>
>>>>>>> Andre, George,
>>>>>>>
>>>>>>> What seems to be interesting: I think the problem always occurred
>>>>>>> when a new cpupool was created and the first cpu was moved to it.
>>>>>>>
>>>>>>> I think my previous assumption regarding the master_ticker was not
>>>>>>> too bad.
>>>>>>> I think somehow the master_ticker of the new cpupool becomes active
>>>>>>> before the scheduler is really initialized properly. This could
>>>>>>> happen if enough time passes between alloc_pdata for the cpu to be
>>>>>>> moved and the critical section in schedule_cpu_switch().
>>>>>>>
>>>>>>> The solution should be to activate the timers only if the scheduler
>>>>>>> is ready for them.
>>>>>>>
>>>>>>> George, do you think the master_ticker should be stopped in
>>>>>>> suspend_ticker as well? I still see potential problems for entering
>>>>>>> deep C-States. I think I'll prepare a patch which will keep the
>>>>>>> master_ticker active for the C-State case and migrate it for the
>>>>>>> schedule_cpu_switch() case.
>>>>>>
>>>>>> Okay, here is a patch for this. It ran on my 4-core machine without
>>>>>> any problems.
>>>>>> Andre, could you give it a try?
>>>>>
>>>>> Did, but unfortunately it crashed as always. Tried twice and made sure
>>>>> I booted the right kernel. Sorry.
>>>>> The idea of a race between the timer and the state change sounded very
>>>>> appealing; actually that was suspicious to me from the beginning.
>>>>>
>>>>> I will add some code to dump the state of all cpupools to the BUG_ON
>>>>> to see in which situation we are when the bug triggers.
>>>>
>>>> OK, here is a first try of this. The patch iterates over all CPU pools
>>>> and outputs some data if the BUG_ON
>>>> ((sdom->weight * sdom->active_vcpu_count) > weight_left) condition
>>>> triggers:
>>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: fffffffc003f
>>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0
>>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000
>>>> (XEN) Xen BUG at sched_credit.c:1010
>>>> ....
>>>> The masks look proper (6 cores per node); the bug triggers when the
>>>> first CPU is about to be(?) inserted.
>>>
>>> Sure? I'm missing the cpu with mask 2000.
>>> I'll try to reproduce the problem on a larger machine here (24 cores,
>>> 4 numa nodes).
>>> Andre, can you give me your xen boot parameters? Which xen changeset
>>> are you running, and do you have any additional patches in use?
>>
>> The grub lines:
>> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200
>> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0
>> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0
>>
>> All of my experiments use c/s 22858 as a base.
>> If you use an AMD Magny-Cours box for your experiments (socket C32 or
>> G34), you should add the following patch (removing the line):
>>
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs)
>>          __clear_bit(X86_FEATURE_SKINIT % 32, &c);
>>          __clear_bit(X86_FEATURE_WDT % 32, &c);
>>          __clear_bit(X86_FEATURE_LWP % 32, &c);
>> -        __clear_bit(X86_FEATURE_NODEID_MSR % 32, &c);
>>          __clear_bit(X86_FEATURE_TOPOEXT % 32, &c);
>>          break;
>>      case 5: /* MONITOR/MWAIT */
>>
>> This is not necessary (in fact it reverts my patch c/s 22815), but it
>> raises the probability of triggering the bug, probably because it
>> increases the pressure on the Dom0 scheduler. If you cannot trigger it
>> with Dom0, try to create a guest with many VCPUs and squeeze it into a
>> small CPU-pool.
>>
>> Good luck ;-)
>> Andre.
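
For reference, the pool setup George describes for the script could look
roughly like the lines below. This assumes an xl build with cpupool-list,
cpupool-rename and cpupool-create available; the "p1.cfg" file name is
just an example, and the boot pool may be named differently on other
systems (check with cpupool-list first).

  xl cpupool-list                 # see what the boot pool is called
  xl cpupool-rename Pool-0 p0     # the script expects the boot pool as "p0"
  xl cpupool-create p1.cfg        # p1.cfg contains only:  name = "p1"
  ./cpupool-test.sh verbose=false left=10000 maxcpus=11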
>>
>> --
>> Andre Przywara
>> AMD-OSRC (Dresden)
>> Tel: x29712
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel


--
Juergen Gross                    Principal Developer Operating Systems
TSP ES&S SWE OS6                 Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions     e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                    Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html