From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juergen Gross
Subject: Re: Hypervisor crash(!) on xl cpupool-numa-split
Date: Tue, 15 Feb 2011 08:22:40 +0100
Message-ID: <4D5A29C0.4050702@ts.fujitsu.com>
References: <4D41FD3A.5090506@amd.com>
 <201102021539.06664.stephan.diestelhorst@amd.com>
 <4D4974D1.1080503@ts.fujitsu.com>
 <201102021701.05665.stephan.diestelhorst@amd.com>
 <4D4A43B7.5040707@ts.fujitsu.com> <4D4A72D8.3020502@ts.fujitsu.com>
 <4D4C08B6.30600@amd.com> <4D4FE7E2.9070605@amd.com>
 <4D4FF452.6060508@ts.fujitsu.com> <4D50D80F.9000007@ts.fujitsu.com>
 <4D517051.10402@amd.com> <4D529BD9.5050200@amd.com>
 <4D52A2CD.9090507@ts.fujitsu.com> <4D5388DF.8040900@ts.fujitsu.com>
 <4D53AF27.7030909@amd.com> <4D53F3BC.4070807@amd.com>
 <4D54D478.9000402@ts.fujitsu.com> <4D54E79E.3000800@amd.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: George Dunlap
Cc: Andre Przywara, "xen-devel@lists.xensource.com",
 "Diestelhorst, Stephan"
List-Id: xen-devel@lists.xenproject.org

On 02/14/11 18:57, George Dunlap wrote:
> The good news is, I've managed to reproduce this on my local test
> hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the
> attached script. It's time to go home now, but I should be able to
> dig something up tomorrow.
>
> To use the script:
> * Rename cpupool0 to "p0", and create an empty second pool, "p1".
> * You can modify elements by adding "arg=val" as arguments.
> * Arguments are:
>   + dryrun={true,false} Do the work, but don't actually execute any xl
>     arguments. Default false.
>   + left: Number of commands to execute. Default 10.
>   + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is
>     8 cpus).
>   + verbose={true,false} Print what you're doing. Default is true.
>
> The script sometimes attempts to remove the last cpu from cpupool0; in
> this case, libxl will print an error. If the script gets an error
> under that condition, it will ignore it; under any other condition, it
> will print diagnostic information.
>
> What finally crashed it for me was this command:
> # ./cpupool-test.sh verbose=false left=1000

Nice! With your script I finally managed to get the error, too.
On my box (2 sockets with 6 cores each) I had to use

./cpupool-test.sh verbose=false left=10000 maxcpus=11

to trigger it.

Looking for more data now...


Juergen

>
> -George
>
> On Fri, Feb 11, 2011 at 7:39 AM, Andre Przywara wrote:
>> Juergen Gross wrote:
>>>
>>> On 02/10/11 15:18, Andre Przywara wrote:
>>>>
>>>> Andre Przywara wrote:
>>>>>
>>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote:
>>>>>>
>>>>>> On 02/09/11 15:21, Juergen Gross wrote:
>>>>>>>
>>>>>>> Andre, George,
>>>>>>>
>>>>>>> What seems to be interesting: I think the problem always occurred
>>>>>>> when a new cpupool was created and the first cpu was moved to it.
>>>>>>>
>>>>>>> I think my previous assumption regarding the master_ticker was not
>>>>>>> too bad.
>>>>>>> I think somehow the master_ticker of the new cpupool becomes active
>>>>>>> before the scheduler is really initialized properly. This could
>>>>>>> happen if enough time passes between alloc_pdata for the cpu to be
>>>>>>> moved and the critical section in schedule_cpu_switch().
>>>>>>>
>>>>>>> The solution should be to activate the timers only if the scheduler
>>>>>>> is ready for them.
>>>>>>>
>>>>>>> George, do you think the master_ticker should be stopped in
>>>>>>> suspend_ticker as well? I still see potential problems for entering
>>>>>>> deep C-States. I think I'll prepare a patch which will keep the
>>>>>>> master_ticker active for the C-State case and migrate it for the
>>>>>>> schedule_cpu_switch() case.
>>>>>>
>>>>>> Okay, here is a patch for this. It ran on my 4-core machine without
>>>>>> any problems.
>>>>>> Andre, could you give it a try?
>>>>>
>>>>> Did, but unfortunately it crashed as always. Tried twice and made sure
>>>>> I booted the right kernel. Sorry.
>>>>> The idea of a race between the timer and the state change sounded very
>>>>> appealing; actually that was suspicious to me from the beginning.
>>>>>
>>>>> I will add some code to dump the state of all cpupools to the BUG_ON
>>>>> to see in which situation we are when the bug triggers.
>>>>
>>>> OK, here is a first try of this. The patch iterates over all CPU pools
>>>> and outputs some data if the BUG_ON
>>>> ((sdom->weight * sdom->active_vcpu_count) > weight_left) condition
>>>> triggers:
>>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: fffffffc003f
>>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0
>>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000
>>>> (XEN) Xen BUG at sched_credit.c:1010
>>>> ....
>>>> The masks look proper (6 cores per node); the bug triggers when the
>>>> first CPU is about to be(?) inserted.
>>>
>>> Sure? I'm missing the cpu with mask 2000.
>>> I'll try to reproduce the problem on a larger machine here (24 cores,
>>> 4 numa nodes).
>>> Andre, can you give me your xen boot parameters? Which xen changeset
>>> are you running, and do you have any additional patches in use?
>>
>> The grub lines:
>> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200
>> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0
>> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0
>>
>> All of my experiments use c/s 22858 as a base.
>> If you use an AMD Magny-Cours box for your experiments (socket C32 or
>> G34), you should add the following patch (removing the line):
>>
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs)
>>          __clear_bit(X86_FEATURE_SKINIT % 32, &c);
>>          __clear_bit(X86_FEATURE_WDT % 32, &c);
>>          __clear_bit(X86_FEATURE_LWP % 32, &c);
>> -        __clear_bit(X86_FEATURE_NODEID_MSR % 32, &c);
>>          __clear_bit(X86_FEATURE_TOPOEXT % 32, &c);
>>          break;
>>      case 5: /* MONITOR/MWAIT */
>>
>> This is not necessary (in fact it reverts my patch c/s 22815), but it
>> raises the probability of triggering the bug, probably because it
>> increases the pressure on the Dom0 scheduler. If you cannot trigger it
>> with Dom0, try to create a guest with many VCPUs and squeeze it into a
>> small CPU-pool.
>>
>> Good luck ;-)
>> Andre.
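
For reference, the pool setup George describes for the script could look
roughly like the lines below. This assumes an xl build with cpupool-list,
cpupool-rename and cpupool-create available; the "p1.cfg" file name is
just an example, and the boot pool may be named differently on other
systems (check with cpupool-list first).

  xl cpupool-list                 # see what the boot pool is called
  xl cpupool-rename Pool-0 p0     # the script expects the boot pool as "p0"
  xl cpupool-create p1.cfg        # p1.cfg contains only:  name = "p1"
  ./cpupool-test.sh verbose=false left=10000 maxcpus=11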
>>
>> --
>> Andre Przywara
>> AMD-OSRC (Dresden)
>> Tel: x29712
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel


--
Juergen Gross                    Principal Developer Operating Systems
TSP ES&S SWE OS6                 Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions     e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                    Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html