From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juergen Gross
Subject: Re: Hypervisor crash(!) on xl cpupool-numa-split
Date: Wed, 02 Feb 2011 11:05:48 +0100
Message-ID: <4D492C7C.5040005@ts.fujitsu.com>
References: <4D41FD3A.5090506@amd.com> <4D426673.7020200@ts.fujitsu.com>
 <4D42A35D.3050507@amd.com> <4D42AC00.8050109@ts.fujitsu.com>
 <4D42C153.5050104@amd.com> <4D465F0D.4010408@ts.fujitsu.com>
 <4D46CE4F.3090003@amd.com> <4D483599.1060807@amd.com>
 <4D48F954.5000103@ts.fujitsu.com> <4D491AB7.40204@ts.fujitsu.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------010502040507030808070302"
In-Reply-To: <4D491AB7.40204@ts.fujitsu.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Andre Przywara
Cc: George Dunlap, Ian Jackson, "xen-devel@lists.xensource.com",
 Keir Fraser, Stephan Diestelhorst
List-Id: xen-devel@lists.xenproject.org

This is a multi-part message in MIME format.
--------------010502040507030808070302
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi Andre,

could you try the attached patch? It should verify whether your problems
are due to the master ticker kicking in at a time when the cpu is
already gone from the cpupool.

I'm not sure if the patch is complete - disabling the master ticker in
csched_tick_suspend might lead to problems with C-states. The
functionality is different, at least.

George, do you think this is correct?


Juergen

On 02/02/11 09:49, Juergen Gross wrote:
> On 02/02/11 07:27, Juergen Gross wrote:
>> On 02/01/11 17:32, Andre Przywara wrote:
>>> Hi folks,
>>>
>>> I asked Stephan Diestelhorst for help, and after I convinced him that
>>> removing credit and making SEDF the default again is not an option,
>>> he worked together with me on this ;-) Many thanks for that!
>>> We haven't come to a final solution yet, but we could gather some
>>> debug data.
>>> I will simply dump some data here, maybe somebody has got a clue. We
>>> will work further on this tomorrow.
>>>
>>> First I replaced the BUG_ON with some printks to get some insight:
>>> (XEN) sdom->active_vcpu_count: 18
>>> (XEN) sdom->weight: 256
>>> (XEN) weight_left: 4096, weight_total: 4096
>>> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0
>>> (XEN) Xen BUG at sched_credit.c:591
>>> (XEN) ----[ Xen-4.1.0-rc2-pre  x86_64  debug=y  Not tainted ]----
>>>
>>> This shows that the number of VCPUs is not in sync with the computed
>>> weight sum; we have seen a difference of one or two VCPUs (in this
>>> case the weight had been computed from 16 VCPUs). It also shows that
>>> the assertion kicks in during the first iteration of the loop, where
>>> weight_left and weight_total are still equal.
>>>
>>> So I additionally instrumented alloc_pdata and free_pdata; the
>>> unprefixed lines come from a shell script mimicking the functionality
>>> of cpupool-numa-split.
>>> ------------
>>> Removing CPUs from Pool 0
>>> Creating new pool
>>> Using config file "cpupool.test"
>>> cpupool name:   Pool-node6
>>> scheduler:      credit
>>> number of cpus: 1
>>> (XEN) adding CPU 36, now 1 CPUs
>>> (XEN) removing CPU 36, remaining: 17
>>> Populating new pool
>>> (XEN) sdom->active_vcpu_count: 9
>>> (XEN) sdom->weight: 256
>>> (XEN) weight_left: 2048, weight_total: 2048
>>> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0
>>> (XEN) adding CPU 37, now 2 CPUs
>>> (XEN) removing CPU 37, remaining: 16
>>> (XEN) adding CPU 38, now 3 CPUs
>>> (XEN) removing CPU 38, remaining: 15
>>> (XEN) adding CPU 39, now 4 CPUs
>>> (XEN) removing CPU 39, remaining: 14
>>> (XEN) adding CPU 40, now 5 CPUs
>>> (XEN) removing CPU 40, remaining: 13
>>> (XEN) sdom->active_vcpu_count: 17
>>> (XEN) sdom->weight: 256
>>> (XEN) weight_left: 4096, weight_total: 4096
>>> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0
>>> (XEN) adding CPU 41, now 6 CPUs
>>> (XEN) removing CPU 41, remaining: 12
>>> ...
>>> Two things startled me:
>>> 1) There is quite some delay between the "Removing CPUs" message from
>>> the script and the actual HV printk showing it's done, why is that
>>> not synchronous?
>>
>> Removing cpus from Pool-0 requires no switching of the scheduler, so
>> you see no calls of alloc/free_pdata here.
>>
>>> Looking at the code it shows that __csched_vcpu_acct_start() is
>>> eventually triggered by a timer, shouldn't that be triggered
>>> synchronously by add/removal events?
>>
>> The vcpus are not moved explicitly, they are migrated by the normal
>> scheduler mechanisms, same as for vcpu-pin.
>>
>>> 2) It clearly shows that each CPU gets added to the new pool _before_
>>> it gets removed from the old one (Pool-0), isn't that violating the
>>> "only one pool per CPU" rule? Even if that is fine for a short period
>>> of time, maybe the timer kicks in at this very moment, resulting in
>>> violated invariants?
>>
>> The sequence you are seeing seems to be okay. The alloc_pdata for the
>> new pool is called before the free_pdata for the old pool.
>>
>> And the timer is not relevant, as only the idle vcpu should be running
>> on the moving cpu and the accounting stuff is never called during
>> idle.
>
> Uhh, this could be wrong!
> The normal ticker doesn't call accounting in idle and it is stopped
> during cpu move. The master_ticker is handled wrongly, perhaps. I'll
> check this and prepare a patch if necessary.
>
>
> Juergen
>


-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                    Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions           e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                         Internet: ts.fujitsu.com
D-80807 Muenchen               Company details: ts.fujitsu.com/imprint.html

--------------010502040507030808070302
Content-Type: text/plain;
 name="patch.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="patch.txt"

diff -r 76e1f7018b01 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Mon Jan 31 08:10:00 2011 +0100
+++ b/xen/common/sched_credit.c Wed Feb 02 10:59:44 2011 +0100
@@ -50,6 +50,8 @@
     (CSCHED_CREDITS_PER_MSEC * CSCHED_MSECS_PER_TSLICE)
 #define CSCHED_CREDITS_PER_ACCT     \
     (CSCHED_CREDITS_PER_MSEC * CSCHED_MSECS_PER_TICK * CSCHED_TICKS_PER_ACCT)
+#define CSCHED_ACCT_TSLICE          \
+    (MILLISECS(CSCHED_MSECS_PER_TICK) * CSCHED_TICKS_PER_ACCT)
 
 
 /*
@@ -320,6 +322,7 @@ csched_free_pdata(const struct scheduler
     struct csched_private *prv = CSCHED_PRIV(ops);
     struct csched_pcpu *spc = pcpu;
     unsigned long flags;
+    uint64_t now = NOW();
 
     if ( spc == NULL )
         return;
@@ -334,6 +337,8 @@ csched_free_pdata(const struct scheduler
         {
             prv->master = first_cpu(prv->cpus);
             migrate_timer(&prv->master_ticker, prv->master);
+            set_timer(&prv->master_ticker, now + CSCHED_ACCT_TSLICE
+                      - now % CSCHED_ACCT_TSLICE);
         }
         kill_timer(&spc->ticker);
         if ( prv->ncpus == 0 )
@@ -367,8 +372,7 @@ csched_alloc_pdata(const struct schedule
     {
         prv->master = cpu;
         init_timer(&prv->master_ticker, csched_acct, prv, cpu);
-        set_timer(&prv->master_ticker, NOW() +
-                  MILLISECS(CSCHED_MSECS_PER_TICK) * CSCHED_TICKS_PER_ACCT);
+        set_timer(&prv->master_ticker, NOW() + CSCHED_ACCT_TSLICE);
     }
 
     init_timer(&spc->ticker, csched_tick, (void *)(unsigned long)cpu, cpu);
@@ -1138,8 +1142,7 @@ csched_acct(void* dummy)
     prv->runq_sort++;
 
 out:
-    set_timer( &prv->master_ticker, NOW() +
-               MILLISECS(CSCHED_MSECS_PER_TICK) * CSCHED_TICKS_PER_ACCT );
+    set_timer( &prv->master_ticker, NOW() + CSCHED_ACCT_TSLICE );
 }
 
 static void
@@ -1531,22 +1534,31 @@ csched_deinit(const struct scheduler *op
 static void
 csched_tick_suspend(const struct scheduler *ops, unsigned int cpu)
 {
+    struct csched_private *prv;
     struct csched_pcpu *spc;
 
+    prv = CSCHED_PRIV(ops);
     spc = CSCHED_PCPU(cpu);
 
     stop_timer(&spc->ticker);
+    if ( prv->master == cpu )
+        stop_timer(&prv->master_ticker);
 }
 
 static void
 csched_tick_resume(const struct scheduler *ops, unsigned int cpu)
 {
+    struct csched_private *prv;
     struct csched_pcpu *spc;
     uint64_t now = NOW();
 
+    prv = CSCHED_PRIV(ops);
     spc = CSCHED_PCPU(cpu);
 
     set_timer(&spc->ticker, now + MILLISECS(CSCHED_MSECS_PER_TICK)
             - now % MILLISECS(CSCHED_MSECS_PER_TICK) );
+    if ( prv->master == cpu )
+        set_timer(&prv->master_ticker, now + CSCHED_ACCT_TSLICE
+                  - now % CSCHED_ACCT_TSLICE);
 }
 
 static struct csched_private _csched_priv;

--------------010502040507030808070302
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

--------------010502040507030808070302--