* Oops in rapl_cpu_prepare()
@ 2016-10-20 20:27 Charles (Chas) Williams
  2016-10-21 10:56 ` [PREEMPT-RT] " Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Charles (Chas) Williams @ 2016-10-20 20:27 UTC (permalink / raw)
  To: linux-kernel

Recent 4.8 kernels have been oopsing when running under VMWare:

[    2.270203] BUG: unable to handle kernel NULL pointer dereference at 0000000000000408
[    2.270325] IP: [<ffffffff81012bb9>] rapl_cpu_online+0x59/0x70
[    2.270448] PGD 0
[    2.270570] Oops: 0002 [#1] SMP
[    2.270693] Modules linked in:
[    2.270815] CPU: 2 PID: 21 Comm: cpuhp/2 Not tainted 4.8.2-1-amd64-vyatta #1
[    2.270938] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014
[    2.271060] task: ffff8802361fc2c0 task.stack: ffff880236208000
[    2.271183] RIP: 0010:[<ffffffff81012bb9>]  [<ffffffff81012bb9>] rapl_cpu_online+0x59/0x70
[    2.271306] RSP: 0000:ffff88023620be68  EFLAGS: 00010246
[    2.271428] RAX: 0000000000000004 RBX: ffff88023fd0d940 RCX: 0000000000000000
[    2.271551] RDX: 0000000000000040 RSI: 0000000000000004 RDI: 0000000000000004
[    2.271673] RBP: 0000000000000002 R08: fffffffffffffffc R09: 0000000000000000
[    2.271796] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000400
[    2.271918] R13: ffff8802361fc2c0 R14: ffff8802361fc2c0 R15: ffff8802361fc2c0
[    2.272041] FS:  0000000000000000(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
[    2.272163] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.272286] CR2: 0000000000000408 CR3: 0000000001a06000 CR4: 00000000000406e0
[    2.272408] Stack:
[    2.272531]  ffff88023fd0d940 0000000000000002 ffffffff81a38240 ffffffff81061231
[    2.272654]  ffff8802361fc2c0 ffff880237002180 ffffffff8107ddcf 0000000000000000
[    2.272776]  ffff8802361a5a80 ffff880237002180 ffffffff8107dcb0 ffffffff81a6a380
[    2.272899] Call Trace:
[    2.273021]  [<ffffffff81061231>] ? cpuhp_thread_fun+0x31/0x100
[    2.273144]  [<ffffffff8107ddcf>] ? smpboot_thread_fn+0x11f/0x180
[    2.273266]  [<ffffffff8107dcb0>] ? sort_range+0x20/0x20
[    2.273389]  [<ffffffff8107b05a>] ? kthread+0xca/0xe0
[    2.273511]  [<ffffffff8157677f>] ? ret_from_fork+0x1f/0x40
[    2.273634]  [<ffffffff8107af90>] ? kthread_park+0x50/0x50
[    2.273757] Code: 00 00 48 83 c0 22 4c 8b 24 c1 48 c7 c0 30 a1 00 00 48 8b 14 10 e8 a8 61 26 00 3b 05 b6 56 ae 00 7c 0e f0 48 0f a
[    2.279445] RIP  [<ffffffff81012bb9>] rapl_cpu_online+0x59/0x70
[    2.279568]  RSP <ffff88023620be68>
[    2.279690] CR2: 0000000000000408
[    2.279813] ---[ end trace c95da920748eb432 ]---


gdb tells me:

(gdb) info line *(rapl_cpu_online+0x59)
Line 595 of "arch/x86/events/intel/rapl.c" starts at address 0xffffffff81012bb9 <rapl_cpu_online+89>
    and ends at 0xffffffff81012bbe <rapl_cpu_online+94>.

Which is:


         target = cpumask_any_and(&rapl_cpu_mask, topology_core_cpumask(cpu));
         if (target < nr_cpu_ids)
                 return 0;

         cpumask_set_cpu(cpu, &rapl_cpu_mask);
         pmu->cpu = cpu;		<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
         return 0;

This code was recently changed by commit 8b5b773d6245138c
"perf/x86/intel/rapl: Convert to hotplug state machine" and it
appears that the setup is done as a callback:

         /*
          * Install callbacks. Core will call them for each online cpu.
          */

         ret = cpuhp_setup_state(CPUHP_PERF_X86_RAPL_PREP, "PERF_X86_RAPL_PREP",
                                 rapl_cpu_prepare, NULL);
         if (ret)
                 goto out;

         ret = cpuhp_setup_state(CPUHP_AP_PERF_X86_RAPL_ONLINE,
                                 "AP_PERF_X86_RAPL_ONLINE",
                                 rapl_cpu_online, rapl_cpu_offline);

Is there a particular order guaranteed by the callbacks?  Will
rapl_cpu_prepare() always happen before online/offline?  Additionally,
rapl_cpu_prepare() can fail to allocate pmu,

	static int rapl_cpu_prepare(unsigned int cpu)
	{
		struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);

		if (pmu)
			return 0;

		pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
		if (!pmu)
			return -ENOMEM;

But rapl_cpu_online() would have no idea about this.  What should be
done in this case?


* Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
  2016-10-20 20:27 Oops in rapl_cpu_prepare() Charles (Chas) Williams
@ 2016-10-21 10:56 ` Sebastian Andrzej Siewior
  2016-10-21 21:03   ` Charles (Chas) Williams
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-10-21 10:56 UTC (permalink / raw)
  To: Charles (Chas) Williams; +Cc: linux-kernel, rt

On 2016-10-20 16:27:55 [-0400], Charles (Chas) Williams wrote:
> Recent 4.8 kernels have been oopsing when running under VMWare:

can you reproduce this on bare metal?

> [    2.270203] BUG: unable to handle kernel NULL pointer dereference at 0000000000000408
> [    2.270325] IP: [<ffffffff81012bb9>] rapl_cpu_online+0x59/0x70
> 
> gdb tells me:
> 
> (gdb) info line *(rapl_cpu_online+0x59)
> Line 595 of "arch/x86/events/intel/rapl.c" starts at address 0xffffffff81012bb9 <rapl_cpu_online+89>
>    and ends at 0xffffffff81012bbe <rapl_cpu_online+94>.
> 
> Which is:
> 
> 
>         target = cpumask_any_and(&rapl_cpu_mask, topology_core_cpumask(cpu));
>         if (target < nr_cpu_ids)
>                 return 0;
> 
>         cpumask_set_cpu(cpu, &rapl_cpu_mask);
>         pmu->cpu = cpu;		<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

can you check if pmu is NULL?

>         return 0;
> Is there a particular order guaranteed by the callbacks?  Will
> rapl_cpu_prepare() always happen before online/offline?  Additionally,

yes, see include/linux/cpuhotplug.h. On CPU-up the array ids are invoked
from CPUHP_OFFLINE till CPUHP_ONLINE.

> rapl_cpu_prepare() can fail to allocate pmu,

Error codes returned by the callbacks are handled.

…
> But rapl_cpu_online() would have no idea about this.  What should be
> done in this case?

If a callback (such as CPUHP_PERF_X86_RAPL_PREP) fails then we roll back
to the starting point (in case of CPU up it would be CPUHP_OFFLINE).

Sebastian
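
The ordering and rollback described above can be illustrated with a
minimal user-space sketch. It is not the kernel implementation: struct
step, bring_up() and the callbacks are hypothetical names, and the model
only assumes that callbacks run in ascending state order and that a
failure undoes the already completed steps in reverse.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one hotplug state: a startup callback plus an
 * optional teardown callback (either may be NULL). */
struct step {
	int (*startup)(unsigned int cpu);
	int (*teardown)(unsigned int cpu);
};

/* Events are recorded so the invocation order can be inspected. */
enum event { EV_EARLY_UP, EV_EARLY_DOWN, EV_PREP, EV_ONLINE };
static enum event trace[8];
static size_t trace_len;

static void record(enum event ev)
{
	if (trace_len < sizeof(trace) / sizeof(trace[0]))
		trace[trace_len++] = ev;
}

static void reset_trace(void) { trace_len = 0; }

static int early_up(unsigned int cpu)   { (void)cpu; record(EV_EARLY_UP);   return 0; }
static int early_down(unsigned int cpu) { (void)cpu; record(EV_EARLY_DOWN); return 0; }
static int prep_ok(unsigned int cpu)    { (void)cpu; record(EV_PREP);       return 0; }
static int prep_fail(unsigned int cpu)  { (void)cpu; record(EV_PREP);       return -ENOMEM; }
static int online_up(unsigned int cpu)  { (void)cpu; record(EV_ONLINE);     return 0; }

/* Steps are listed in ascending state order, lowest first. */
static const struct step good_steps[] = {
	{ early_up, early_down }, { prep_ok, NULL }, { online_up, NULL },
};
static const struct step failing_steps[] = {
	{ early_up, early_down }, { prep_fail, NULL }, { online_up, NULL },
};

/* Run the startup callbacks in order; on failure, undo the steps that
 * already completed, in reverse order, and return the error. */
static int bring_up(const struct step *steps, size_t n, unsigned int cpu)
{
	size_t i;

	assert(steps != NULL);
	for (i = 0; i < n; i++) {
		int ret = steps[i].startup ? steps[i].startup(cpu) : 0;

		if (ret) {
			while (i-- > 0)
				if (steps[i].teardown)
					steps[i].teardown(cpu);
			return ret;
		}
	}
	return 0;
}
```

In this model, bring_up(failing_steps, 3, 2) never reaches the online
callback: the prepare step fails, the earlier step is torn down, and the
error is returned to the caller.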


* Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
  2016-10-21 10:56 ` [PREEMPT-RT] " Sebastian Andrzej Siewior
@ 2016-10-21 21:03   ` Charles (Chas) Williams
  2016-10-25 12:22     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Charles (Chas) Williams @ 2016-10-21 21:03 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-kernel, rt

On 10/21/2016 06:56 AM, Sebastian Andrzej Siewior wrote:
> On 2016-10-20 16:27:55 [-0400], Charles (Chas) Williams wrote:
>> Recent 4.8 kernels have been oopsing when running under VMWare:
>
> can you reproduce this on bare metal?

I can't get dedicated access to the specific bare metal since it is
running as a dedicated hypervisor.  I haven't seen this issue anywhere
else though with the 4.8 kernel.

>> [    2.270203] BUG: unable to handle kernel NULL pointer dereference at 0000000000000408
>> [    2.270325] IP: [<ffffffff81012bb9>] rapl_cpu_online+0x59/0x70
>
> can you check if pmu is NULL?

It's not.  The dereference at 0x408 and pmu->cpu being fairly early in
the struct seems to indicate that pmu wasn't pointing to 0 at the time
(but fairly close).  I should have noticed that earlier.
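
The arithmetic behind this observation can be sketched in user space.
The struct below is a hypothetical stand-in, assuming a layout in which
a 4-byte lock and a 4-byte counter precede the cpu field; the real
struct rapl_pmu may differ with kernel configuration. Under that
assumption, a store to pmu->cpu through a pointer value of 0x400 (the
value R12 holds in the register dump) touches 0x408, which matches CR2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the start of struct rapl_pmu. The field
 * sizes are an assumption (4-byte lock, no lock debugging). */
struct mock_rapl_pmu {
	uint32_t lock;     /* stand-in for raw_spinlock_t */
	int      n_active;
	int      cpu;
};

/* Address touched by a store to pmu->cpu for a given pointer value. */
static uintptr_t cpu_field_address(uintptr_t pmu)
{
	/* Guard against wrap-around in the addition. */
	assert(pmu <= UINTPTR_MAX - offsetof(struct mock_rapl_pmu, cpu));
	return pmu + offsetof(struct mock_rapl_pmu, cpu);
}
```

With a true NULL pointer the same store would fault at 0x8 in this
model, so a fault at 0x408 points to a small non-NULL garbage value
rather than to NULL.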

>> Is there a particular order guaranteed by the callbacks?  Will
>> rapl_cpu_prepare() always happen before online/offline?  Additionally,
>
> yes, see include/linux/cpuhotplug.h. On CPU-up the array ids are invoked
> from CPUHP_OFFLINE till CPUHP_ONLINE.

Yes, I see that now.  Thanks for the pointer!

> If a callback (such as CPUHP_PERF_X86_RAPL_PREP) fails then we roll back
> to the starting point (in case of CPU up it would be CPUHP_OFFLINE).

You'll like this, I just did a little printk debugging because it was
easier than trying to get a debugger running:

	[    3.107126] init_rapl_pmus: maxpkg 4
	[    3.107263] rapl_cpu_prepare: pmu ffff880234faa540  cpu 0  pkgid 0
	[    3.107400] rapl_cpu_prepare: pmu ffff880234faa600  cpu 1  pkgid 2
	[    3.107537] rapl_cpu_prepare: pmu ffff880234faa6c0  cpu 2  pkgid 65535
	[    3.107662] rapl_cpu_online: pmu ffff880234faa540 cpu 0 pkgid 0
	[    3.107907] rapl_cpu_online: pmu ffff880234faa600 cpu 1 pkgid 2
	[    3.108133] rapl_cpu_online: pmu ffff880234faa6c0 cpu 2 pkgid 65535
	[    3.108333] rapl_cpu_online: pmu ffff880234faa6c0 cpu 3 pkgid 65535

where pkgid is topology_logical_package_id(cpu).

I can't understand why I don't see a cpu 3 during cpu prepare, when I
see one later.  The 65535 is a -1 from topology_phys_to_logical_pkg()
getting assigned to the logical_proc_id apparently.

So this is pretty puzzling.  Since this is a guest running under VMWare, I
don't know that there is any particular CPU pinning or emulation of RAPL.

It looks like there was a proposal to not run in guests:

https://lkml.org/lkml/2015/12/3/559


* Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
  2016-10-21 21:03   ` Charles (Chas) Williams
@ 2016-10-25 12:22     ` Sebastian Andrzej Siewior
  2016-10-25 12:42       ` Sebastian Andrzej Siewior
  2016-10-27 19:00       ` Charles (Chas) Williams
  0 siblings, 2 replies; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-10-25 12:22 UTC (permalink / raw)
  To: Charles (Chas) Williams; +Cc: linux-kernel, rt

On 2016-10-21 17:03:56 [-0400], Charles (Chas) Williams wrote:
> I can't get dedicated access to the specific bare metal since it is
> running as a dedicated hypervisor.  I haven't seen this issue anywhere
> else though with the 4.8 kernel.

That is something :)

> > If a callback (such as CPUHP_PERF_X86_RAPL_PREP) fails then we roll back
> > to the starting point (in case of CPU up it would be CPUHP_OFFLINE).
> 
> You'll like this, I just did a little printk debugging because it was
> easier than trying to get a debugger running:
> 
> 	[    3.107126] init_rapl_pmus: maxpkg 4
there! vmware bug. It probably worked by chance.

> 	[    3.107263] rapl_cpu_prepare: pmu ffff880234faa540  cpu 0  pkgid 0
> 	[    3.107400] rapl_cpu_prepare: pmu ffff880234faa600  cpu 1  pkgid 2
> 	[    3.107537] rapl_cpu_prepare: pmu ffff880234faa6c0  cpu 2  pkgid 65535
> 	[    3.107662] rapl_cpu_online: pmu ffff880234faa540 cpu 0 pkgid 0
> 	[    3.107907] rapl_cpu_online: pmu ffff880234faa600 cpu 1 pkgid 2
> 	[    3.108133] rapl_cpu_online: pmu ffff880234faa6c0 cpu 2 pkgid 65535
> 	[    3.108333] rapl_cpu_online: pmu ffff880234faa6c0 cpu 3 pkgid 65535
> 
> where pkgid is topology_logical_package_id(cpu).
> 
> I can't understand why I don't see a cpu 3 during cpu prepare, when I
> see one later.  

because cpu 2 and 3 share the same package and if your printk is at the
bottom of the function, it will return early.
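
A user-space sketch of that early return, assuming the debug print sits
after the allocation. All names here are hypothetical, and the package
ids are made-up in-range values (the real log shows 65535 for CPUs 2 and
3); the point is only that the second CPU of a package finds the slot
already filled and returns before reaching the print.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_PKG 4

struct mock_pmu { int cpu; };

static struct mock_pmu *pmus[MAX_PKG];   /* one slot per package */
static int allocations;                  /* number of slots allocated */

/* Hypothetical per-CPU package ids, all in range: CPUs 2 and 3 share one. */
static const int pkg_of_cpu[] = { 0, 2, 3, 3 };

static int mock_prepare(unsigned int cpu)
{
	int pkg;

	assert(cpu < sizeof(pkg_of_cpu) / sizeof(pkg_of_cpu[0]));
	pkg = pkg_of_cpu[cpu];
	if (pmus[pkg])
		return 0;	/* a sibling CPU already set up this package */

	pmus[pkg] = calloc(1, sizeof(*pmus[pkg]));
	if (!pmus[pkg])
		return -1;
	allocations++;		/* a print placed here fires once per package */
	return 0;
}
```

Calling mock_prepare() for CPUs 0 to 3 allocates three times, not four,
which has the same shape as the three rapl_cpu_prepare lines in the log.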

> The 65535 is a -1 from topology_phys_to_logical_pkg()
> getting assigned to the logical_proc_id apparently.

yes. The topology field is u16.
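
A small user-space sketch of this truncation, with hypothetical helper
names: storing -1 into a 16-bit unsigned field yields 65535, which then
fails any bounds check against the package count.

```c
#include <assert.h>
#include <stdint.h>

/* Store a signed lookup result into a 16-bit unsigned field, the way a
 * -1 "not found" value ends up in a u16 topology field. */
static uint16_t store_pkg_id(int lookup_result)
{
	assert(lookup_result >= -1);   /* -1 is the only negative value modelled */
	return (uint16_t)lookup_result;
}

/* An id is usable as an array index only if it is below the count. */
static int pkg_id_in_range(uint16_t id, int maxpkg)
{
	return id < maxpkg;
}
```

Here store_pkg_id(-1) gives 65535, and pkg_id_in_range(65535, 4) is
false, so such an id must not be used to index a 4-entry array.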

> So this is pretty puzzling.  Since this is a guest running under VMWare, I
> don't know that there is any particular CPU pinning or emulation of RAPL.

I assume "init_rapl_pmus: maxpkg 4" is from init_rapl_pmus() returning
topology_max_packages(). So it says 4 but then returns 65535 for CPU 2
and 3. That -1 comes probably from topology_update_package_map(). Could
you please send a complete boot log and try the following patch? This
one should fix your boot problem and disable RAPL if the info is
invalid.

diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index 0a535cea8ff3..f5d85f2853d7 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -682,6 +682,15 @@ static int __init init_rapl_pmus(void)
 {
 	int maxpkg = topology_max_packages();
 	size_t size;
+	unsigned int cpu;
+
+	for_each_possible_cpu(cpu) {
+		if (topology_logical_package_id(cpu) >= maxpkg) {
+			pr_err("rapl pmu error: max package: %u but CPU%d belongs to %u\n",
+			       maxpkg, cpu, topology_logical_package_id(cpu));
+			return -EINVAL;
+		}
+	}
 
 	size = sizeof(*rapl_pmus) + maxpkg * sizeof(struct rapl_pmu *);
 	rapl_pmus = kzalloc(size, GFP_KERNEL);

Sebastian
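
The check added by the patch can be modelled in user space as follows.
This is a sketch, not the kernel code: first_invalid_cpu() and the
sane_ids table are hypothetical, while reported_ids uses the package ids
from the printk output quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Package ids per CPU as printed in the debug output: CPUs 2 and 3
 * report 65535. */
static const uint16_t reported_ids[] = { 0, 2, 65535, 65535 };
/* Hypothetical sane topology for comparison. */
static const uint16_t sane_ids[] = { 0, 1, 2, 3 };

/* Return the first CPU whose package id cannot index an array of
 * maxpkg entries, or -1 if all ids are valid. */
static int first_invalid_cpu(const uint16_t *ids, int ncpus, int maxpkg)
{
	int cpu;

	assert(ids != NULL && ncpus >= 0);
	for (cpu = 0; cpu < ncpus; cpu++) {
		if (ids[cpu] >= maxpkg)
			return cpu;
	}
	return -1;
}
```

With maxpkg set to 4, the reported ids are rejected at CPU 2, so the
per-package array is never indexed out of bounds; the sane table passes.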


* Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
  2016-10-25 12:22     ` Sebastian Andrzej Siewior
@ 2016-10-25 12:42       ` Sebastian Andrzej Siewior
  2016-10-27 19:00       ` Charles (Chas) Williams
  1 sibling, 0 replies; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-10-25 12:42 UTC (permalink / raw)
  To: Charles (Chas) Williams; +Cc: linux-kernel, rt

On 2016-10-25 14:22:05 [+0200], To Charles (Chas) Williams wrote:
> > 	[    3.107263] rapl_cpu_prepare: pmu ffff880234faa540  cpu 0  pkgid 0
> > 	[    3.107400] rapl_cpu_prepare: pmu ffff880234faa600  cpu 1  pkgid 2
> > 	[    3.107537] rapl_cpu_prepare: pmu ffff880234faa6c0  cpu 2  pkgid 65535
> > 	[    3.107662] rapl_cpu_online: pmu ffff880234faa540 cpu 0 pkgid 0
> > 	[    3.107907] rapl_cpu_online: pmu ffff880234faa600 cpu 1 pkgid 2
> > 	[    3.108133] rapl_cpu_online: pmu ffff880234faa6c0 cpu 2 pkgid 65535
> > 	[    3.108333] rapl_cpu_online: pmu ffff880234faa6c0 cpu 3 pkgid 65535

One thing I forgot to ask: Could you please check if you get the same
pkgid reported for cpu 0-3 on a pre-v4.8 kernel? (before the hotplug
rework).

Sebastian


* Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
  2016-10-25 12:22     ` Sebastian Andrzej Siewior
  2016-10-25 12:42       ` Sebastian Andrzej Siewior
@ 2016-10-27 19:00       ` Charles (Chas) Williams
  2016-10-28  8:03         ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 12+ messages in thread
From: Charles (Chas) Williams @ 2016-10-27 19:00 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-kernel, rt

On 10/25/2016 08:22 AM, Sebastian Andrzej Siewior wrote:
> On 2016-10-21 17:03:56 [-0400], Charles (Chas) Williams wrote:
>> 	[    3.107126] init_rapl_pmus: maxpkg 4
> there! vmware bug. It probably worked by chance.

Yes, the behavior is a bit random.

> I assume "init_rapl_pmus: maxpkg 4" is from init_rapl_pmus() returning
> topology_max_packages(). So it says 4 but then returns 65535 for CPU 2
> and 3. That -1 comes probably from topology_update_package_map(). Could
> you please send a complete boot log and try the following patch? This
> one should fix your boot problem and disable RAPL if the info is
> invalid.

But sometimes the topology info is correct and if I get lucky, the
package id could be valid for all the CPUs.  Given the behavior
I have seen so far, it makes me think the RAPL isn't being emulated.
So even if I did boot onto a "valid" set of cores, would I always be
certain that I will be on those cores?

> diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
> index 0a535cea8ff3..f5d85f2853d7 100644
> --- a/arch/x86/events/intel/rapl.c
> +++ b/arch/x86/events/intel/rapl.c
> @@ -682,6 +682,15 @@ static int __init init_rapl_pmus(void)
>  {
>  	int maxpkg = topology_max_packages();
>  	size_t size;
> +	unsigned int cpu;
> +
> +	for_each_possible_cpu(cpu) {
> +		if (topology_logical_package_id(cpu) >= maxpkg) {
> +			pr_err("rapl pmu error: max package: %u but CPU%d belongs to %u\n",
> +			       maxpkg, cpu, topology_logical_package_id(cpu));
> +			return -EINVAL;
> +		}
> +	}
>
>  	size = sizeof(*rapl_pmus) + maxpkg * sizeof(struct rapl_pmu *);
>  	rapl_pmus = kzalloc(size, GFP_KERNEL);

Per your request in your next email:

>One thing I forgot to ask: Could you please check if you get the same
>pkgid reported for cpu 0-3 on a pre-v4.8 kernel? (before the hotplug
>rework).

Our previous kernel was 4.4, and didn't use the logical package id:

         /* check if phys_is is already covered */
         for_each_cpu(i, &rapl_cpu_mask) {
                 if (phys_id == topology_physical_package_id(i))
                         return;


* Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
  2016-10-27 19:00       ` Charles (Chas) Williams
@ 2016-10-28  8:03         ` Sebastian Andrzej Siewior
  2016-11-01 10:15           ` M. Vefa Bicakci
  2016-11-02  9:16           ` Charles (Chas) Williams
  0 siblings, 2 replies; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-10-28  8:03 UTC (permalink / raw)
  To: Charles (Chas) Williams; +Cc: linux-kernel, rt

On 2016-10-27 15:00:32 [-0400], Charles (Chas) Williams wrote:
> > I assume "init_rapl_pmus: maxpkg 4" is from init_rapl_pmus() returning
> > topology_max_packages(). So it says 4 but then returns 65535 for CPU 2
> > and 3. That -1 comes probably from topology_update_package_map(). Could
> > you please send a complete boot log and try the following patch? This
> > one should fix your boot problem and disable RAPL if the info is
> > invalid.
> 
> But sometimes the topology info is correct and if I get lucky, the
> package id could be valid for all the CPUs.  Given the behavior
> I have seen so far, it makes me think the RAPL isn't being emulated.
> So even if I did boot onto a "valid" set of cores, would I always be
> certain that I will be on those cores?

I don't know what vmware does here. Nor do they ship source to check. So if
you have a big HW box with say two packages, it might make sense to give
this information to the guest _if_ the CPUs are pinned and the guest
never migrates.

> Per your request in your next email:
> 
> > One thing I forgot to ask: Could you please check if you get the same
> > pkgid reported for cpu 0-3 on a pre-v4.8 kernel? (before the hotplug
> > rework).
> 
> Our previous kernel was 4.4, and didn't use the logical package id:
I see.

Did the patch I sent fix it for you, or were you not able to test?

Sebastian


* Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
  2016-10-28  8:03         ` Sebastian Andrzej Siewior
@ 2016-11-01 10:15           ` M. Vefa Bicakci
  2016-11-02 17:23             ` Sebastian Andrzej Siewior
  2016-11-02  9:16           ` Charles (Chas) Williams
  1 sibling, 1 reply; 12+ messages in thread
From: M. Vefa Bicakci @ 2016-11-01 10:15 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-kernel, Charles (Chas) Williams

> On 2016-10-27 15:00:32 [-0400], Charles (Chas) Williams wrote:
>>
>> [snip]
>>
>> But sometimes the topology info is correct and if I get lucky, the
>> package id could be valid for all the CPUs.  Given the behavior
>> I have seen so far, it makes me think the RAPL isn't being emulated.
>> So even if I did boot onto a "valid" set of cores, would I always be
>> certain that I will be on those cores?
> 
> I don't know what vmware does here. Nor do they ship source to check. So if
> you have a big HW box with say two packages, it might make sense to give
> this information to the guest _if_ the CPUs are pinned and the guest
> never migrates.
> 
>> Per your request in your next email:
>> 
>> > One thing I forgot to ask: Could you please check if you get the same
>> > pkgid reported for cpu 0-3 on a pre-v4.8 kernel? (before the hotplug
>> > rework).
>> 
>> Our previous kernel was 4.4, and didn't use the logical package id:
>
> I see.
> 
> Did the patch I sent fix it for you, or were you not able to test?

Hello Sebastian,

The patch fixes the kernel oops for me.

I am using a custom 4.8.5-based kernel on Qubes OS R3.2, which is based
on Xen 4.6.3. Apparently, Xen also has a similar bug/flaw/quirk regarding
the allocation of package identifiers for the virtual CPUs.

Prior to your patch, my Xen-based virtual machines would intermittently
crash most of the time at boot-up with the backtrace reported by Charles.
Due to this, I was under the impression that this is a subtle race
condition.

With your patch, the virtual machines boot up successfully, all the time.
Here are the relevant excerpts from dmesg:

=== 8< ===
[    0.263936] RAPL PMU: rapl pmu error: max package: 1 but CPU0 belongs to 65535
...
[    2.213669] intel_rapl: Found RAPL domain package
[    2.213689] intel_rapl: Found RAPL domain core
[    2.216337] intel_rapl: Found RAPL domain uncore
[    2.216370] intel_rapl: RAPL package 0 domain package locked by BIOS
=== >8 ===

Thank you,

Vefa

Please note: I am not subscribed to the Linux kernel mailing list, so
I had to manually construct the headers of this reply with the proper
In-Reply-To and References values (which were extracted from marc.info).
As a result, this e-mail may not show up as a reply to your earlier
conversation with Charles.


* Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
  2016-10-28  8:03         ` Sebastian Andrzej Siewior
  2016-11-01 10:15           ` M. Vefa Bicakci
@ 2016-11-02  9:16           ` Charles (Chas) Williams
  2016-11-02  9:58             ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 12+ messages in thread
From: Charles (Chas) Williams @ 2016-11-02  9:16 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-kernel, rt

On 10/28/2016 04:03 AM, Sebastian Andrzej Siewior wrote:
> On 2016-10-27 15:00:32 [-0400], Charles (Chas) Williams wrote:
>>> I assume "init_rapl_pmus: maxpkg 4" is from init_rapl_pmus() returning
>>> topology_max_packages(). So it says 4 but then returns 65535 for CPU 2
>>> and 3. That -1 comes probably from topology_update_package_map(). Could
>>> you please send a complete boot log and try the following patch? This
>>> one should fix your boot problem and disable RAPL if the info is
>>> invalid.
>>
>> But sometimes the topology info is correct and if I get lucky, the
>> package id could be valid for all the CPUs.  Given the behavior
>> I have seen so far, it makes me think the RAPL isn't being emulated.
>> So even if I did boot onto a "valid" set of cores, would I always be
>> certain that I will be on those cores?
>
> I don't know what vmware does here. Nor do they ship source to check. So if
> you have a big HW box with say two packages, it might make sense to give
> this information to the guest _if_ the CPUs are pinned and the guest
> never migrates.

Yes, I agree _if_.  That's why it simply isn't clear to me that we should
attempt to do any RAPL at all for VMWare.  The current behavior doesn't seem
to make sense and I don't expect it to suddenly start acting reasonably.
Since I don't understand why some package id's are valid and others
are not, I would prefer not to trust any of the information as far as
enabling/disabling the RAPL monitoring.

>
>> Per your request in your next email:
>>
>>> One thing I forgot to ask: Could you please check if you get the same
>>> pkgid reported for cpu 0-3 on a pre-v4.8 kernel? (before the hotplug
>>> rework).
>>
>> Our previous kernel was 4.4, and didn't use the logical package id:
> I see.
>
> Did the patch I sent fix it for you, or were you not able to test?

Yes, it does prevent RAPL from starting and loading.  From the boot log:

[    2.711481] RAPL PMU: rapl pmu error: max package: 4 but CPU2 belongs to 65535
[    2.711639] rapl pmu error: max package: 4 but CPU2 belongs to 65535

This was consistent across several reboots.  I poked around in the
VM settings.  Apparently this guest is configured for four virtual
sockets with one core per socket.  Testing with two virtual sockets,
one core per socket:

[    2.163177] RAPL PMU: rapl pmu error: max package: 2 but CPU1 belongs to 65535
[    2.163304] rapl pmu error: max package: 2 but CPU1 belongs to 65535

Booting with 1 virtual socket, 1 core per socket:

[    1.750311] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 10737418240 ms ovfl timer
[    1.750312] RAPL PMU: hw unit of domain pp0-core 2^-0 Joules
[    1.750313] RAPL PMU: hw unit of domain package 2^-0 Joules
[    1.750314] RAPL PMU: hw unit of domain dram 2^-0 Joules

Booting with 1 virtual socket, 4 cores per socket:

[    3.527298] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 10737418240 ms ovfl timer
[    3.527302] RAPL PMU: hw unit of domain pp0-core 2^-0 Joules
[    3.527304] RAPL PMU: hw unit of domain package 2^-0 Joules
[    3.527307] RAPL PMU: hw unit of domain dram 2^-0 Joules

So, it looks like VMWare tends to always get something wrong if you have
more than one virtual socket.  The above behavior was consistent across
several reboots.


* Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
  2016-11-02  9:16           ` Charles (Chas) Williams
@ 2016-11-02  9:58             ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-11-02  9:58 UTC (permalink / raw)
  To: Charles (Chas) Williams; +Cc: linux-kernel, rt

On 2016-11-02 05:16:03 [-0400], Charles (Chas) Williams wrote:
> Yes, it does prevent RAPL from starting and loading.  From the boot log:
please send the whole bootlog. offlist if you want.

Sebastian


* Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
  2016-11-01 10:15           ` M. Vefa Bicakci
@ 2016-11-02 17:23             ` Sebastian Andrzej Siewior
  2016-11-03 18:21               ` M. Vefa Bicakci
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-11-02 17:23 UTC (permalink / raw)
  To: M. Vefa Bicakci; +Cc: linux-kernel, Charles (Chas) Williams

On 2016-11-01 13:15:53 [+0300], M. Vefa Bicakci wrote:
> Hello Sebastian,
Hi,

> The patch fixes the kernel oops for me.
> 
> I am using a custom 4.8.5-based kernel on Qubes OS R3.2, which is based
> on Xen 4.6.3. Apparently, Xen also has a similar bug/flaw/quirk regarding
> the allocation of package identifiers for the virtual CPUs.
> 
> Prior to your patch, my Xen-based virtual machines would intermittently
> crash most of the time at boot-up with the backtrace reported by Charles.
> Due to this, I was under the impression that this is a subtle race
> condition.

how hard is it to get such a xen setup up and running?

Sebastian


* Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
  2016-11-02 17:23             ` Sebastian Andrzej Siewior
@ 2016-11-03 18:21               ` M. Vefa Bicakci
  0 siblings, 0 replies; 12+ messages in thread
From: M. Vefa Bicakci @ 2016-11-03 18:21 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-kernel, Charles (Chas) Williams

On 11/02/2016 08:23 PM, Sebastian Andrzej Siewior wrote:
> On 2016-11-01 13:15:53 [+0300], M. Vefa Bicakci wrote:
>> Hello Sebastian,
>
> Hi,
> 
>> The patch fixes the kernel oops for me.
>>
>> I am using a custom 4.8.5-based kernel on Qubes OS R3.2, which is based
>> on Xen 4.6.3. Apparently, Xen also has a similar bug/flaw/quirk regarding
>> the allocation of package identifiers for the virtual CPUs.
>>
>> Prior to your patch, my Xen-based virtual machines would intermittently
>> crash most of the time at boot-up with the backtrace reported by Charles.
>> Due to this, I was under the impression that this is a subtle race
>> condition.
> 
> how hard is it to get such a xen setup up and running?

Hello Sebastian,

Sorry about my late reply!

The set-up I use is a bit involved/complicated. To replicate it, you
would need to install Qubes OS R3.2 (assuming that you have compatible
hardware with a lot of RAM) and then build a custom 4.8.y-based kernel
with a set of cherry-picked commits. After installing this kernel in
dom0 with dnf or rpm, you would need to run:

  # Generate a domU initrd and copy the kernel image and the generated
  # initrd to Qubes OS's domU kernel directory (/var/lib/qubes/...)
  $ sudo /usr/sbin/qubes-prepare-vm-kernel <kernel_version>

  # From now on, use kernel_version when starting domU instances.
  $ qubes-prefs -s default-kernel <kernel_version>

Afterwards, starting a domU (i.e., AppVM in Qubes OS terminology) should
exhibit the issue in question related to RAPL:

  $ sudo truncate -s0 /var/log/xen/console/guest-<app_vm_name>.log
  $ qvm-start --debug <app_vm_name>
  $ cat /var/log/xen/console/guest-<app_vm_name>.log

As you may appreciate, the set-up is a bit involved. Nevertheless, in
case you would like to replicate my set-up, I can try to publish my
linux-4.8.y-based git branch so that you can build a similar kernel as
the one I use.

Vefa

