linux-kernel.vger.kernel.org archive mirror
From: "Charles (Chas) Williams" <ciwillia@brocade.com>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <x86@kernel.org>, <linux-kernel@vger.kernel.org>,
	"M. Vefa Bicakci" <m.v.b@runbox.com>
Subject: Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory
Date: Fri, 4 Nov 2016 16:42:33 -0400
Message-ID: <efe9a4fa-d018-0a0c-f094-310dee1b2a41@brocade.com>
In-Reply-To: <20161104180313.wyaheuajevkrf6o7@linutronix.de>

On 11/04/2016 02:03 PM, Sebastian Andrzej Siewior wrote:
> On 2016-11-04 08:20:37 [-0400], Charles (Chas) Williams wrote:
>> The initial CPU boots and is identified:
>>
>> [    0.009018] identify_boot_cpu
>> [    0.009174] generic_identify: phys_proc_id is now 0
>> ...
>> [    0.009427] identify_cpu: before c ffffffff81ae2680  logical_proc_id 0  c->phys_proc_id 0
>> [    0.009506] identify_cpu: after c ffffffff81ae2680  logical_proc_id 65535  c->phys_proc_id 0
>>
>> So, this is fine because the APIC hasn't been scanned yet.  APIC
>> now gets scanned:
>>
>> [    0.015789] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 (ffff88023fc0a040)
>> [    0.015794] smpboot: APIC(1) Converting physical 1 to logical package 1, cpu 1 (ffff88023fd0a040)
>> [    0.015797] smpboot: Max logical packages: 2
>
> where is the APIC ID here coming from?

This comes from here:

                 unsigned int apicid = apic->cpu_present_to_apicid(cpu);

                 if (apicid == BAD_APICID || !apic->apic_id_valid(apicid))
                         continue;
                 if (!topology_update_package_map(apicid, cpu))

And I think this is the part that is "wrong".  The apicid appears to
be a logical CPU id.  I believe that in most cases this mapping comes
from x86_bios_cpu_apicid (or x86_cpu_to_apicid), which is filled in by
generic_processor_info() as it maps APIC IDs to logical CPU indexes.

Note that apic->cpu_present_to_apicid() takes only the logical CPU index.

         for_each_present_cpu(cpu) {
                 unsigned int apicid = apic->cpu_present_to_apicid(cpu);

                 if (apicid == BAD_APICID || !apic->apic_id_valid(apicid))
                         continue;
                 if (!topology_update_package_map(apicid, cpu))
                         continue;
                 pr_warn("CPU %u APICId %x disabled\n", cpu, apicid);
                 per_cpu(x86_bios_cpu_apicid, cpu) = BAD_APICID;
                 set_cpu_possible(cpu, false);
                 set_cpu_present(cpu, false);
         }

>> So, at this point, I think everything is correct.  But now the secondary
>> CPUs "boot":
>>
>> [    0.236569] identify_secondary_cpu
>> [    0.236620] generic_identify: phys_proc_id is now 2
>
> so here is where the fun starts. Xen also has
> arch/x86/xen/smp.c::cpu_bringup(), where phys_proc_id is changed. That
> isn't done for VMware, but it might be a place where they duct-tape things.
>
> How is this APIC id different from the earlier one? I guess based on your
> output that generic_identify() changes the content of phys_proc_id.
>
>> [    0.236745] identify_cpu: before c ffff88023fd0a040  logical_proc_id 65535  c->phys_proc_id 2
>> [    0.236747] identify_cpu: after c ffff88023fd0a040  logical_proc_id 65535  c->phys_proc_id 2
>>
>> So, the APIC scan discovered that I have CPUs 0 and 1, but generic_identify()
>> calls my second CPU 2.  This is >= max_physical_pkg_id, so it is going to get
>> set to -1.
>
> Now. max_physical_pkg_id is huge. The physical_to_logical_pkg array is
> set to -1 on init so slot two has the value -1. That is what you see -
> not the -1 because of ">= max_physical_pkg_id".
>
>> The comment at the end of identify_cpu() says:
>>
>>         /* The boot/hotplug time assigment got cleared, restore it */
>>
>> So, logical_proc_id being wrong here before restoration doesn't bother
>> me, since I assume something in booting the secondary CPUs clears any
>> existing CPU data.
>>
>> I know detect_extended_topology() is likely being called for both CPUs
>> and getting the right values (checking this now).  I don't know why
>> generic_identify() is resetting this value.
>
> I don't know either. But it is clearly reading the APIC ID twice, and the
> second approach differs from the first, which leads to different
> results. So figure out how the first APIC ID for the second CPU is
> retrieved and then how it is retrieved the second time; there
> must be a difference.

The phys_proc_id set in generic_identify() comes from the CPU's own EBX
register (CPUID leaf 1), so we _know_ this is right.

	if (c->cpuid_level >= 0x00000001) {
		c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF;
#ifdef CONFIG_X86_32
# ifdef CONFIG_SMP
		c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
# else
		c->apicid = c->initial_apicid;
# endif
#endif
		c->phys_proc_id = c->initial_apicid;
	}

The Intel docs (http://x86.renejeschke.de/html/file_module_x86_id_45.html)
describe CPUID.01H:EBX[31:24] as the initial local APIC ID, so this is very
likely the correct value.  It's not clear that it matters whether this is
the right value, though.  Even if this is the correct APIC ID, nothing else
knows about it.

An argument could be made that, instead of reading CPUID whenever the
cpuid level allows it, we could use the APIC ID looked up by CPU index,
as the package map code above does.  Then the two would at least be
consistent.
