Date: Fri, 4 Nov 2016 19:03:14 +0100
From: Sebastian Andrzej Siewior
To: "Charles (Chas) Williams"
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, "M. Vefa Bicakci"
Subject: Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory
Message-ID: <20161104180313.wyaheuajevkrf6o7@linutronix.de>
References: <20161102122557.qs4rl6mb7n7l7j7p@linutronix.de> <24e69019-60d0-29e7-e31f-c6f00f9ed98a@brocade.com> <20161103174753.o5ynquul2rjuiq77@linutronix.de>

On 2016-11-04 08:20:37 [-0400], Charles (Chas) Williams wrote:
> The initial CPU boots and is identified:
>
> [ 0.009018] identify_boot_cpu
> [ 0.009174] generic_identify: phys_proc_id is now 0
> ...
> [ 0.009427] identify_cpu: before c ffffffff81ae2680 logical_proc_id 0 c->phys_proc_id 0
> [ 0.009506] identify_cpu: after c ffffffff81ae2680 logical_proc_id 65535 c->phys_proc_id 0
>
> So, this is fine because the APIC hasn't been scanned yet. APIC
> now gets scanned:
>
> [ 0.015789] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 (ffff88023fc0a040)
> [ 0.015794] smpboot: APIC(1) Converting physical 1 to logical package 1, cpu 1 (ffff88023fd0a040)
> [ 0.015797] smpboot: Max logical packages: 2

Where is the APIC ID here coming from?

> So, at this point, I think everything is correct.
> But now the secondary
> CPUs "boot":
>
> [ 0.236569] identify_secondary_cpu
> [ 0.236620] generic_identify: phys_proc_id is now 2

So this is where the fun starts. Xen also has
arch/x86/xen/smp.c::cpu_bringup(), where phys_proc_id is changed. That
isn't done for VMware, but it might be a place where they duct-tape
things.
How is this APIC ID different from the earlier one? Based on your
output, I guess that generic_identify() changes the content of
phys_proc_id.

> [ 0.236745] identify_cpu: before c ffff88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2
> [ 0.236747] identify_cpu: after c ffff88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2
>
> So, APIC discovered I have a cpu 0 and 1 but generic_identify() calls
> my second CPU 2. This is >= max_physical_pkg_id, so it is going to get
> set to -1.

No. max_physical_pkg_id is huge. The physical_to_logical_pkg array is
filled with -1 on init, so slot two still holds -1. That is what you
see, not a -1 caused by ">= max_physical_pkg_id".

> The comment at the end of identify_cpu() says:
>
> /* The boot/hotplug time assigment got cleared, restore it */
>
> So, logical_proc_id being wrong here before restoration doesn't bother
> me since I assume something in booting the secondary CPUs clears any
> existing cpu data.
>
> I know detect_extended_topology() is likely being called for both CPUs
> and getting the right values (checking this now). I don't know why
> generic_identify() is resetting this value.

I don't know either. But it is clearly reading the APIC ID twice, and
the second read differs from the first, which leads to different
results. So figure out how the APIC ID of the second CPU is retrieved
the first time, and then how it is retrieved the second time. There
must be a difference.

Sebastian
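To illustrate the point about the -1 in slot two, here is a minimal user-space model of the lookup. It is not the actual smpboot.c code. It assumes the map is an array of u16 filled with 0xff bytes at init, which is why an unmapped slot shows up as 65535 in the identify_cpu output. The bound MAX_PHYSICAL_PKG_ID = 128 and the helpers init_pkg_map(), map_physical_pkg() and phys_to_logical_pkg() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bound; the thread only says max_physical_pkg_id is "huge". */
#define MAX_PHYSICAL_PKG_ID 128u
/* A u16 slot filled with 0xff bytes reads back as 0xffff (65535), i.e. "-1". */
#define PKG_UNMAPPED 0xffffu

static uint16_t physical_to_logical_pkg[MAX_PHYSICAL_PKG_ID];
static unsigned int logical_packages;

/* Init (hypothetical helper): every slot starts out as "-1" (0xffff). */
void init_pkg_map(void)
{
    memset(physical_to_logical_pkg, 0xff, sizeof(physical_to_logical_pkg));
    logical_packages = 0;
}

/* APIC scan model: hand out the next logical id the first time a physical id is seen. */
int map_physical_pkg(unsigned int phys)
{
    if (phys >= MAX_PHYSICAL_PKG_ID)
        return -1;
    if (physical_to_logical_pkg[phys] == PKG_UNMAPPED) {
        assert(logical_packages < PKG_UNMAPPED); /* never collide with the sentinel */
        physical_to_logical_pkg[phys] = (uint16_t)logical_packages++;
    }
    return physical_to_logical_pkg[phys];
}

/* Later lookup model, e.g. when logical_proc_id is restored from phys_proc_id. */
unsigned int phys_to_logical_pkg(unsigned int phys)
{
    if (phys >= MAX_PHYSICAL_PKG_ID)
        return PKG_UNMAPPED;
    /* An in-range id the scan never saw still returns the init value. */
    return physical_to_logical_pkg[phys];
}
```

With packages 0 and 1 mapped during the scan, phys_to_logical_pkg(2) returns 65535 even though 2 is far below the bound, which matches the "logical_proc_id 65535 c->phys_proc_id 2" lines.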