From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932403AbbCCSql (ORCPT ); Tue, 3 Mar 2015 13:46:41 -0500 Received: from cantor2.suse.de ([195.135.220.15]:36255 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932261AbbCCSqk (ORCPT ); Tue, 3 Mar 2015 13:46:40 -0500 Date: Tue, 3 Mar 2015 19:45:27 +0100 From: Borislav Petkov To: Andre Przywara , Sudeep Holla , Tejun Heo Cc: linux-kernel@vger.kernel.org, Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , x86@kernel.org Subject: Re: [PATCH RFT v2] x86: move cacheinfo sysfs to generic cacheinfo infrastructure Message-ID: <20150303184527.GE3648@pd.tnic> References: <20150224175749.GC3575@pd.tnic> <1425249564-28347-1-git-send-email-andre.przywara@arm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <1425249564-28347-1-git-send-email-andre.przywara@arm.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org @Tejun: scroll to the end :) On Sun, Mar 01, 2015 at 10:39:24PM +0000, Andre Przywara wrote: > So I gave that a spin on my old PhenomII X6 box, and indeed there is a > NULL pointer dereference. I tracked it down to > __cache_amd_cpumap_setup(), where the sibling map for the L3 is > populated. When the function is called for the first CPU, subsequent > CPU's cacheinfo structures obviously haven't been initialized yet, so Right, let me try to understand the splat: [ 3.030598] RIP: 0010:[] [] _populate_cache_leaves+0x440/0x460 This is: 9d6: 49 0f a3 06 bt %rax,(%r14) 9da: 19 c9 sbb %ecx,%ecx 9dc: 85 c9 test %ecx,%ecx 9de: 74 d0 je 9b0 <_populate_cache_leaves+0x410> 9e0: f0 49 0f ab 84 24 f8 lock bts %rax,0xf8(%r12) <---- 9e7: 00 00 00 9ea: eb c4 jmp 9b0 <_populate_cache_leaves+0x410> 9ec: 4c 8b 65 98 mov -0x68(%rbp),%r12 9f0: e9 eb fd ff ff jmpq 7e0 <_populate_cache_leaves+0x240> 9f5: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1) %r12 is 0 and that is this hunk in __cache_amd_cpumap_setup(): } else if (index == 3) { for_each_cpu(i, cpu_llc_shared_mask(cpu)) { this_cpu_ci = get_cpu_cacheinfo(i); this_leaf = this_cpu_ci->info_list + index; for_each_cpu(sibling, cpu_llc_shared_mask(cpu)) { if (!cpu_online(sibling)) continue; cpumask_set_cpu(sibling, <--- set_bit() does LOCK &this_leaf->shared_cpu_map); } } so this_leaf is NULL. We get it by doing this_leaf = this_cpu_ci->info_list + index; and this_cpu_ci is a per-CPU variable. Now, previously the code did - if (!per_cpu(ici_cpuid4_info, i)) - continue; and __cache_cpumap_setup() already does: if (i == cpu || !sib_cpu_ci->info_list) continue;/* skip if itself or no cacheinfo */ so maybe we should do that too in __cache_amd_cpumap_setup(): if (!this_cpu_ci->info_list) continue; for the index == 3 case? It boots fine here with that change and it is consistent with the previous code. And yes, the x86 cacheinfo code could use a serious rubbing and cleanup. Btw, this patch introduces a bunch of new sysfs nodes in the caches hierarchy: --- caches-guest-before.txt 2015-03-03 15:11:09.168276423 +0100 +++ caches-guest-after.txt 2015-03-03 18:19:04.084426130 +0100 @@ -1,6 +1,22 @@ +/sys/devices/system/cpu/cpu0/cache/power/control:1:auto +/sys/devices/system/cpu/cpu0/cache/power/async:1:disabled +/sys/devices/system/cpu/cpu0/cache/power/runtime_enabled:1:disabled +/sys/devices/system/cpu/cpu0/cache/power/runtime_active_kids:1:0 +/sys/devices/system/cpu/cpu0/cache/power/runtime_active_time:1:0 +/sys/devices/system/cpu/cpu0/cache/power/runtime_status:1:unsupported +/sys/devices/system/cpu/cpu0/cache/power/runtime_usage:1:0 +/sys/devices/system/cpu/cpu0/cache/power/runtime_suspended_time:1:0 /sys/devices/system/cpu/cpu0/cache/index0/size:1:64K /sys/devices/system/cpu/cpu0/cache/index0/type:1:Data /sys/devices/system/cpu/cpu0/cache/index0/level:1:1 +/sys/devices/system/cpu/cpu0/cache/index0/power/control:1:auto +/sys/devices/system/cpu/cpu0/cache/index0/power/async:1:disabled +/sys/devices/system/cpu/cpu0/cache/index0/power/runtime_enabled:1:disabled +/sys/devices/system/cpu/cpu0/cache/index0/power/runtime_active_kids:1:0 +/sys/devices/system/cpu/cpu0/cache/index0/power/runtime_active_time:1:0 +/sys/devices/system/cpu/cpu0/cache/index0/power/runtime_status:1:unsupported +/sys/devices/system/cpu/cpu0/cache/index0/power/runtime_usage:1:0 +/sys/devices/system/cpu/cpu0/cache/index0/power/runtime_suspended_time:1:0 /sys/devices/system/cpu/cpu0/cache/index0/number_of_sets:1:512 /sys/devices/system/cpu/cpu0/cache/index0/shared_cpu_map:1:1 /sys/devices/system/cpu/cpu0/cache/index0/shared_cpu_list:1:0 ... What do those things mean? runtime_active_kids ?? Kids are active during runtime?! Well, that's a given, no need for a sysfs node for that :-) > > Indeed I see a regression between 3.19.0 and a clean 4.0.0-rc1 in this > > respect, the diff is like: > > -cpu5/cache/index3/shared_cpu_map:00000000,0000003f > > +cpu5/cache/index3/shared_cpu_map:3f > > (for each cpu and index) > > > > Can someone confirm (and investigate) this? > > I saw this even on ARM64 platforms with 4.0-rc1 and did narrow down to > series of formatting functions changes introduced by Tejun Heo, mainly > commit 4a0792b0e7a6("bitmap: use %*pb[l] to print bitmaps including > cpumasks and nodemasks") Strictly speaking, this is a regression if userspace is parsing those masks. I hardly doubt anything is doing that but if we had to be completely strict we should add back the zeroes. Tejun, we have this change in user-visible masks formatting in sysfs after your bitmaps printing changes: -cpu5/cache/index3/shared_cpu_map:00000000,0000003f +cpu5/cache/index3/shared_cpu_map:3f What's the rationale on userspace parsing bitmaps in sysfs, we don't care? Thanks. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. --