From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753391AbdBAUDC (ORCPT ); Wed, 1 Feb 2017 15:03:02 -0500 Received: from mail.skyhub.de ([78.46.96.112]:53898 "EHLO mail.skyhub.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753135AbdBAUDA (ORCPT ); Wed, 1 Feb 2017 15:03:00 -0500 Date: Wed, 1 Feb 2017 21:02:37 +0100 From: Borislav Petkov To: x86-ml Cc: Yves Dionne , Brice Goglin , Peter Zijlstra , Yazen Ghannam , lkml Subject: [RFC PATCH] x86/CPU/AMD: Bring back Compute Unit ID Message-ID: <20170201200237.36s2jwjgxi24we66@pd.tnic> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline User-Agent: NeoMutt/20161014 (1.7.1) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Borislav Petkov a33d331761bc ("x86/CPU/AMD: Fix Bulldozer topology") restored the initial approach we had with the Fam15h topology of enumerating CU (Compute Unit) threads as cores. And this is still correct - they're beefier than HT threads but still have some shared functionality. Our current approach has a problem with the Mad Max Steam game, for example. Yves Dionne reported a certain "choppiness" while playing on 4.9.5. That problem stems most likely from the fact that the CU threads share resources within one CU and when we schedule to a thread of a different compute unit, this incurs latency due to migrating the working set to a different CU through the caches. When the thread siblings mask mirrors that aspect of the CUs and threads, the scheduler pays attention to it and tries to schedule within one CU first. Which takes care of the latency, of course. Reported-by: Yves Dionne Signed-off-by: Borislav Petkov Cc: Brice Goglin Cc: Peter Zijlstra Cc: Yazen Ghannam --- I know Yazen is working on another issue which touches the same code path but we'll synchronize the patches. I'm marking it RFC for now but once we have agreed on it, it should be CC:stable for 4.9. Initial testing looks good, I still need to get on a bigger F15h box to make sure everything is still kosher there too. arch/x86/include/asm/processor.h | 1 + arch/x86/kernel/cpu/amd.c | 11 ++++++++++- arch/x86/kernel/smpboot.c | 10 +++++++--- 3 files changed, 18 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 1be64da0384e..e6cfe7ba2d65 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -104,6 +104,7 @@ struct cpuinfo_x86 { __u8 x86_phys_bits; /* CPUID returned core id bits: */ __u8 x86_coreid_bits; + __u8 cu_id; /* Max extended CPUID function supported: */ __u32 extended_cpuid_level; /* Maximum supported CPUID level, -1=no CPUID: */ diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index 80e657e89eed..e7158afb322b 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -309,8 +309,15 @@ static void amd_get_topology(struct cpuinfo_x86 *c) /* get information required for multi-node processors */ if (boot_cpu_has(X86_FEATURE_TOPOEXT)) { + u32 eax, ebx, ecx, edx; - node_id = cpuid_ecx(0x8000001e) & 7; + cpuid(0x8000001e, &eax, &ebx, &ecx, &edx); + + node_id = ecx & 7; + smp_num_siblings = ((ebx >> 8) & 0xff) + 1; + + if (c->x86 == 0x15) + c->cu_id = ebx & 0xff; /* * We may have multiple LLCs if L3 caches exist, so check if we @@ -328,6 +335,8 @@ static void amd_get_topology(struct cpuinfo_x86 *c) per_cpu(cpu_llc_id, cpu) = node_id; } } + + } else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) { u64 value; diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 548da5a8013e..f06fa338076b 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -433,9 +433,13 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o) int cpu1 = c->cpu_index, cpu2 = o->cpu_index; if (c->phys_proc_id == o->phys_proc_id && - per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2) && - c->cpu_core_id == o->cpu_core_id) - return topology_sane(c, o, "smt"); + per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) { + if (c->cpu_core_id == o->cpu_core_id) + return topology_sane(c, o, "smt"); + + if (c->cu_id == o->cu_id) + return topology_sane(c, o, "smt"); + } } else if (c->phys_proc_id == o->phys_proc_id && c->cpu_core_id == o->cpu_core_id) { -- 2.11.0 -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.