From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 16 Dec 2019 10:06:05 -0000
From: Damir <1856335@bugs.launchpad.net>
To: qemu-devel@nongnu.org
X-Launchpad-Bug: product=qemu; status=New; importance=Undecided; assignee=None;
X-Launchpad-Bug-Reporter: Damir (djdatte)
References: <157625616239.22064.10423897892496347105.malonedeb@gac.canonical.com>
Message-Id: <157649076618.19251.2274802547008878034.launchpad@wampee.canonical.com>
Subject: [Bug 1856335] Re: Cache Layout wrong on many Zen Arch CPUs
Reply-To: Bug 1856335 <1856335@bugs.launchpad.net>

** Description changed:

  AMD CPUs have L3 cache per 2, 3 or 4 cores. Currently, TOPOEXT seems to
  always map the cache as if it were a CPU with 4 cores per CCX, which is
  incorrect and costs up to 30% performance (more realistically 10%) in
  applications that are aware of the L3 cache layout.

  Example on a CPU with 4 cores per CCX (1950X with 8 cores and no SMT):

-
- EPYC-IBPB
- AMD
-
+
+     EPYC-IBPB
+     AMD
+

  In Windows, coreinfo reports correctly:

  ****---- Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64
  ----**** Unified Cache 6, Level 3, 8 MB, Assoc 16, LineSize 64

  On a CPU with 3 cores per CCX (3960X with 6 cores and no SMT):

-
- EPYC-IBPB
- AMD
-
+
+     EPYC-IBPB
+     AMD
+

  In Windows, coreinfo reports incorrectly:

  ****-- Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64
  ----** Unified Cache 6, Level 3, 8 MB, Assoc 16, LineSize 64

+ Validated against 3.0, 3.1, 4.1 and 4.2 versions of qemu-kvm.

- Validated against 3.0, 3.1, 4.1 and 4.2 versions of qemu-kvm.
-
-
- With newer Qemu there is a fix (that does behave correctly) in using the dies parameter:
-
+ With newer QEMU there is a fix (that does behave correctly) in using the dies parameter:
+

  The problem is that the dies are exposed differently than how AMD does
- it natively, they are exposed to Windows as sockets, which means, you
- can't ever have a machine with more than two CCX (6 cores) as Windows
- only supports two sockets. (Should this be reported as a separate bug?)
+ it natively; they are exposed to Windows as sockets, which means that
+ if you are not a business user, you can't ever have a machine with more
+ than two CCX (6 cores), as consumer versions of Windows only support two
+ sockets. (Should this be reported as a separate bug?)

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1856335

Title:
  Cache Layout wrong on many Zen Arch CPUs

Status in QEMU:
  New

Bug description:
  AMD CPUs have L3 cache per 2, 3 or 4 cores.
  Currently, TOPOEXT seems to always map the cache as if it were a CPU
  with 4 cores per CCX, which is incorrect and costs up to 30%
  performance (more realistically 10%) in applications that are aware of
  the L3 cache layout.

  Example on a CPU with 4 cores per CCX (1950X with 8 cores and no SMT):

      EPYC-IBPB
      AMD

  In Windows, coreinfo reports correctly:

  ****---- Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64
  ----**** Unified Cache 6, Level 3, 8 MB, Assoc 16, LineSize 64

  On a CPU with 3 cores per CCX (3960X with 6 cores and no SMT):

      EPYC-IBPB
      AMD

  In Windows, coreinfo reports incorrectly:

  ****-- Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64
  ----** Unified Cache 6, Level 3, 8 MB, Assoc 16, LineSize 64

  Validated against 3.0, 3.1, 4.1 and 4.2 versions of qemu-kvm.

  With newer QEMU there is a fix (that does behave correctly) in using
  the dies parameter:

  The problem is that the dies are exposed differently than how AMD does
  it natively; they are exposed to Windows as sockets, which means that
  if you are not a business user, you can't ever have a machine with more
  than two CCX (6 cores), as consumer versions of Windows only support
  two sockets. (Should this be reported as a separate bug?)
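
To make the coreinfo masks quoted in the description easier to compare, here is a minimal Python sketch that prints one mask per L3 cache for a given core count and L3 group size. The function name `l3_masks` is hypothetical, the mask format only imitates the coreinfo lines quoted above, and the "always group by 4" case models the behaviour described in this report rather than QEMU's actual code.

```python
from typing import List


def l3_masks(total_cores: int, cores_per_l3: int) -> List[str]:
    """Return one coreinfo-style mask per L3 cache.

    '*' marks a core that shares the cache, '-' a core that does not.
    Hypothetical helper for illustration only.
    """
    masks = []
    for start in range(0, total_cores, cores_per_l3):
        end = min(start + cores_per_l3, total_cores)
        masks.append("-" * start + "*" * (end - start) + "-" * (total_cores - end))
    return masks


# 8 guest cores, 4 cores per CCX: matches the layout reported as correct.
print(l3_masks(8, 4))
# 6 guest cores, 3 cores per CCX: the layout the host actually has.
print(l3_masks(6, 3))
# 6 guest cores grouped as if 4 cores shared each L3: the layout reported as wrong.
print(l3_masks(6, 4))
```

The third call yields `****--` and `----**`, the same masks coreinfo shows in the 6-core guest, whereas a 3-cores-per-CCX host would be expected to show `***---` and `---***`.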