All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 0/9] i386: Enable TOPOEXT to support hyperthreading on AMD CPU
@ 2018-04-10 23:16 ` Babu Moger
  0 siblings, 0 replies; 24+ messages in thread
From: Babu Moger @ 2018-04-10 23:16 UTC (permalink / raw)
  To: mst, marcel, pbonzini, rth, ehabkost, mtosatti
  Cc: geoff, babu.moger, kash, qemu-devel, kvm

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=yes, Size: 4462 bytes --]

This series enables the TOPOEXT feature for AMD CPUs. This is required to
support hyperthreading on kvm guests.

This addresses the issues reported in these bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1481253
https://bugs.launchpad.net/qemu/+bug/1703506 

v6:
1.Fixed problem with patch#4(Add new property to control cache info). The parameter
 legacy_cache should be "on" by default on machine type "pc-q35-2.10". This was
 found by Alexandr Iarygin.
2.Fixed the l3 cache size for EPYC based machines(patch#3). Also, fixed the number of
 logical processors sharing the cache(patch#6). Only L3 cache is shared by multiple
 cores but not L1 or L2. This was a bug while decoding. This was found by Geoffrey McRae
 and he verified the fix. 

v5:
 In this series I tried to address the feedback from Eduardo Habkost.
 The discussion thread is here.
 https://patchwork.kernel.org/patch/10299745/
 The previous thread is here.
 http://patchwork.ozlabs.org/cover/884885/

Reason for these changes.
 The cache properties for AMD family of processors have changed from
 previous releases. We don't want to display the new information on the
 old family of processors as this might cause compatibility issues.

Changes:
1.Based the patches on top of Eduardo's(patch#1) patch.
  Changed few things.
  Moved the Cache definitions to cpu.h file.
  Changed the CPUID_4 names to generic names.
2.Added a new propery "legacy-cache" in cpu object(patch#2). This can be
  used to display the old property even if the host supports the new cache
  properties.
3.Added cache information in X86CPUDefinition and CPUX86State
4.Patch 6-7 changed quite a bit from previous version does to new approach.
5.Addressed few issues with CPUID_8000_001d and CPUID_8000_001E.

v4:
1.Removed the checks under cpuid 0x8000001D leaf(patch #2). These check are
  not necessary. Found this during internal review.
2.Added CPUID_EXT3_TOPOEXT feature for all the 17 family(patch #4). This was
  found by Kash Pande during his testing.
3.Removed th hardcoded cpuid xlevel and dynamically extended if CPUID_EXT3_TOPOEXT
  is supported(Suggested by Brijesh Singh). 

v3:
1.Removed the patch #1. Radim mentioned that original typo problem is in 
  linux kernel header. qemu is just copying those files.
2.In previous version, I used the cpuid 4 definitions for AMDs cpuid leaf
  0x8000001D. CPUID 4 is very intel specific and we dont want to expose those
  details under AMD. I have renamed some of these definitions as generic.
  These changes are in patch#1. Radim, let me know if this is what you intended.
3.Added assert to for core_id(Suggested by Radim Krčmář).
4.Changed the if condition under "L3 cache info"(Suggested by Gary Hook).
5.Addressed few more text correction and code cleanup(Suggested by Thomas Lendacky).

v2:
  Fixed few more minor issues per Gary Hook's comments. Thank you Gary.
  Removed the patch#1. We need to handle the instruction cache associativity 
  seperately. It varies based on the cpu family. I will comeback to that later.
  Added two more typo corrections in patch#1 and patch#5.

v1:
  Stanislav Lanci posted few patches earlier. 
  https://patchwork.kernel.org/patch/10040903/

Rebased his patches with few changes.
1.Spit the patches into two, separating cpuid functions 
  0x8000001D and 0x8000001E (Patch 2 and 3).
2.Removed the generic non-intel check and made a separate patch
  with some changes(Patch 5).
3.Fixed L3_N_SETS_AMD(from 4096 to 8192) based on CPUID_Fn8000001D_ECX_x03.

Added 2 more patches.
Patch 1. Fixes cache associativity.
Patch 4. Adds TOPOEXT feature on AMD EPYC CPU.


Babu Moger (8):
  i386: Add cache information in X86CPUDefinition
  i386: Initialize cache information for EPYC family processors
  i386: Add new property to control cache info
  i386: Use the statically loaded cache definitions
  i386: Populate AMD Processor Cache Information for cpuid 0x8000001D
  i386: Add support for CPUID_8000_001E for AMD
  i386: Enable TOPOEXT feature on AMD EPYC CPU
  i386: Remove generic SMT thread check

Eduardo Habkost (1):
  i386: Helpers to encode cache information consistently

 include/hw/i386/pc.h |   4 +
 target/i386/cpu.c    | 736 ++++++++++++++++++++++++++++++++++++++++++---------
 target/i386/cpu.h    |  66 +++++
 target/i386/kvm.c    |  29 +-
 4 files changed, 702 insertions(+), 133 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v6 0/9] i386: Enable TOPOEXT to support hyperthreading on AMD CPU
@ 2018-04-10 23:16 ` Babu Moger
  0 siblings, 0 replies; 24+ messages in thread
From: Babu Moger @ 2018-04-10 23:16 UTC (permalink / raw)
  To: mst, marcel, pbonzini, rth, ehabkost, mtosatti
  Cc: qemu-devel, kvm, kash, geoff, babu.moger

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=yes, Size: 4462 bytes --]

This series enables the TOPOEXT feature for AMD CPUs. This is required to
support hyperthreading on kvm guests.

This addresses the issues reported in these bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1481253
https://bugs.launchpad.net/qemu/+bug/1703506 

v6:
1.Fixed problem with patch#4(Add new property to control cache info). The parameter
 legacy_cache should be "on" by default on machine type "pc-q35-2.10". This was
 found by Alexandr Iarygin.
2.Fixed the l3 cache size for EPYC based machines(patch#3). Also, fixed the number of
 logical processors sharing the cache(patch#6). Only L3 cache is shared by multiple
 cores but not L1 or L2. This was a bug while decoding. This was found by Geoffrey McRae
 and he verified the fix. 

v5:
 In this series I tried to address the feedback from Eduardo Habkost.
 The discussion thread is here.
 https://patchwork.kernel.org/patch/10299745/
 The previous thread is here.
 http://patchwork.ozlabs.org/cover/884885/

Reason for these changes.
 The cache properties for AMD family of processors have changed from
 previous releases. We don't want to display the new information on the
 old family of processors as this might cause compatibility issues.

Changes:
1.Based the patches on top of Eduardo's(patch#1) patch.
  Changed few things.
  Moved the Cache definitions to cpu.h file.
  Changed the CPUID_4 names to generic names.
2.Added a new propery "legacy-cache" in cpu object(patch#2). This can be
  used to display the old property even if the host supports the new cache
  properties.
3.Added cache information in X86CPUDefinition and CPUX86State
4.Patch 6-7 changed quite a bit from previous version does to new approach.
5.Addressed few issues with CPUID_8000_001d and CPUID_8000_001E.

v4:
1.Removed the checks under cpuid 0x8000001D leaf(patch #2). These check are
  not necessary. Found this during internal review.
2.Added CPUID_EXT3_TOPOEXT feature for all the 17 family(patch #4). This was
  found by Kash Pande during his testing.
3.Removed th hardcoded cpuid xlevel and dynamically extended if CPUID_EXT3_TOPOEXT
  is supported(Suggested by Brijesh Singh). 

v3:
1.Removed the patch #1. Radim mentioned that original typo problem is in 
  linux kernel header. qemu is just copying those files.
2.In previous version, I used the cpuid 4 definitions for AMDs cpuid leaf
  0x8000001D. CPUID 4 is very intel specific and we dont want to expose those
  details under AMD. I have renamed some of these definitions as generic.
  These changes are in patch#1. Radim, let me know if this is what you intended.
3.Added assert to for core_id(Suggested by Radim Krčmář).
4.Changed the if condition under "L3 cache info"(Suggested by Gary Hook).
5.Addressed few more text correction and code cleanup(Suggested by Thomas Lendacky).

v2:
  Fixed few more minor issues per Gary Hook's comments. Thank you Gary.
  Removed the patch#1. We need to handle the instruction cache associativity 
  seperately. It varies based on the cpu family. I will comeback to that later.
  Added two more typo corrections in patch#1 and patch#5.

v1:
  Stanislav Lanci posted few patches earlier. 
  https://patchwork.kernel.org/patch/10040903/

Rebased his patches with few changes.
1.Spit the patches into two, separating cpuid functions 
  0x8000001D and 0x8000001E (Patch 2 and 3).
2.Removed the generic non-intel check and made a separate patch
  with some changes(Patch 5).
3.Fixed L3_N_SETS_AMD(from 4096 to 8192) based on CPUID_Fn8000001D_ECX_x03.

Added 2 more patches.
Patch 1. Fixes cache associativity.
Patch 4. Adds TOPOEXT feature on AMD EPYC CPU.


Babu Moger (8):
  i386: Add cache information in X86CPUDefinition
  i386: Initialize cache information for EPYC family processors
  i386: Add new property to control cache info
  i386: Use the statically loaded cache definitions
  i386: Populate AMD Processor Cache Information for cpuid 0x8000001D
  i386: Add support for CPUID_8000_001E for AMD
  i386: Enable TOPOEXT feature on AMD EPYC CPU
  i386: Remove generic SMT thread check

Eduardo Habkost (1):
  i386: Helpers to encode cache information consistently

 include/hw/i386/pc.h |   4 +
 target/i386/cpu.c    | 736 ++++++++++++++++++++++++++++++++++++++++++---------
 target/i386/cpu.h    |  66 +++++
 target/i386/kvm.c    |  29 +-
 4 files changed, 702 insertions(+), 133 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v6 1/9] i386: Helpers to encode cache information consistently
  2018-04-10 23:16 ` [Qemu-devel] " Babu Moger
@ 2018-04-10 23:16   ` Babu Moger
  -1 siblings, 0 replies; 24+ messages in thread
From: Babu Moger @ 2018-04-10 23:16 UTC (permalink / raw)
  To: mst, marcel, pbonzini, rth, ehabkost, mtosatti
  Cc: geoff, babu.moger, kash, qemu-devel, kvm

From: Eduardo Habkost <ehabkost@redhat.com>

Instead of having a collection of macros that need to be used in
complex expressions to build CPUID data, define a CPUCacheInfo
struct that can hold information about a given cache.  Helper
functions will take a CPUCacheInfo struct as input to encode
CPUID leaves for a cache.

This will help us ensure consistency between cache information
CPUID leaves, and make the existing inconsistencies in CPUID info
more visible.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
---
 target/i386/cpu.c | 495 ++++++++++++++++++++++++++++++++++++++++--------------
 target/i386/cpu.h |  53 ++++++
 2 files changed, 424 insertions(+), 124 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 555ae79..2d3d7d8 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -56,33 +56,240 @@
 
 #include "disas/capstone.h"
 
+/* Helpers for building CPUID[2] descriptors: */
+
+struct CPUID2CacheDescriptorInfo {
+    enum CacheType type;
+    int level;
+    int size;
+    int line_size;
+    int associativity;
+};
 
-/* Cache topology CPUID constants: */
+#define KiB 1024
+#define MiB (1024 * 1024)
 
-/* CPUID Leaf 2 Descriptors */
+/*
+ * Known CPUID 2 cache descriptors.
+ * From Intel SDM Volume 2A, CPUID instruction
+ */
+struct CPUID2CacheDescriptorInfo cpuid2_cache_descriptors[] = {
+    [0x06] = { .level = 1, .type = ICACHE,        .size =   8 * KiB,
+               .associativity = 4,  .line_size = 32, },
+    [0x08] = { .level = 1, .type = ICACHE,        .size =  16 * KiB,
+               .associativity = 4,  .line_size = 32, },
+    [0x09] = { .level = 1, .type = ICACHE,        .size =  32 * KiB,
+               .associativity = 4,  .line_size = 64, },
+    [0x0A] = { .level = 1, .type = DCACHE,        .size =   8 * KiB,
+               .associativity = 2,  .line_size = 32, },
+    [0x0C] = { .level = 1, .type = DCACHE,        .size =  16 * KiB,
+               .associativity = 4,  .line_size = 32, },
+    [0x0D] = { .level = 1, .type = DCACHE,        .size =  16 * KiB,
+               .associativity = 4,  .line_size = 64, },
+    [0x0E] = { .level = 1, .type = DCACHE,        .size =  24 * KiB,
+               .associativity = 6,  .line_size = 64, },
+    [0x1D] = { .level = 2, .type = UNIFIED_CACHE, .size = 128 * KiB,
+               .associativity = 2,  .line_size = 64, },
+    [0x21] = { .level = 2, .type = UNIFIED_CACHE, .size = 256 * KiB,
+               .associativity = 8,  .line_size = 64, },
+    /* lines per sector is not supported cpuid2_cache_descriptor(),
+    * so descriptors 0x22, 0x23 are not included
+    */
+    [0x24] = { .level = 2, .type = UNIFIED_CACHE, .size =   1 * MiB,
+               .associativity = 16, .line_size = 64, },
+    /* lines per sector is not supported cpuid2_cache_descriptor(),
+    * so descriptors 0x25, 0x20 are not included
+    */
+    [0x2C] = { .level = 1, .type = DCACHE,        .size =  32 * KiB,
+               .associativity = 8,  .line_size = 64, },
+    [0x30] = { .level = 1, .type = ICACHE,        .size =  32 * KiB,
+               .associativity = 8,  .line_size = 64, },
+    [0x41] = { .level = 2, .type = UNIFIED_CACHE, .size = 128 * KiB,
+               .associativity = 4,  .line_size = 32, },
+    [0x42] = { .level = 2, .type = UNIFIED_CACHE, .size = 256 * KiB,
+               .associativity = 4,  .line_size = 32, },
+    [0x43] = { .level = 2, .type = UNIFIED_CACHE, .size = 512 * KiB,
+               .associativity = 4,  .line_size = 32, },
+    [0x44] = { .level = 2, .type = UNIFIED_CACHE, .size =   1 * MiB,
+               .associativity = 4,  .line_size = 32, },
+    [0x45] = { .level = 2, .type = UNIFIED_CACHE, .size =   2 * MiB,
+               .associativity = 4,  .line_size = 32, },
+    [0x46] = { .level = 3, .type = UNIFIED_CACHE, .size =   4 * MiB,
+               .associativity = 4,  .line_size = 64, },
+    [0x47] = { .level = 3, .type = UNIFIED_CACHE, .size =   8 * MiB,
+               .associativity = 8,  .line_size = 64, },
+    [0x48] = { .level = 2, .type = UNIFIED_CACHE, .size =   3 * MiB,
+               .associativity = 12, .line_size = 64, },
+    /* Descriptor 0x49 depends on CPU family/model, so it is not included */
+    [0x4A] = { .level = 3, .type = UNIFIED_CACHE, .size =   6 * MiB,
+               .associativity = 12, .line_size = 64, },
+    [0x4B] = { .level = 3, .type = UNIFIED_CACHE, .size =   8 * MiB,
+               .associativity = 16, .line_size = 64, },
+    [0x4C] = { .level = 3, .type = UNIFIED_CACHE, .size =  12 * MiB,
+               .associativity = 12, .line_size = 64, },
+    [0x4D] = { .level = 3, .type = UNIFIED_CACHE, .size =  16 * MiB,
+               .associativity = 16, .line_size = 64, },
+    [0x4E] = { .level = 2, .type = UNIFIED_CACHE, .size =   6 * MiB,
+               .associativity = 24, .line_size = 64, },
+    [0x60] = { .level = 1, .type = DCACHE,        .size =  16 * KiB,
+               .associativity = 8,  .line_size = 64, },
+    [0x66] = { .level = 1, .type = DCACHE,        .size =   8 * KiB,
+               .associativity = 4,  .line_size = 64, },
+    [0x67] = { .level = 1, .type = DCACHE,        .size =  16 * KiB,
+               .associativity = 4,  .line_size = 64, },
+    [0x68] = { .level = 1, .type = DCACHE,        .size =  32 * KiB,
+               .associativity = 4,  .line_size = 64, },
+    [0x78] = { .level = 2, .type = UNIFIED_CACHE, .size =   1 * MiB,
+               .associativity = 4,  .line_size = 64, },
+    /* lines per sector is not supported cpuid2_cache_descriptor(),
+    * so descriptors 0x79, 0x7A, 0x7B, 0x7C are not included.
+    */
+    [0x7D] = { .level = 2, .type = UNIFIED_CACHE, .size =   2 * MiB,
+               .associativity = 8,  .line_size = 64, },
+    [0x7F] = { .level = 2, .type = UNIFIED_CACHE, .size = 512 * KiB,
+               .associativity = 2,  .line_size = 64, },
+    [0x80] = { .level = 2, .type = UNIFIED_CACHE, .size = 512 * KiB,
+               .associativity = 8,  .line_size = 64, },
+    [0x82] = { .level = 2, .type = UNIFIED_CACHE, .size = 256 * KiB,
+               .associativity = 8,  .line_size = 32, },
+    [0x83] = { .level = 2, .type = UNIFIED_CACHE, .size = 512 * KiB,
+               .associativity = 8,  .line_size = 32, },
+    [0x84] = { .level = 2, .type = UNIFIED_CACHE, .size =   1 * MiB,
+               .associativity = 8,  .line_size = 32, },
+    [0x85] = { .level = 2, .type = UNIFIED_CACHE, .size =   2 * MiB,
+               .associativity = 8,  .line_size = 32, },
+    [0x86] = { .level = 2, .type = UNIFIED_CACHE, .size = 512 * KiB,
+               .associativity = 4,  .line_size = 64, },
+    [0x87] = { .level = 2, .type = UNIFIED_CACHE, .size =   1 * MiB,
+               .associativity = 8,  .line_size = 64, },
+    [0xD0] = { .level = 3, .type = UNIFIED_CACHE, .size = 512 * KiB,
+               .associativity = 4,  .line_size = 64, },
+    [0xD1] = { .level = 3, .type = UNIFIED_CACHE, .size =   1 * MiB,
+               .associativity = 4,  .line_size = 64, },
+    [0xD2] = { .level = 3, .type = UNIFIED_CACHE, .size =   2 * MiB,
+               .associativity = 4,  .line_size = 64, },
+    [0xD6] = { .level = 3, .type = UNIFIED_CACHE, .size =   1 * MiB,
+               .associativity = 8,  .line_size = 64, },
+    [0xD7] = { .level = 3, .type = UNIFIED_CACHE, .size =   2 * MiB,
+               .associativity = 8,  .line_size = 64, },
+    [0xD8] = { .level = 3, .type = UNIFIED_CACHE, .size =   4 * MiB,
+               .associativity = 8,  .line_size = 64, },
+    [0xDC] = { .level = 3, .type = UNIFIED_CACHE, .size = 1.5 * MiB,
+               .associativity = 12, .line_size = 64, },
+    [0xDD] = { .level = 3, .type = UNIFIED_CACHE, .size =   3 * MiB,
+               .associativity = 12, .line_size = 64, },
+    [0xDE] = { .level = 3, .type = UNIFIED_CACHE, .size =   6 * MiB,
+               .associativity = 12, .line_size = 64, },
+    [0xE2] = { .level = 3, .type = UNIFIED_CACHE, .size =   2 * MiB,
+               .associativity = 16, .line_size = 64, },
+    [0xE3] = { .level = 3, .type = UNIFIED_CACHE, .size =   4 * MiB,
+               .associativity = 16, .line_size = 64, },
+    [0xE4] = { .level = 3, .type = UNIFIED_CACHE, .size =   8 * MiB,
+               .associativity = 16, .line_size = 64, },
+    [0xEA] = { .level = 3, .type = UNIFIED_CACHE, .size =  12 * MiB,
+               .associativity = 24, .line_size = 64, },
+    [0xEB] = { .level = 3, .type = UNIFIED_CACHE, .size =  18 * MiB,
+               .associativity = 24, .line_size = 64, },
+    [0xEC] = { .level = 3, .type = UNIFIED_CACHE, .size =  24 * MiB,
+               .associativity = 24, .line_size = 64, },
+};
+
+/*
+ * "CPUID leaf 2 does not report cache descriptor information,
+ * use CPUID leaf 4 to query cache parameters"
+ */
+#define CACHE_DESCRIPTOR_UNAVAILABLE 0xFF
 
-#define CPUID_2_L1D_32KB_8WAY_64B 0x2c
-#define CPUID_2_L1I_32KB_8WAY_64B 0x30
-#define CPUID_2_L2_2MB_8WAY_64B   0x7d
-#define CPUID_2_L3_16MB_16WAY_64B 0x4d
+/*
+ * Return a CPUID 2 cache descriptor for a given cache.
+ * If no known descriptor is found, return CACHE_DESCRIPTOR_UNAVAILABLE
+ */
+static uint8_t cpuid2_cache_descriptor(CPUCacheInfo *cache)
+{
+    int i;
+
+    assert(cache->size > 0);
+    assert(cache->level > 0);
+    assert(cache->line_size > 0);
+    assert(cache->associativity > 0);
+    for (i = 0; i < ARRAY_SIZE(cpuid2_cache_descriptors); i++) {
+        struct CPUID2CacheDescriptorInfo *d = &cpuid2_cache_descriptors[i];
+        if (d->level == cache->level && d->type == cache->type &&
+            d->size == cache->size && d->line_size == cache->line_size &&
+            d->associativity == cache->associativity) {
+                return i;
+            }
+    }
 
+    return CACHE_DESCRIPTOR_UNAVAILABLE;
+}
 
 /* CPUID Leaf 4 constants: */
 
 /* EAX: */
-#define CPUID_4_TYPE_DCACHE  1
-#define CPUID_4_TYPE_ICACHE  2
-#define CPUID_4_TYPE_UNIFIED 3
+#define CACHE_TYPE_D    1
+#define CACHE_TYPE_I    2
+#define CACHE_TYPE_UNIFIED   3
 
-#define CPUID_4_LEVEL(l)          ((l) << 5)
+#define CACHE_LEVEL(l)        (l << 5)
 
-#define CPUID_4_SELF_INIT_LEVEL (1 << 8)
-#define CPUID_4_FULLY_ASSOC     (1 << 9)
+#define CACHE_SELF_INIT_LEVEL (1 << 8)
 
 /* EDX: */
-#define CPUID_4_NO_INVD_SHARING (1 << 0)
-#define CPUID_4_INCLUSIVE       (1 << 1)
-#define CPUID_4_COMPLEX_IDX     (1 << 2)
+#define CACHE_NO_INVD_SHARING   (1 << 0)
+#define CACHE_INCLUSIVE       (1 << 1)
+#define CACHE_COMPLEX_IDX     (1 << 2)
+
+/* Encode CacheType for CPUID[4].EAX */
+#define CACHE_TYPE(t) (((t) == DCACHE)  ? CACHE_TYPE_D  : \
+                         ((t) == ICACHE)  ? CACHE_TYPE_I  : \
+                         ((t) == UNIFIED_CACHE) ? CACHE_TYPE_UNIFIED : \
+                         0 /* Invalid value */)
+
+
+/* Encode cache info for CPUID[4] */
+static void encode_cache_cpuid4(CPUCacheInfo *cache,
+                                int num_apic_ids, int num_cores,
+                                uint32_t *eax, uint32_t *ebx,
+                                uint32_t *ecx, uint32_t *edx)
+{
+    assert(cache->size == cache->line_size * cache->associativity *
+                          cache->partitions * cache->sets);
+
+    assert(num_apic_ids > 0);
+    *eax = CACHE_TYPE(cache->type) |
+           CACHE_LEVEL(cache->level) |
+           (cache->self_init ? CACHE_SELF_INIT_LEVEL : 0) |
+           ((num_cores - 1) << 26) |
+           ((num_apic_ids - 1) << 14);
+
+    assert(cache->line_size > 0);
+    assert(cache->partitions > 0);
+    assert(cache->associativity > 0);
+    /* We don't implement fully-associative caches */
+    assert(cache->associativity < cache->sets);
+    *ebx = (cache->line_size - 1) |
+           ((cache->partitions - 1) << 12) |
+           ((cache->associativity - 1) << 22);
+
+    assert(cache->sets > 0);
+    *ecx = cache->sets - 1;
+
+    *edx = (cache->no_invd_sharing ? CACHE_NO_INVD_SHARING : 0) |
+           (cache->inclusive ? CACHE_INCLUSIVE : 0) |
+           (cache->complex_indexing ? CACHE_COMPLEX_IDX : 0);
+}
+
+/* Encode cache info for CPUID[0x80000005].ECX or CPUID[0x80000005].EDX */
+static uint32_t encode_cache_cpuid80000005(CPUCacheInfo *cache)
+{
+    assert(cache->size % 1024 == 0);
+    assert(cache->lines_per_tag > 0);
+    assert(cache->associativity > 0);
+    assert(cache->line_size > 0);
+    return ((cache->size / 1024) << 24) | (cache->associativity << 16) |
+           (cache->lines_per_tag << 8) | (cache->line_size);
+}
 
 #define ASSOC_FULL 0xFF
 
@@ -100,57 +307,140 @@
                           a == ASSOC_FULL ? 0xF : \
                           0 /* invalid value */)
 
+/*
+ * Encode cache info for CPUID[0x80000006].ECX and CPUID[0x80000006].EDX
+ * @l3 can be NULL.
+ */
+static void encode_cache_cpuid80000006(CPUCacheInfo *l2,
+                                       CPUCacheInfo *l3,
+                                       uint32_t *ecx, uint32_t *edx)
+{
+    assert(l2->size % 1024 == 0);
+    assert(l2->associativity > 0);
+    assert(l2->lines_per_tag > 0);
+    assert(l2->line_size > 0);
+    *ecx = ((l2->size / 1024) << 16) |
+           (AMD_ENC_ASSOC(l2->associativity) << 12) |
+           (l2->lines_per_tag << 8) | (l2->line_size);
+
+    if (l3) {
+        assert(l3->size % (512 * 1024) == 0);
+        assert(l3->associativity > 0);
+        assert(l3->lines_per_tag > 0);
+        assert(l3->line_size > 0);
+        *edx = ((l3->size / (512 * 1024)) << 18) |
+               (AMD_ENC_ASSOC(l3->associativity) << 12) |
+               (l3->lines_per_tag << 8) | (l3->line_size);
+    } else {
+        *edx = 0;
+    }
+}
 
 /* Definitions of the hardcoded cache entries we expose: */
 
 /* L1 data cache: */
-#define L1D_LINE_SIZE         64
-#define L1D_ASSOCIATIVITY      8
-#define L1D_SETS              64
-#define L1D_PARTITIONS         1
-/* Size = LINE_SIZE*ASSOCIATIVITY*SETS*PARTITIONS = 32KiB */
-#define L1D_DESCRIPTOR CPUID_2_L1D_32KB_8WAY_64B
+static CPUCacheInfo l1d_cache = {
+    .type = DCACHE,
+    .level = 1,
+    .size = 32 * KiB,
+    .self_init = 1,
+    .line_size = 64,
+    .associativity = 8,
+    .sets = 64,
+    .partitions = 1,
+    .no_invd_sharing = true,
+};
+
 /*FIXME: CPUID leaf 0x80000005 is inconsistent with leaves 2 & 4 */
-#define L1D_LINES_PER_TAG      1
-#define L1D_SIZE_KB_AMD       64
-#define L1D_ASSOCIATIVITY_AMD  2
+static CPUCacheInfo l1d_cache_amd = {
+    .type = DCACHE,
+    .level = 1,
+    .size = 64 * KiB,
+    .self_init = 1,
+    .line_size = 64,
+    .associativity = 2,
+    .sets = 512,
+    .partitions = 1,
+    .lines_per_tag = 1,
+    .no_invd_sharing = true,
+};
 
 /* L1 instruction cache: */
-#define L1I_LINE_SIZE         64
-#define L1I_ASSOCIATIVITY      8
-#define L1I_SETS              64
-#define L1I_PARTITIONS         1
-/* Size = LINE_SIZE*ASSOCIATIVITY*SETS*PARTITIONS = 32KiB */
-#define L1I_DESCRIPTOR CPUID_2_L1I_32KB_8WAY_64B
+static CPUCacheInfo l1i_cache = {
+    .type = ICACHE,
+    .level = 1,
+    .size = 32 * KiB,
+    .self_init = 1,
+    .line_size = 64,
+    .associativity = 8,
+    .sets = 64,
+    .partitions = 1,
+    .no_invd_sharing = true,
+};
+
 /*FIXME: CPUID leaf 0x80000005 is inconsistent with leaves 2 & 4 */
-#define L1I_LINES_PER_TAG      1
-#define L1I_SIZE_KB_AMD       64
-#define L1I_ASSOCIATIVITY_AMD  2
+static CPUCacheInfo l1i_cache_amd = {
+    .type = ICACHE,
+    .level = 1,
+    .size = 64 * KiB,
+    .self_init = 1,
+    .line_size = 64,
+    .associativity = 2,
+    .sets = 512,
+    .partitions = 1,
+    .lines_per_tag = 1,
+    .no_invd_sharing = true,
+};
 
 /* Level 2 unified cache: */
-#define L2_LINE_SIZE          64
-#define L2_ASSOCIATIVITY      16
-#define L2_SETS             4096
-#define L2_PARTITIONS          1
-/* Size = LINE_SIZE*ASSOCIATIVITY*SETS*PARTITIONS = 4MiB */
+static CPUCacheInfo l2_cache = {
+    .type = UNIFIED_CACHE,
+    .level = 2,
+    .size = 4 * MiB,
+    .self_init = 1,
+    .line_size = 64,
+    .associativity = 16,
+    .sets = 4096,
+    .partitions = 1,
+    .no_invd_sharing = true,
+};
+
 /*FIXME: CPUID leaf 2 descriptor is inconsistent with CPUID leaf 4 */
-#define L2_DESCRIPTOR CPUID_2_L2_2MB_8WAY_64B
+static CPUCacheInfo l2_cache_cpuid2 = {
+    .type = UNIFIED_CACHE,
+    .level = 2,
+    .size = 2 * MiB,
+    .line_size = 64,
+    .associativity = 8,
+};
+
+
 /*FIXME: CPUID leaf 0x80000006 is inconsistent with leaves 2 & 4 */
-#define L2_LINES_PER_TAG       1
-#define L2_SIZE_KB_AMD       512
+static CPUCacheInfo l2_cache_amd = {
+    .type = UNIFIED_CACHE,
+    .level = 2,
+    .size = 512 * KiB,
+    .line_size = 64,
+    .lines_per_tag = 1,
+    .associativity = 8,
+    .sets = 1024,
+    .partitions = 1,
+};
 
 /* Level 3 unified cache: */
-#define L3_SIZE_KB             0 /* disabled */
-#define L3_ASSOCIATIVITY       0 /* disabled */
-#define L3_LINES_PER_TAG       0 /* disabled */
-#define L3_LINE_SIZE           0 /* disabled */
-#define L3_N_LINE_SIZE         64
-#define L3_N_ASSOCIATIVITY     16
-#define L3_N_SETS           16384
-#define L3_N_PARTITIONS         1
-#define L3_N_DESCRIPTOR CPUID_2_L3_16MB_16WAY_64B
-#define L3_N_LINES_PER_TAG      1
-#define L3_N_SIZE_KB_AMD    16384
+static CPUCacheInfo l3_cache = {
+    .type = UNIFIED_CACHE,
+    .level = 3,
+    .size = 16 * MiB,
+    .line_size = 64,
+    .associativity = 16,
+    .sets = 16384,
+    .partitions = 1,
+    .lines_per_tag = 1,
+    .self_init = true,
+    .inclusive = true,
+    .complex_indexing = true,
+};
 
 /* TLB definitions: */
 
@@ -3294,85 +3584,53 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         if (!cpu->enable_l3_cache) {
             *ecx = 0;
         } else {
-            *ecx = L3_N_DESCRIPTOR;
+            *ecx = cpuid2_cache_descriptor(&l3_cache);
         }
-        *edx = (L1D_DESCRIPTOR << 16) | \
-               (L1I_DESCRIPTOR <<  8) | \
-               (L2_DESCRIPTOR);
+        *edx = (cpuid2_cache_descriptor(&l1d_cache) << 16) |
+               (cpuid2_cache_descriptor(&l1i_cache) <<  8) |
+               (cpuid2_cache_descriptor(&l2_cache_cpuid2));
         break;
     case 4:
         /* cache info: needed for Core compatibility */
         if (cpu->cache_info_passthrough) {
             host_cpuid(index, count, eax, ebx, ecx, edx);
+            /* QEMU gives out its own APIC IDs, never pass down bits 31..26.  */
             *eax &= ~0xFC000000;
+            if ((*eax & 31) && cs->nr_cores > 1) {
+                *eax |= (cs->nr_cores - 1) << 26;
+            }
         } else {
             *eax = 0;
             switch (count) {
             case 0: /* L1 dcache info */
-                *eax |= CPUID_4_TYPE_DCACHE | \
-                        CPUID_4_LEVEL(1) | \
-                        CPUID_4_SELF_INIT_LEVEL;
-                *ebx = (L1D_LINE_SIZE - 1) | \
-                       ((L1D_PARTITIONS - 1) << 12) | \
-                       ((L1D_ASSOCIATIVITY - 1) << 22);
-                *ecx = L1D_SETS - 1;
-                *edx = CPUID_4_NO_INVD_SHARING;
+                encode_cache_cpuid4(&l1d_cache,
+                                    1, cs->nr_cores,
+                                    eax, ebx, ecx, edx);
                 break;
             case 1: /* L1 icache info */
-                *eax |= CPUID_4_TYPE_ICACHE | \
-                        CPUID_4_LEVEL(1) | \
-                        CPUID_4_SELF_INIT_LEVEL;
-                *ebx = (L1I_LINE_SIZE - 1) | \
-                       ((L1I_PARTITIONS - 1) << 12) | \
-                       ((L1I_ASSOCIATIVITY - 1) << 22);
-                *ecx = L1I_SETS - 1;
-                *edx = CPUID_4_NO_INVD_SHARING;
+                encode_cache_cpuid4(&l1i_cache,
+                                    1, cs->nr_cores,
+                                    eax, ebx, ecx, edx);
                 break;
             case 2: /* L2 cache info */
-                *eax |= CPUID_4_TYPE_UNIFIED | \
-                        CPUID_4_LEVEL(2) | \
-                        CPUID_4_SELF_INIT_LEVEL;
-                if (cs->nr_threads > 1) {
-                    *eax |= (cs->nr_threads - 1) << 14;
-                }
-                *ebx = (L2_LINE_SIZE - 1) | \
-                       ((L2_PARTITIONS - 1) << 12) | \
-                       ((L2_ASSOCIATIVITY - 1) << 22);
-                *ecx = L2_SETS - 1;
-                *edx = CPUID_4_NO_INVD_SHARING;
+                encode_cache_cpuid4(&l2_cache,
+                                    cs->nr_threads, cs->nr_cores,
+                                    eax, ebx, ecx, edx);
                 break;
             case 3: /* L3 cache info */
-                if (!cpu->enable_l3_cache) {
-                    *eax = 0;
-                    *ebx = 0;
-                    *ecx = 0;
-                    *edx = 0;
+                pkg_offset = apicid_pkg_offset(cs->nr_cores, cs->nr_threads);
+                if (cpu->enable_l3_cache) {
+                    encode_cache_cpuid4(&l3_cache,
+                                        (1 << pkg_offset), cs->nr_cores,
+                                        eax, ebx, ecx, edx);
                     break;
                 }
-                *eax |= CPUID_4_TYPE_UNIFIED | \
-                        CPUID_4_LEVEL(3) | \
-                        CPUID_4_SELF_INIT_LEVEL;
-                pkg_offset = apicid_pkg_offset(cs->nr_cores, cs->nr_threads);
-                *eax |= ((1 << pkg_offset) - 1) << 14;
-                *ebx = (L3_N_LINE_SIZE - 1) | \
-                       ((L3_N_PARTITIONS - 1) << 12) | \
-                       ((L3_N_ASSOCIATIVITY - 1) << 22);
-                *ecx = L3_N_SETS - 1;
-                *edx = CPUID_4_INCLUSIVE | CPUID_4_COMPLEX_IDX;
-                break;
+                /* fall through */
             default: /* end of info */
-                *eax = 0;
-                *ebx = 0;
-                *ecx = 0;
-                *edx = 0;
+                *eax = *ebx = *ecx = *edx = 0;
                 break;
             }
         }
-
-        /* QEMU gives out its own APIC IDs, never pass down bits 31..26.  */
-        if ((*eax & 31) && cs->nr_cores > 1) {
-            *eax |= (cs->nr_cores - 1) << 26;
-        }
         break;
     case 5:
         /* mwait info: needed for Core compatibility */
@@ -3576,10 +3834,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                (L1_ITLB_2M_ASSOC <<  8) | (L1_ITLB_2M_ENTRIES);
         *ebx = (L1_DTLB_4K_ASSOC << 24) | (L1_DTLB_4K_ENTRIES << 16) | \
                (L1_ITLB_4K_ASSOC <<  8) | (L1_ITLB_4K_ENTRIES);
-        *ecx = (L1D_SIZE_KB_AMD << 24) | (L1D_ASSOCIATIVITY_AMD << 16) | \
-               (L1D_LINES_PER_TAG << 8) | (L1D_LINE_SIZE);
-        *edx = (L1I_SIZE_KB_AMD << 24) | (L1I_ASSOCIATIVITY_AMD << 16) | \
-               (L1I_LINES_PER_TAG << 8) | (L1I_LINE_SIZE);
+        *ecx = encode_cache_cpuid80000005(&l1d_cache_amd);
+        *edx = encode_cache_cpuid80000005(&l1i_cache_amd);
         break;
     case 0x80000006:
         /* cache info (L2 cache) */
@@ -3595,18 +3851,9 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                (L2_DTLB_4K_ENTRIES << 16) | \
                (AMD_ENC_ASSOC(L2_ITLB_4K_ASSOC) << 12) | \
                (L2_ITLB_4K_ENTRIES);
-        *ecx = (L2_SIZE_KB_AMD << 16) | \
-               (AMD_ENC_ASSOC(L2_ASSOCIATIVITY) << 12) | \
-               (L2_LINES_PER_TAG << 8) | (L2_LINE_SIZE);
-        if (!cpu->enable_l3_cache) {
-            *edx = ((L3_SIZE_KB / 512) << 18) | \
-                   (AMD_ENC_ASSOC(L3_ASSOCIATIVITY) << 12) | \
-                   (L3_LINES_PER_TAG << 8) | (L3_LINE_SIZE);
-        } else {
-            *edx = ((L3_N_SIZE_KB_AMD / 512) << 18) | \
-                   (AMD_ENC_ASSOC(L3_N_ASSOCIATIVITY) << 12) | \
-                   (L3_N_LINES_PER_TAG << 8) | (L3_N_LINE_SIZE);
-        }
+        encode_cache_cpuid80000006(&l2_cache_amd,
+                                   cpu->enable_l3_cache ? &l3_cache : NULL,
+                                   ecx, edx);
         break;
     case 0x80000007:
         *eax = 0;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 78db1b8..eaed287 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1044,6 +1044,59 @@ typedef enum TPRAccess {
     TPR_ACCESS_WRITE,
 } TPRAccess;
 
+/* Cache information data structures: */
+
+enum CacheType {
+    DCACHE,
+    ICACHE,
+    UNIFIED_CACHE
+};
+
+typedef struct CPUCacheInfo {
+    enum CacheType type;
+    uint8_t level;
+    /* Size in bytes */
+    uint32_t size;
+    /* Line size, in bytes */
+    uint16_t line_size;
+    /*
+     * Associativity.
+     * Note: representation of fully-associative caches is not implemented
+     */
+    uint8_t associativity;
+    /* Physical line partitions. CPUID[0x8000001D].EBX, CPUID[4].EBX */
+    uint8_t partitions;
+    /* Number of sets. CPUID[0x8000001D].ECX, CPUID[4].ECX */
+    uint32_t sets;
+    /*
+     * Lines per tag.
+     * AMD-specific: CPUID[0x80000005], CPUID[0x80000006].
+     * (Is this synonym to @partitions?)
+     */
+    uint8_t lines_per_tag;
+
+    /* Self-initializing cache */
+    bool self_init;
+    /*
+     * WBINVD/INVD is not guaranteed to act upon lower level caches of
+     * non-originating threads sharing this cache.
+     * CPUID[4].EDX[bit 0], CPUID[0x8000001D].EDX[bit 0]
+     */
+    bool no_invd_sharing;
+    /*
+     * Cache is inclusive of lower cache levels.
+     * CPUID[4].EDX[bit 1], CPUID[0x8000001D].EDX[bit 1].
+     */
+    bool inclusive;
+    /*
+     * A complex function is used to index the cache, potentially using all
+     * address bits.  CPUID[4].EDX[bit 2].
+     */
+    bool complex_indexing;
+} CPUCacheInfo;
+
+
+
 typedef struct CPUX86State {
     /* standard registers */
     target_ulong regs[CPU_NB_REGS];
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v6 1/9] i386: Helpers to encode cache information consistently
@ 2018-04-10 23:16   ` Babu Moger
  0 siblings, 0 replies; 24+ messages in thread
From: Babu Moger @ 2018-04-10 23:16 UTC (permalink / raw)
  To: mst, marcel, pbonzini, rth, ehabkost, mtosatti
  Cc: qemu-devel, kvm, kash, geoff, babu.moger

From: Eduardo Habkost <ehabkost@redhat.com>

Instead of having a collection of macros that need to be used in
complex expressions to build CPUID data, define a CPUCacheInfo
struct that can hold information about a given cache.  Helper
functions will take a CPUCacheInfo struct as input to encode
CPUID leaves for a cache.

This will help us ensure consistency between cache information
CPUID leaves, and make the existing inconsistencies in CPUID info
more visible.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
---
 target/i386/cpu.c | 495 ++++++++++++++++++++++++++++++++++++++++--------------
 target/i386/cpu.h |  53 ++++++
 2 files changed, 424 insertions(+), 124 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 555ae79..2d3d7d8 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -56,33 +56,240 @@
 
 #include "disas/capstone.h"
 
+/* Helpers for building CPUID[2] descriptors: */
+
+struct CPUID2CacheDescriptorInfo {
+    enum CacheType type;
+    int level;
+    int size;
+    int line_size;
+    int associativity;
+};
 
-/* Cache topology CPUID constants: */
+#define KiB 1024
+#define MiB (1024 * 1024)
 
-/* CPUID Leaf 2 Descriptors */
+/*
+ * Known CPUID 2 cache descriptors.
+ * From Intel SDM Volume 2A, CPUID instruction
+ */
+struct CPUID2CacheDescriptorInfo cpuid2_cache_descriptors[] = {
+    [0x06] = { .level = 1, .type = ICACHE,        .size =   8 * KiB,
+               .associativity = 4,  .line_size = 32, },
+    [0x08] = { .level = 1, .type = ICACHE,        .size =  16 * KiB,
+               .associativity = 4,  .line_size = 32, },
+    [0x09] = { .level = 1, .type = ICACHE,        .size =  32 * KiB,
+               .associativity = 4,  .line_size = 64, },
+    [0x0A] = { .level = 1, .type = DCACHE,        .size =   8 * KiB,
+               .associativity = 2,  .line_size = 32, },
+    [0x0C] = { .level = 1, .type = DCACHE,        .size =  16 * KiB,
+               .associativity = 4,  .line_size = 32, },
+    [0x0D] = { .level = 1, .type = DCACHE,        .size =  16 * KiB,
+               .associativity = 4,  .line_size = 64, },
+    [0x0E] = { .level = 1, .type = DCACHE,        .size =  24 * KiB,
+               .associativity = 6,  .line_size = 64, },
+    [0x1D] = { .level = 2, .type = UNIFIED_CACHE, .size = 128 * KiB,
+               .associativity = 2,  .line_size = 64, },
+    [0x21] = { .level = 2, .type = UNIFIED_CACHE, .size = 256 * KiB,
+               .associativity = 8,  .line_size = 64, },
+    /* lines per sector is not supported cpuid2_cache_descriptor(),
+    * so descriptors 0x22, 0x23 are not included
+    */
+    [0x24] = { .level = 2, .type = UNIFIED_CACHE, .size =   1 * MiB,
+               .associativity = 16, .line_size = 64, },
+    /* lines per sector is not supported cpuid2_cache_descriptor(),
+    * so descriptors 0x25, 0x20 are not included
+    */
+    [0x2C] = { .level = 1, .type = DCACHE,        .size =  32 * KiB,
+               .associativity = 8,  .line_size = 64, },
+    [0x30] = { .level = 1, .type = ICACHE,        .size =  32 * KiB,
+               .associativity = 8,  .line_size = 64, },
+    [0x41] = { .level = 2, .type = UNIFIED_CACHE, .size = 128 * KiB,
+               .associativity = 4,  .line_size = 32, },
+    [0x42] = { .level = 2, .type = UNIFIED_CACHE, .size = 256 * KiB,
+               .associativity = 4,  .line_size = 32, },
+    [0x43] = { .level = 2, .type = UNIFIED_CACHE, .size = 512 * KiB,
+               .associativity = 4,  .line_size = 32, },
+    [0x44] = { .level = 2, .type = UNIFIED_CACHE, .size =   1 * MiB,
+               .associativity = 4,  .line_size = 32, },
+    [0x45] = { .level = 2, .type = UNIFIED_CACHE, .size =   2 * MiB,
+               .associativity = 4,  .line_size = 32, },
+    [0x46] = { .level = 3, .type = UNIFIED_CACHE, .size =   4 * MiB,
+               .associativity = 4,  .line_size = 64, },
+    [0x47] = { .level = 3, .type = UNIFIED_CACHE, .size =   8 * MiB,
+               .associativity = 8,  .line_size = 64, },
+    [0x48] = { .level = 2, .type = UNIFIED_CACHE, .size =   3 * MiB,
+               .associativity = 12, .line_size = 64, },
+    /* Descriptor 0x49 depends on CPU family/model, so it is not included */
+    [0x4A] = { .level = 3, .type = UNIFIED_CACHE, .size =   6 * MiB,
+               .associativity = 12, .line_size = 64, },
+    [0x4B] = { .level = 3, .type = UNIFIED_CACHE, .size =   8 * MiB,
+               .associativity = 16, .line_size = 64, },
+    [0x4C] = { .level = 3, .type = UNIFIED_CACHE, .size =  12 * MiB,
+               .associativity = 12, .line_size = 64, },
+    [0x4D] = { .level = 3, .type = UNIFIED_CACHE, .size =  16 * MiB,
+               .associativity = 16, .line_size = 64, },
+    [0x4E] = { .level = 2, .type = UNIFIED_CACHE, .size =   6 * MiB,
+               .associativity = 24, .line_size = 64, },
+    [0x60] = { .level = 1, .type = DCACHE,        .size =  16 * KiB,
+               .associativity = 8,  .line_size = 64, },
+    [0x66] = { .level = 1, .type = DCACHE,        .size =   8 * KiB,
+               .associativity = 4,  .line_size = 64, },
+    [0x67] = { .level = 1, .type = DCACHE,        .size =  16 * KiB,
+               .associativity = 4,  .line_size = 64, },
+    [0x68] = { .level = 1, .type = DCACHE,        .size =  32 * KiB,
+               .associativity = 4,  .line_size = 64, },
+    [0x78] = { .level = 2, .type = UNIFIED_CACHE, .size =   1 * MiB,
+               .associativity = 4,  .line_size = 64, },
+    /* lines per sector is not supported cpuid2_cache_descriptor(),
+    * so descriptors 0x79, 0x7A, 0x7B, 0x7C are not included.
+    */
+    [0x7D] = { .level = 2, .type = UNIFIED_CACHE, .size =   2 * MiB,
+               .associativity = 8,  .line_size = 64, },
+    [0x7F] = { .level = 2, .type = UNIFIED_CACHE, .size = 512 * KiB,
+               .associativity = 2,  .line_size = 64, },
+    [0x80] = { .level = 2, .type = UNIFIED_CACHE, .size = 512 * KiB,
+               .associativity = 8,  .line_size = 64, },
+    [0x82] = { .level = 2, .type = UNIFIED_CACHE, .size = 256 * KiB,
+               .associativity = 8,  .line_size = 32, },
+    [0x83] = { .level = 2, .type = UNIFIED_CACHE, .size = 512 * KiB,
+               .associativity = 8,  .line_size = 32, },
+    [0x84] = { .level = 2, .type = UNIFIED_CACHE, .size =   1 * MiB,
+               .associativity = 8,  .line_size = 32, },
+    [0x85] = { .level = 2, .type = UNIFIED_CACHE, .size =   2 * MiB,
+               .associativity = 8,  .line_size = 32, },
+    [0x86] = { .level = 2, .type = UNIFIED_CACHE, .size = 512 * KiB,
+               .associativity = 4,  .line_size = 64, },
+    [0x87] = { .level = 2, .type = UNIFIED_CACHE, .size =   1 * MiB,
+               .associativity = 8,  .line_size = 64, },
+    [0xD0] = { .level = 3, .type = UNIFIED_CACHE, .size = 512 * KiB,
+               .associativity = 4,  .line_size = 64, },
+    [0xD1] = { .level = 3, .type = UNIFIED_CACHE, .size =   1 * MiB,
+               .associativity = 4,  .line_size = 64, },
+    [0xD2] = { .level = 3, .type = UNIFIED_CACHE, .size =   2 * MiB,
+               .associativity = 4,  .line_size = 64, },
+    [0xD6] = { .level = 3, .type = UNIFIED_CACHE, .size =   1 * MiB,
+               .associativity = 8,  .line_size = 64, },
+    [0xD7] = { .level = 3, .type = UNIFIED_CACHE, .size =   2 * MiB,
+               .associativity = 8,  .line_size = 64, },
+    [0xD8] = { .level = 3, .type = UNIFIED_CACHE, .size =   4 * MiB,
+               .associativity = 8,  .line_size = 64, },
+    [0xDC] = { .level = 3, .type = UNIFIED_CACHE, .size = 1.5 * MiB,
+               .associativity = 12, .line_size = 64, },
+    [0xDD] = { .level = 3, .type = UNIFIED_CACHE, .size =   3 * MiB,
+               .associativity = 12, .line_size = 64, },
+    [0xDE] = { .level = 3, .type = UNIFIED_CACHE, .size =   6 * MiB,
+               .associativity = 12, .line_size = 64, },
+    [0xE2] = { .level = 3, .type = UNIFIED_CACHE, .size =   2 * MiB,
+               .associativity = 16, .line_size = 64, },
+    [0xE3] = { .level = 3, .type = UNIFIED_CACHE, .size =   4 * MiB,
+               .associativity = 16, .line_size = 64, },
+    [0xE4] = { .level = 3, .type = UNIFIED_CACHE, .size =   8 * MiB,
+               .associativity = 16, .line_size = 64, },
+    [0xEA] = { .level = 3, .type = UNIFIED_CACHE, .size =  12 * MiB,
+               .associativity = 24, .line_size = 64, },
+    [0xEB] = { .level = 3, .type = UNIFIED_CACHE, .size =  18 * MiB,
+               .associativity = 24, .line_size = 64, },
+    [0xEC] = { .level = 3, .type = UNIFIED_CACHE, .size =  24 * MiB,
+               .associativity = 24, .line_size = 64, },
+};
+
+/*
+ * "CPUID leaf 2 does not report cache descriptor information,
+ * use CPUID leaf 4 to query cache parameters"
+ */
+#define CACHE_DESCRIPTOR_UNAVAILABLE 0xFF
 
-#define CPUID_2_L1D_32KB_8WAY_64B 0x2c
-#define CPUID_2_L1I_32KB_8WAY_64B 0x30
-#define CPUID_2_L2_2MB_8WAY_64B   0x7d
-#define CPUID_2_L3_16MB_16WAY_64B 0x4d
+/*
+ * Return a CPUID 2 cache descriptor for a given cache.
+ * If no known descriptor is found, return CACHE_DESCRIPTOR_UNAVAILABLE
+ */
+static uint8_t cpuid2_cache_descriptor(CPUCacheInfo *cache)
+{
+    int i;
+
+    assert(cache->size > 0);
+    assert(cache->level > 0);
+    assert(cache->line_size > 0);
+    assert(cache->associativity > 0);
+    for (i = 0; i < ARRAY_SIZE(cpuid2_cache_descriptors); i++) {
+        struct CPUID2CacheDescriptorInfo *d = &cpuid2_cache_descriptors[i];
+        if (d->level == cache->level && d->type == cache->type &&
+            d->size == cache->size && d->line_size == cache->line_size &&
+            d->associativity == cache->associativity) {
+                return i;
+            }
+    }
 
+    return CACHE_DESCRIPTOR_UNAVAILABLE;
+}
 
 /* CPUID Leaf 4 constants: */
 
 /* EAX: */
-#define CPUID_4_TYPE_DCACHE  1
-#define CPUID_4_TYPE_ICACHE  2
-#define CPUID_4_TYPE_UNIFIED 3
+#define CACHE_TYPE_D    1
+#define CACHE_TYPE_I    2
+#define CACHE_TYPE_UNIFIED   3
 
-#define CPUID_4_LEVEL(l)          ((l) << 5)
+#define CACHE_LEVEL(l)        (l << 5)
 
-#define CPUID_4_SELF_INIT_LEVEL (1 << 8)
-#define CPUID_4_FULLY_ASSOC     (1 << 9)
+#define CACHE_SELF_INIT_LEVEL (1 << 8)
 
 /* EDX: */
-#define CPUID_4_NO_INVD_SHARING (1 << 0)
-#define CPUID_4_INCLUSIVE       (1 << 1)
-#define CPUID_4_COMPLEX_IDX     (1 << 2)
+#define CACHE_NO_INVD_SHARING   (1 << 0)
+#define CACHE_INCLUSIVE       (1 << 1)
+#define CACHE_COMPLEX_IDX     (1 << 2)
+
+/* Encode CacheType for CPUID[4].EAX */
+#define CACHE_TYPE(t) (((t) == DCACHE)  ? CACHE_TYPE_D  : \
+                         ((t) == ICACHE)  ? CACHE_TYPE_I  : \
+                         ((t) == UNIFIED_CACHE) ? CACHE_TYPE_UNIFIED : \
+                         0 /* Invalid value */)
+
+
+/* Encode cache info for CPUID[4] */
+static void encode_cache_cpuid4(CPUCacheInfo *cache,
+                                int num_apic_ids, int num_cores,
+                                uint32_t *eax, uint32_t *ebx,
+                                uint32_t *ecx, uint32_t *edx)
+{
+    assert(cache->size == cache->line_size * cache->associativity *
+                          cache->partitions * cache->sets);
+
+    assert(num_apic_ids > 0);
+    *eax = CACHE_TYPE(cache->type) |
+           CACHE_LEVEL(cache->level) |
+           (cache->self_init ? CACHE_SELF_INIT_LEVEL : 0) |
+           ((num_cores - 1) << 26) |
+           ((num_apic_ids - 1) << 14);
+
+    assert(cache->line_size > 0);
+    assert(cache->partitions > 0);
+    assert(cache->associativity > 0);
+    /* We don't implement fully-associative caches */
+    assert(cache->associativity < cache->sets);
+    *ebx = (cache->line_size - 1) |
+           ((cache->partitions - 1) << 12) |
+           ((cache->associativity - 1) << 22);
+
+    assert(cache->sets > 0);
+    *ecx = cache->sets - 1;
+
+    *edx = (cache->no_invd_sharing ? CACHE_NO_INVD_SHARING : 0) |
+           (cache->inclusive ? CACHE_INCLUSIVE : 0) |
+           (cache->complex_indexing ? CACHE_COMPLEX_IDX : 0);
+}
+
+/* Encode cache info for CPUID[0x80000005].ECX or CPUID[0x80000005].EDX */
+static uint32_t encode_cache_cpuid80000005(CPUCacheInfo *cache)
+{
+    assert(cache->size % 1024 == 0);
+    assert(cache->lines_per_tag > 0);
+    assert(cache->associativity > 0);
+    assert(cache->line_size > 0);
+    return ((cache->size / 1024) << 24) | (cache->associativity << 16) |
+           (cache->lines_per_tag << 8) | (cache->line_size);
+}
 
 #define ASSOC_FULL 0xFF
 
@@ -100,57 +307,140 @@
                           a == ASSOC_FULL ? 0xF : \
                           0 /* invalid value */)
 
+/*
+ * Encode cache info for CPUID[0x80000006].ECX and CPUID[0x80000006].EDX
+ * @l3 can be NULL.
+ */
+static void encode_cache_cpuid80000006(CPUCacheInfo *l2,
+                                       CPUCacheInfo *l3,
+                                       uint32_t *ecx, uint32_t *edx)
+{
+    assert(l2->size % 1024 == 0);
+    assert(l2->associativity > 0);
+    assert(l2->lines_per_tag > 0);
+    assert(l2->line_size > 0);
+    *ecx = ((l2->size / 1024) << 16) |
+           (AMD_ENC_ASSOC(l2->associativity) << 12) |
+           (l2->lines_per_tag << 8) | (l2->line_size);
+
+    if (l3) {
+        assert(l3->size % (512 * 1024) == 0);
+        assert(l3->associativity > 0);
+        assert(l3->lines_per_tag > 0);
+        assert(l3->line_size > 0);
+        *edx = ((l3->size / (512 * 1024)) << 18) |
+               (AMD_ENC_ASSOC(l3->associativity) << 12) |
+               (l3->lines_per_tag << 8) | (l3->line_size);
+    } else {
+        *edx = 0;
+    }
+}
 
 /* Definitions of the hardcoded cache entries we expose: */
 
 /* L1 data cache: */
-#define L1D_LINE_SIZE         64
-#define L1D_ASSOCIATIVITY      8
-#define L1D_SETS              64
-#define L1D_PARTITIONS         1
-/* Size = LINE_SIZE*ASSOCIATIVITY*SETS*PARTITIONS = 32KiB */
-#define L1D_DESCRIPTOR CPUID_2_L1D_32KB_8WAY_64B
+static CPUCacheInfo l1d_cache = {
+    .type = DCACHE,
+    .level = 1,
+    .size = 32 * KiB,
+    .self_init = 1,
+    .line_size = 64,
+    .associativity = 8,
+    .sets = 64,
+    .partitions = 1,
+    .no_invd_sharing = true,
+};
+
 /*FIXME: CPUID leaf 0x80000005 is inconsistent with leaves 2 & 4 */
-#define L1D_LINES_PER_TAG      1
-#define L1D_SIZE_KB_AMD       64
-#define L1D_ASSOCIATIVITY_AMD  2
+static CPUCacheInfo l1d_cache_amd = {
+    .type = DCACHE,
+    .level = 1,
+    .size = 64 * KiB,
+    .self_init = 1,
+    .line_size = 64,
+    .associativity = 2,
+    .sets = 512,
+    .partitions = 1,
+    .lines_per_tag = 1,
+    .no_invd_sharing = true,
+};
 
 /* L1 instruction cache: */
-#define L1I_LINE_SIZE         64
-#define L1I_ASSOCIATIVITY      8
-#define L1I_SETS              64
-#define L1I_PARTITIONS         1
-/* Size = LINE_SIZE*ASSOCIATIVITY*SETS*PARTITIONS = 32KiB */
-#define L1I_DESCRIPTOR CPUID_2_L1I_32KB_8WAY_64B
+static CPUCacheInfo l1i_cache = {
+    .type = ICACHE,
+    .level = 1,
+    .size = 32 * KiB,
+    .self_init = 1,
+    .line_size = 64,
+    .associativity = 8,
+    .sets = 64,
+    .partitions = 1,
+    .no_invd_sharing = true,
+};
+
 /*FIXME: CPUID leaf 0x80000005 is inconsistent with leaves 2 & 4 */
-#define L1I_LINES_PER_TAG      1
-#define L1I_SIZE_KB_AMD       64
-#define L1I_ASSOCIATIVITY_AMD  2
+static CPUCacheInfo l1i_cache_amd = {
+    .type = ICACHE,
+    .level = 1,
+    .size = 64 * KiB,
+    .self_init = 1,
+    .line_size = 64,
+    .associativity = 2,
+    .sets = 512,
+    .partitions = 1,
+    .lines_per_tag = 1,
+    .no_invd_sharing = true,
+};
 
 /* Level 2 unified cache: */
-#define L2_LINE_SIZE          64
-#define L2_ASSOCIATIVITY      16
-#define L2_SETS             4096
-#define L2_PARTITIONS          1
-/* Size = LINE_SIZE*ASSOCIATIVITY*SETS*PARTITIONS = 4MiB */
+static CPUCacheInfo l2_cache = {
+    .type = UNIFIED_CACHE,
+    .level = 2,
+    .size = 4 * MiB,
+    .self_init = 1,
+    .line_size = 64,
+    .associativity = 16,
+    .sets = 4096,
+    .partitions = 1,
+    .no_invd_sharing = true,
+};
+
 /*FIXME: CPUID leaf 2 descriptor is inconsistent with CPUID leaf 4 */
-#define L2_DESCRIPTOR CPUID_2_L2_2MB_8WAY_64B
+static CPUCacheInfo l2_cache_cpuid2 = {
+    .type = UNIFIED_CACHE,
+    .level = 2,
+    .size = 2 * MiB,
+    .line_size = 64,
+    .associativity = 8,
+};
+
+
 /*FIXME: CPUID leaf 0x80000006 is inconsistent with leaves 2 & 4 */
-#define L2_LINES_PER_TAG       1
-#define L2_SIZE_KB_AMD       512
+static CPUCacheInfo l2_cache_amd = {
+    .type = UNIFIED_CACHE,
+    .level = 2,
+    .size = 512 * KiB,
+    .line_size = 64,
+    .lines_per_tag = 1,
+    .associativity = 8,
+    .sets = 1024,
+    .partitions = 1,
+};
 
 /* Level 3 unified cache: */
-#define L3_SIZE_KB             0 /* disabled */
-#define L3_ASSOCIATIVITY       0 /* disabled */
-#define L3_LINES_PER_TAG       0 /* disabled */
-#define L3_LINE_SIZE           0 /* disabled */
-#define L3_N_LINE_SIZE         64
-#define L3_N_ASSOCIATIVITY     16
-#define L3_N_SETS           16384
-#define L3_N_PARTITIONS         1
-#define L3_N_DESCRIPTOR CPUID_2_L3_16MB_16WAY_64B
-#define L3_N_LINES_PER_TAG      1
-#define L3_N_SIZE_KB_AMD    16384
+static CPUCacheInfo l3_cache = {
+    .type = UNIFIED_CACHE,
+    .level = 3,
+    .size = 16 * MiB,
+    .line_size = 64,
+    .associativity = 16,
+    .sets = 16384,
+    .partitions = 1,
+    .lines_per_tag = 1,
+    .self_init = true,
+    .inclusive = true,
+    .complex_indexing = true,
+};
 
 /* TLB definitions: */
 
@@ -3294,85 +3584,53 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         if (!cpu->enable_l3_cache) {
             *ecx = 0;
         } else {
-            *ecx = L3_N_DESCRIPTOR;
+            *ecx = cpuid2_cache_descriptor(&l3_cache);
         }
-        *edx = (L1D_DESCRIPTOR << 16) | \
-               (L1I_DESCRIPTOR <<  8) | \
-               (L2_DESCRIPTOR);
+        *edx = (cpuid2_cache_descriptor(&l1d_cache) << 16) |
+               (cpuid2_cache_descriptor(&l1i_cache) <<  8) |
+               (cpuid2_cache_descriptor(&l2_cache_cpuid2));
         break;
     case 4:
         /* cache info: needed for Core compatibility */
         if (cpu->cache_info_passthrough) {
             host_cpuid(index, count, eax, ebx, ecx, edx);
+            /* QEMU gives out its own APIC IDs, never pass down bits 31..26.  */
             *eax &= ~0xFC000000;
+            if ((*eax & 31) && cs->nr_cores > 1) {
+                *eax |= (cs->nr_cores - 1) << 26;
+            }
         } else {
             *eax = 0;
             switch (count) {
             case 0: /* L1 dcache info */
-                *eax |= CPUID_4_TYPE_DCACHE | \
-                        CPUID_4_LEVEL(1) | \
-                        CPUID_4_SELF_INIT_LEVEL;
-                *ebx = (L1D_LINE_SIZE - 1) | \
-                       ((L1D_PARTITIONS - 1) << 12) | \
-                       ((L1D_ASSOCIATIVITY - 1) << 22);
-                *ecx = L1D_SETS - 1;
-                *edx = CPUID_4_NO_INVD_SHARING;
+                encode_cache_cpuid4(&l1d_cache,
+                                    1, cs->nr_cores,
+                                    eax, ebx, ecx, edx);
                 break;
             case 1: /* L1 icache info */
-                *eax |= CPUID_4_TYPE_ICACHE | \
-                        CPUID_4_LEVEL(1) | \
-                        CPUID_4_SELF_INIT_LEVEL;
-                *ebx = (L1I_LINE_SIZE - 1) | \
-                       ((L1I_PARTITIONS - 1) << 12) | \
-                       ((L1I_ASSOCIATIVITY - 1) << 22);
-                *ecx = L1I_SETS - 1;
-                *edx = CPUID_4_NO_INVD_SHARING;
+                encode_cache_cpuid4(&l1i_cache,
+                                    1, cs->nr_cores,
+                                    eax, ebx, ecx, edx);
                 break;
             case 2: /* L2 cache info */
-                *eax |= CPUID_4_TYPE_UNIFIED | \
-                        CPUID_4_LEVEL(2) | \
-                        CPUID_4_SELF_INIT_LEVEL;
-                if (cs->nr_threads > 1) {
-                    *eax |= (cs->nr_threads - 1) << 14;
-                }
-                *ebx = (L2_LINE_SIZE - 1) | \
-                       ((L2_PARTITIONS - 1) << 12) | \
-                       ((L2_ASSOCIATIVITY - 1) << 22);
-                *ecx = L2_SETS - 1;
-                *edx = CPUID_4_NO_INVD_SHARING;
+                encode_cache_cpuid4(&l2_cache,
+                                    cs->nr_threads, cs->nr_cores,
+                                    eax, ebx, ecx, edx);
                 break;
             case 3: /* L3 cache info */
-                if (!cpu->enable_l3_cache) {
-                    *eax = 0;
-                    *ebx = 0;
-                    *ecx = 0;
-                    *edx = 0;
+                pkg_offset = apicid_pkg_offset(cs->nr_cores, cs->nr_threads);
+                if (cpu->enable_l3_cache) {
+                    encode_cache_cpuid4(&l3_cache,
+                                        (1 << pkg_offset), cs->nr_cores,
+                                        eax, ebx, ecx, edx);
                     break;
                 }
-                *eax |= CPUID_4_TYPE_UNIFIED | \
-                        CPUID_4_LEVEL(3) | \
-                        CPUID_4_SELF_INIT_LEVEL;
-                pkg_offset = apicid_pkg_offset(cs->nr_cores, cs->nr_threads);
-                *eax |= ((1 << pkg_offset) - 1) << 14;
-                *ebx = (L3_N_LINE_SIZE - 1) | \
-                       ((L3_N_PARTITIONS - 1) << 12) | \
-                       ((L3_N_ASSOCIATIVITY - 1) << 22);
-                *ecx = L3_N_SETS - 1;
-                *edx = CPUID_4_INCLUSIVE | CPUID_4_COMPLEX_IDX;
-                break;
+                /* fall through */
             default: /* end of info */
-                *eax = 0;
-                *ebx = 0;
-                *ecx = 0;
-                *edx = 0;
+                *eax = *ebx = *ecx = *edx = 0;
                 break;
             }
         }
-
-        /* QEMU gives out its own APIC IDs, never pass down bits 31..26.  */
-        if ((*eax & 31) && cs->nr_cores > 1) {
-            *eax |= (cs->nr_cores - 1) << 26;
-        }
         break;
     case 5:
         /* mwait info: needed for Core compatibility */
@@ -3576,10 +3834,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                (L1_ITLB_2M_ASSOC <<  8) | (L1_ITLB_2M_ENTRIES);
         *ebx = (L1_DTLB_4K_ASSOC << 24) | (L1_DTLB_4K_ENTRIES << 16) | \
                (L1_ITLB_4K_ASSOC <<  8) | (L1_ITLB_4K_ENTRIES);
-        *ecx = (L1D_SIZE_KB_AMD << 24) | (L1D_ASSOCIATIVITY_AMD << 16) | \
-               (L1D_LINES_PER_TAG << 8) | (L1D_LINE_SIZE);
-        *edx = (L1I_SIZE_KB_AMD << 24) | (L1I_ASSOCIATIVITY_AMD << 16) | \
-               (L1I_LINES_PER_TAG << 8) | (L1I_LINE_SIZE);
+        *ecx = encode_cache_cpuid80000005(&l1d_cache_amd);
+        *edx = encode_cache_cpuid80000005(&l1i_cache_amd);
         break;
     case 0x80000006:
         /* cache info (L2 cache) */
@@ -3595,18 +3851,9 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                (L2_DTLB_4K_ENTRIES << 16) | \
                (AMD_ENC_ASSOC(L2_ITLB_4K_ASSOC) << 12) | \
                (L2_ITLB_4K_ENTRIES);
-        *ecx = (L2_SIZE_KB_AMD << 16) | \
-               (AMD_ENC_ASSOC(L2_ASSOCIATIVITY) << 12) | \
-               (L2_LINES_PER_TAG << 8) | (L2_LINE_SIZE);
-        if (!cpu->enable_l3_cache) {
-            *edx = ((L3_SIZE_KB / 512) << 18) | \
-                   (AMD_ENC_ASSOC(L3_ASSOCIATIVITY) << 12) | \
-                   (L3_LINES_PER_TAG << 8) | (L3_LINE_SIZE);
-        } else {
-            *edx = ((L3_N_SIZE_KB_AMD / 512) << 18) | \
-                   (AMD_ENC_ASSOC(L3_N_ASSOCIATIVITY) << 12) | \
-                   (L3_N_LINES_PER_TAG << 8) | (L3_N_LINE_SIZE);
-        }
+        encode_cache_cpuid80000006(&l2_cache_amd,
+                                   cpu->enable_l3_cache ? &l3_cache : NULL,
+                                   ecx, edx);
         break;
     case 0x80000007:
         *eax = 0;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 78db1b8..eaed287 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1044,6 +1044,59 @@ typedef enum TPRAccess {
     TPR_ACCESS_WRITE,
 } TPRAccess;
 
+/* Cache information data structures: */
+
+enum CacheType {
+    DCACHE,
+    ICACHE,
+    UNIFIED_CACHE
+};
+
+typedef struct CPUCacheInfo {
+    enum CacheType type;
+    uint8_t level;
+    /* Size in bytes */
+    uint32_t size;
+    /* Line size, in bytes */
+    uint16_t line_size;
+    /*
+     * Associativity.
+     * Note: representation of fully-associative caches is not implemented
+     */
+    uint8_t associativity;
+    /* Physical line partitions. CPUID[0x8000001D].EBX, CPUID[4].EBX */
+    uint8_t partitions;
+    /* Number of sets. CPUID[0x8000001D].ECX, CPUID[4].ECX */
+    uint32_t sets;
+    /*
+     * Lines per tag.
+     * AMD-specific: CPUID[0x80000005], CPUID[0x80000006].
+     * (Is this synonym to @partitions?)
+     */
+    uint8_t lines_per_tag;
+
+    /* Self-initializing cache */
+    bool self_init;
+    /*
+     * WBINVD/INVD is not guaranteed to act upon lower level caches of
+     * non-originating threads sharing this cache.
+     * CPUID[4].EDX[bit 0], CPUID[0x8000001D].EDX[bit 0]
+     */
+    bool no_invd_sharing;
+    /*
+     * Cache is inclusive of lower cache levels.
+     * CPUID[4].EDX[bit 1], CPUID[0x8000001D].EDX[bit 1].
+     */
+    bool inclusive;
+    /*
+     * A complex function is used to index the cache, potentially using all
+     * address bits.  CPUID[4].EDX[bit 2].
+     */
+    bool complex_indexing;
+} CPUCacheInfo;
+
+
+
 typedef struct CPUX86State {
     /* standard registers */
     target_ulong regs[CPU_NB_REGS];
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 2/9] i386: Add cache information in X86CPUDefinition
  2018-04-10 23:16 ` [Qemu-devel] " Babu Moger
@ 2018-04-10 23:16   ` Babu Moger
  -1 siblings, 0 replies; 24+ messages in thread
From: Babu Moger @ 2018-04-10 23:16 UTC (permalink / raw)
  To: mst, marcel, pbonzini, rth, ehabkost, mtosatti
  Cc: geoff, babu.moger, kash, qemu-devel, kvm

Add cache information in X86CPUDefinition and CPUX86State.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
---
 target/i386/cpu.c | 4 ++++
 target/i386/cpu.h | 8 ++++++++
 2 files changed, 12 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 2d3d7d8..8c84fa2 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1098,6 +1098,7 @@ struct X86CPUDefinition {
     int stepping;
     FeatureWordArray features;
     const char *model_id;
+    CPUCaches cache_info;
 };
 
 static X86CPUDefinition builtin_x86_defs[] = {
@@ -3235,6 +3236,9 @@ static void x86_cpu_load_def(X86CPU *cpu, X86CPUDefinition *def, Error **errp)
         env->features[w] = def->features[w];
     }
 
+    /* Load Cache information from the X86CPUDefinition */
+    memcpy(&env->cache_info, &def->cache_info, sizeof(CPUCaches));
+
     /* Special cases not set in the X86CPUDefinition structs: */
     /* TODO: in-kernel irqchip for hvf */
     if (kvm_enabled()) {
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index eaed287..aff8396 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1096,6 +1096,13 @@ typedef struct CPUCacheInfo {
 } CPUCacheInfo;
 
 
+typedef struct CPUCaches {
+        bool valid;
+        CPUCacheInfo l1d_cache;
+        CPUCacheInfo l1i_cache;
+        CPUCacheInfo l2_cache;
+        CPUCacheInfo l3_cache;
+} CPUCaches;
 
 typedef struct CPUX86State {
     /* standard registers */
@@ -1282,6 +1289,7 @@ typedef struct CPUX86State {
     /* Features that were explicitly enabled/disabled */
     FeatureWordArray user_features;
     uint32_t cpuid_model[12];
+    CPUCaches cache_info;
 
     /* MTRRs */
     uint64_t mtrr_fixed[11];
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v6 2/9] i386: Add cache information in X86CPUDefinition
@ 2018-04-10 23:16   ` Babu Moger
  0 siblings, 0 replies; 24+ messages in thread
From: Babu Moger @ 2018-04-10 23:16 UTC (permalink / raw)
  To: mst, marcel, pbonzini, rth, ehabkost, mtosatti
  Cc: qemu-devel, kvm, kash, geoff, babu.moger

Add cache information in X86CPUDefinition and CPUX86State.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
---
 target/i386/cpu.c | 4 ++++
 target/i386/cpu.h | 8 ++++++++
 2 files changed, 12 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 2d3d7d8..8c84fa2 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1098,6 +1098,7 @@ struct X86CPUDefinition {
     int stepping;
     FeatureWordArray features;
     const char *model_id;
+    CPUCaches cache_info;
 };
 
 static X86CPUDefinition builtin_x86_defs[] = {
@@ -3235,6 +3236,9 @@ static void x86_cpu_load_def(X86CPU *cpu, X86CPUDefinition *def, Error **errp)
         env->features[w] = def->features[w];
     }
 
+    /* Load Cache information from the X86CPUDefinition */
+    memcpy(&env->cache_info, &def->cache_info, sizeof(CPUCaches));
+
     /* Special cases not set in the X86CPUDefinition structs: */
     /* TODO: in-kernel irqchip for hvf */
     if (kvm_enabled()) {
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index eaed287..aff8396 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1096,6 +1096,13 @@ typedef struct CPUCacheInfo {
 } CPUCacheInfo;
 
 
+typedef struct CPUCaches {
+        bool valid;
+        CPUCacheInfo l1d_cache;
+        CPUCacheInfo l1i_cache;
+        CPUCacheInfo l2_cache;
+        CPUCacheInfo l3_cache;
+} CPUCaches;
 
 typedef struct CPUX86State {
     /* standard registers */
@@ -1282,6 +1289,7 @@ typedef struct CPUX86State {
     /* Features that were explicitly enabled/disabled */
     FeatureWordArray user_features;
     uint32_t cpuid_model[12];
+    CPUCaches cache_info;
 
     /* MTRRs */
     uint64_t mtrr_fixed[11];
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 3/9] i386: Initialize cache information for EPYC family processors
  2018-04-10 23:16 ` [Qemu-devel] " Babu Moger
@ 2018-04-10 23:16   ` Babu Moger
  -1 siblings, 0 replies; 24+ messages in thread
From: Babu Moger @ 2018-04-10 23:16 UTC (permalink / raw)
  To: mst, marcel, pbonzini, rth, ehabkost, mtosatti
  Cc: geoff, babu.moger, kash, qemu-devel, kvm

Initialize pre-determined cache information for EPYC processors.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
---
 target/i386/cpu.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 8c84fa2..3b2a19a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -2295,6 +2295,54 @@ static X86CPUDefinition builtin_x86_defs[] = {
             CPUID_6_EAX_ARAT,
         .xlevel = 0x8000000A,
         .model_id = "AMD EPYC Processor",
+        .cache_info.valid = 1,
+        .cache_info.l1d_cache = {
+            .type = DCACHE,
+            .level = 1,
+            .size = 32 * KiB,
+            .line_size = 64,
+            .associativity = 8,
+            .partitions = 1,
+            .sets = 64,
+            .lines_per_tag = 1,
+            .self_init = 1,
+            .no_invd_sharing = true,
+        },
+        .cache_info.l1i_cache = {
+            .type = ICACHE,
+            .level = 1,
+            .size = 64 * KiB,
+            .line_size = 64,
+            .associativity = 4,
+            .partitions = 1,
+            .sets = 256,
+            .lines_per_tag = 1,
+            .self_init = 1,
+            .no_invd_sharing = true,
+        },
+        .cache_info.l2_cache = {
+            .type = UNIFIED_CACHE,
+            .level = 2,
+            .size = 512 * KiB,
+            .line_size = 64,
+            .associativity = 8,
+            .partitions = 1,
+            .sets = 1024,
+            .lines_per_tag = 1,
+        },
+        .cache_info.l3_cache = {
+            .type = UNIFIED_CACHE,
+            .level = 3,
+            .size = 8 * MiB,
+            .line_size = 64,
+            .associativity = 16,
+            .partitions = 1,
+            .sets = 8192,
+            .lines_per_tag = 1,
+            .self_init = true,
+            .inclusive = true,
+            .complex_indexing = true,
+        },
     },
     {
         .name = "EPYC-IBPB",
@@ -2341,6 +2389,54 @@ static X86CPUDefinition builtin_x86_defs[] = {
             CPUID_6_EAX_ARAT,
         .xlevel = 0x8000000A,
         .model_id = "AMD EPYC Processor (with IBPB)",
+        .cache_info.valid = 1,
+        .cache_info.l1d_cache = {
+            .type = DCACHE,
+            .level = 1,
+            .size = 32 * KiB,
+            .line_size = 64,
+            .associativity = 8,
+            .partitions = 1,
+            .sets = 64,
+            .lines_per_tag = 1,
+            .self_init = 1,
+            .no_invd_sharing = true,
+        },
+        .cache_info.l1i_cache = {
+            .type = ICACHE,
+            .level = 1,
+            .size = 64 * KiB,
+            .line_size = 64,
+            .associativity = 4,
+            .partitions = 1,
+            .sets = 256,
+            .lines_per_tag = 1,
+            .self_init = 1,
+            .no_invd_sharing = true,
+        },
+        .cache_info.l2_cache = {
+            .type = UNIFIED_CACHE,
+            .level = 2,
+            .size = 512 * KiB,
+            .line_size = 64,
+            .associativity = 8,
+            .partitions = 1,
+            .sets = 1024,
+            .lines_per_tag = 1,
+        },
+        .cache_info.l3_cache = {
+            .type = UNIFIED_CACHE,
+            .level = 3,
+            .size = 8 * MiB,
+            .line_size = 64,
+            .associativity = 16,
+            .partitions = 1,
+            .sets = 8192,
+            .lines_per_tag = 1,
+            .self_init = true,
+            .inclusive = true,
+            .complex_indexing = true,
+        },
     },
 };
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v6 3/9] i386: Initialize cache information for EPYC family processors
@ 2018-04-10 23:16   ` Babu Moger
  0 siblings, 0 replies; 24+ messages in thread
From: Babu Moger @ 2018-04-10 23:16 UTC (permalink / raw)
  To: mst, marcel, pbonzini, rth, ehabkost, mtosatti
  Cc: qemu-devel, kvm, kash, geoff, babu.moger

Initialize pre-determined cache information for EPYC processors.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
---
 target/i386/cpu.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 8c84fa2..3b2a19a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -2295,6 +2295,54 @@ static X86CPUDefinition builtin_x86_defs[] = {
             CPUID_6_EAX_ARAT,
         .xlevel = 0x8000000A,
         .model_id = "AMD EPYC Processor",
+        .cache_info.valid = 1,
+        .cache_info.l1d_cache = {
+            .type = DCACHE,
+            .level = 1,
+            .size = 32 * KiB,
+            .line_size = 64,
+            .associativity = 8,
+            .partitions = 1,
+            .sets = 64,
+            .lines_per_tag = 1,
+            .self_init = 1,
+            .no_invd_sharing = true,
+        },
+        .cache_info.l1i_cache = {
+            .type = ICACHE,
+            .level = 1,
+            .size = 64 * KiB,
+            .line_size = 64,
+            .associativity = 4,
+            .partitions = 1,
+            .sets = 256,
+            .lines_per_tag = 1,
+            .self_init = 1,
+            .no_invd_sharing = true,
+        },
+        .cache_info.l2_cache = {
+            .type = UNIFIED_CACHE,
+            .level = 2,
+            .size = 512 * KiB,
+            .line_size = 64,
+            .associativity = 8,
+            .partitions = 1,
+            .sets = 1024,
+            .lines_per_tag = 1,
+        },
+        .cache_info.l3_cache = {
+            .type = UNIFIED_CACHE,
+            .level = 3,
+            .size = 8 * MiB,
+            .line_size = 64,
+            .associativity = 16,
+            .partitions = 1,
+            .sets = 8192,
+            .lines_per_tag = 1,
+            .self_init = true,
+            .inclusive = true,
+            .complex_indexing = true,
+        },
     },
     {
         .name = "EPYC-IBPB",
@@ -2341,6 +2389,54 @@ static X86CPUDefinition builtin_x86_defs[] = {
             CPUID_6_EAX_ARAT,
         .xlevel = 0x8000000A,
         .model_id = "AMD EPYC Processor (with IBPB)",
+        .cache_info.valid = 1,
+        .cache_info.l1d_cache = {
+            .type = DCACHE,
+            .level = 1,
+            .size = 32 * KiB,
+            .line_size = 64,
+            .associativity = 8,
+            .partitions = 1,
+            .sets = 64,
+            .lines_per_tag = 1,
+            .self_init = 1,
+            .no_invd_sharing = true,
+        },
+        .cache_info.l1i_cache = {
+            .type = ICACHE,
+            .level = 1,
+            .size = 64 * KiB,
+            .line_size = 64,
+            .associativity = 4,
+            .partitions = 1,
+            .sets = 256,
+            .lines_per_tag = 1,
+            .self_init = 1,
+            .no_invd_sharing = true,
+        },
+        .cache_info.l2_cache = {
+            .type = UNIFIED_CACHE,
+            .level = 2,
+            .size = 512 * KiB,
+            .line_size = 64,
+            .associativity = 8,
+            .partitions = 1,
+            .sets = 1024,
+            .lines_per_tag = 1,
+        },
+        .cache_info.l3_cache = {
+            .type = UNIFIED_CACHE,
+            .level = 3,
+            .size = 8 * MiB,
+            .line_size = 64,
+            .associativity = 16,
+            .partitions = 1,
+            .sets = 8192,
+            .lines_per_tag = 1,
+            .self_init = true,
+            .inclusive = true,
+            .complex_indexing = true,
+        },
     },
 };
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 4/9] i386: Add new property to control cache info
  2018-04-10 23:16 ` [Qemu-devel] " Babu Moger
@ 2018-04-10 23:16   ` Babu Moger
  -1 siblings, 0 replies; 24+ messages in thread
From: Babu Moger @ 2018-04-10 23:16 UTC (permalink / raw)
  To: mst, marcel, pbonzini, rth, ehabkost, mtosatti
  Cc: geoff, babu.moger, kash, qemu-devel, kvm

This will be used to control the cache information.
By default new information will be displayed. If user
passes "-cpu legacy-cache" then older information will
be displayed even if the hardware supports new information.
It will be "on" for machine type "pc-q35-2.10" for compatibility.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
---
 include/hw/i386/pc.h | 4 ++++
 target/i386/cpu.c    | 1 +
 target/i386/cpu.h    | 5 +++++
 3 files changed, 10 insertions(+)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index ffee841..d904a3c 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -327,6 +327,10 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
         .driver   = "q35-pcihost",\
         .property = "x-pci-hole64-fix",\
         .value    = "off",\
+    },{\
+        .driver   = TYPE_X86_CPU,\
+        .property = "legacy-cache",\
+        .value    = "on",\
     },
 
 #define PC_COMPAT_2_9 \
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 3b2a19a..1659320 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5129,6 +5129,7 @@ static Property x86_cpu_properties[] = {
                      false),
     DEFINE_PROP_BOOL("vmware-cpuid-freq", X86CPU, vmware_cpuid_freq, true),
     DEFINE_PROP_BOOL("tcg-cpuid", X86CPU, expose_tcg, true),
+    DEFINE_PROP_BOOL("legacy-cache", X86CPU, legacy_cache, false),
 
     /*
      * From "Requirements for Implementing the Microsoft
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index aff8396..fe8edf6 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1394,6 +1394,11 @@ struct X86CPU {
      */
     bool enable_l3_cache;
 
+    /* Compatibility bits for old machine types.
+     * If true present the old cache topology information
+     */
+    bool legacy_cache;
+
     /* Compatibility bits for old machine types: */
     bool enable_cpuid_0xb;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v6 4/9] i386: Add new property to control cache info
@ 2018-04-10 23:16   ` Babu Moger
  0 siblings, 0 replies; 24+ messages in thread
From: Babu Moger @ 2018-04-10 23:16 UTC (permalink / raw)
  To: mst, marcel, pbonzini, rth, ehabkost, mtosatti
  Cc: qemu-devel, kvm, kash, geoff, babu.moger

This will be used to control the cache information.
By default new information will be displayed. If user
passes "-cpu legacy-cache" then older information will
be displayed even if the hardware supports new information.
It will be "on" for machine type "pc-q35-2.10" for compatibility.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
---
 include/hw/i386/pc.h | 4 ++++
 target/i386/cpu.c    | 1 +
 target/i386/cpu.h    | 5 +++++
 3 files changed, 10 insertions(+)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index ffee841..d904a3c 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -327,6 +327,10 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
         .driver   = "q35-pcihost",\
         .property = "x-pci-hole64-fix",\
         .value    = "off",\
+    },{\
+        .driver   = TYPE_X86_CPU,\
+        .property = "legacy-cache",\
+        .value    = "on",\
     },
 
 #define PC_COMPAT_2_9 \
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 3b2a19a..1659320 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5129,6 +5129,7 @@ static Property x86_cpu_properties[] = {
                      false),
     DEFINE_PROP_BOOL("vmware-cpuid-freq", X86CPU, vmware_cpuid_freq, true),
     DEFINE_PROP_BOOL("tcg-cpuid", X86CPU, expose_tcg, true),
+    DEFINE_PROP_BOOL("legacy-cache", X86CPU, legacy_cache, false),
 
     /*
      * From "Requirements for Implementing the Microsoft
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index aff8396..fe8edf6 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1394,6 +1394,11 @@ struct X86CPU {
      */
     bool enable_l3_cache;
 
+    /* Compatibility bits for old machine types.
+     * If true present the old cache topology information
+     */
+    bool legacy_cache;
+
     /* Compatibility bits for old machine types: */
     bool enable_cpuid_0xb;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 5/9] i386: Use the statically loaded cache definitions
  2018-04-10 23:16 ` [Qemu-devel] " Babu Moger
@ 2018-04-10 23:16   ` Babu Moger
  -1 siblings, 0 replies; 24+ messages in thread
From: Babu Moger @ 2018-04-10 23:16 UTC (permalink / raw)
  To: mst, marcel, pbonzini, rth, ehabkost, mtosatti
  Cc: geoff, babu.moger, kash, qemu-devel, kvm

Use the statically loaded cache definitions if available
and legacy-cache parameter is not set.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
---
 target/i386/cpu.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1659320..76b3bc9 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3934,8 +3934,13 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                (L1_ITLB_2M_ASSOC <<  8) | (L1_ITLB_2M_ENTRIES);
         *ebx = (L1_DTLB_4K_ASSOC << 24) | (L1_DTLB_4K_ENTRIES << 16) | \
                (L1_ITLB_4K_ASSOC <<  8) | (L1_ITLB_4K_ENTRIES);
-        *ecx = encode_cache_cpuid80000005(&l1d_cache_amd);
-        *edx = encode_cache_cpuid80000005(&l1i_cache_amd);
+        if (env->cache_info.valid && !cpu->legacy_cache) {
+            *ecx = encode_cache_cpuid80000005(&env->cache_info.l1d_cache);
+            *edx = encode_cache_cpuid80000005(&env->cache_info.l1i_cache);
+        } else {
+            *ecx = encode_cache_cpuid80000005(&l1d_cache_amd);
+            *edx = encode_cache_cpuid80000005(&l1i_cache_amd);
+        }
         break;
     case 0x80000006:
         /* cache info (L2 cache) */
@@ -3951,9 +3956,16 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                (L2_DTLB_4K_ENTRIES << 16) | \
                (AMD_ENC_ASSOC(L2_ITLB_4K_ASSOC) << 12) | \
                (L2_ITLB_4K_ENTRIES);
-        encode_cache_cpuid80000006(&l2_cache_amd,
-                                   cpu->enable_l3_cache ? &l3_cache : NULL,
-                                   ecx, edx);
+        if (env->cache_info.valid && !cpu->legacy_cache) {
+            encode_cache_cpuid80000006(&env->cache_info.l2_cache,
+                                       cpu->enable_l3_cache ?
+                                       &env->cache_info.l3_cache : NULL,
+                                       ecx, edx);
+        } else {
+            encode_cache_cpuid80000006(&l2_cache_amd,
+                                       cpu->enable_l3_cache ? &l3_cache : NULL,
+                                       ecx, edx);
+        }
         break;
     case 0x80000007:
         *eax = 0;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v6 5/9] i386: Use the statically loaded cache definitions
@ 2018-04-10 23:16   ` Babu Moger
  0 siblings, 0 replies; 24+ messages in thread
From: Babu Moger @ 2018-04-10 23:16 UTC (permalink / raw)
  To: mst, marcel, pbonzini, rth, ehabkost, mtosatti
  Cc: qemu-devel, kvm, kash, geoff, babu.moger

Use the statically loaded cache definitions if available
and legacy-cache parameter is not set.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
---
 target/i386/cpu.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1659320..76b3bc9 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3934,8 +3934,13 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                (L1_ITLB_2M_ASSOC <<  8) | (L1_ITLB_2M_ENTRIES);
         *ebx = (L1_DTLB_4K_ASSOC << 24) | (L1_DTLB_4K_ENTRIES << 16) | \
                (L1_ITLB_4K_ASSOC <<  8) | (L1_ITLB_4K_ENTRIES);
-        *ecx = encode_cache_cpuid80000005(&l1d_cache_amd);
-        *edx = encode_cache_cpuid80000005(&l1i_cache_amd);
+        if (env->cache_info.valid && !cpu->legacy_cache) {
+            *ecx = encode_cache_cpuid80000005(&env->cache_info.l1d_cache);
+            *edx = encode_cache_cpuid80000005(&env->cache_info.l1i_cache);
+        } else {
+            *ecx = encode_cache_cpuid80000005(&l1d_cache_amd);
+            *edx = encode_cache_cpuid80000005(&l1i_cache_amd);
+        }
         break;
     case 0x80000006:
         /* cache info (L2 cache) */
@@ -3951,9 +3956,16 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                (L2_DTLB_4K_ENTRIES << 16) | \
                (AMD_ENC_ASSOC(L2_ITLB_4K_ASSOC) << 12) | \
                (L2_ITLB_4K_ENTRIES);
-        encode_cache_cpuid80000006(&l2_cache_amd,
-                                   cpu->enable_l3_cache ? &l3_cache : NULL,
-                                   ecx, edx);
+        if (env->cache_info.valid && !cpu->legacy_cache) {
+            encode_cache_cpuid80000006(&env->cache_info.l2_cache,
+                                       cpu->enable_l3_cache ?
+                                       &env->cache_info.l3_cache : NULL,
+                                       ecx, edx);
+        } else {
+            encode_cache_cpuid80000006(&l2_cache_amd,
+                                       cpu->enable_l3_cache ? &l3_cache : NULL,
+                                       ecx, edx);
+        }
         break;
     case 0x80000007:
         *eax = 0;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 6/9] i386: Populate AMD Processor Cache Information for cpuid 0x8000001D
  2018-04-10 23:16 ` [Qemu-devel] " Babu Moger
@ 2018-04-10 23:16   ` Babu Moger
  -1 siblings, 0 replies; 24+ messages in thread
From: Babu Moger @ 2018-04-10 23:16 UTC (permalink / raw)
  To: mst, marcel, pbonzini, rth, ehabkost, mtosatti
  Cc: geoff, babu.moger, kash, qemu-devel, kvm

Add information for cpuid 0x8000001D leaf. Populate cache topology information
for different cache types(Data Cache, Instruction Cache, L2 and L3) supported
by 0x8000001D leaf. Please refer Processor Programming Reference (PPR) for AMD
Family 17h Model for more details.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
---
 target/i386/cpu.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 target/i386/kvm.c | 29 ++++++++++++++++--
 2 files changed, 118 insertions(+), 3 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 76b3bc9..63f2d31 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -307,6 +307,14 @@ static uint32_t encode_cache_cpuid80000005(CPUCacheInfo *cache)
                           a == ASSOC_FULL ? 0xF : \
                           0 /* invalid value */)
 
+/* Definitions used on CPUID Leaf 0x8000001D */
+/* Number of logical cores in a complex */
+#define CORES_IN_CMPLX  4
+/* Number of logical processors sharing cache */
+#define NUM_SHARING_CACHE(threads)   (threads ? \
+                         (((CORES_IN_CMPLX - 1) * 2) + 1)  : \
+                         (CORES_IN_CMPLX - 1))
+
 /*
  * Encode cache info for CPUID[0x80000006].ECX and CPUID[0x80000006].EDX
  * @l3 can be NULL.
@@ -336,6 +344,41 @@ static void encode_cache_cpuid80000006(CPUCacheInfo *l2,
     }
 }
 
+/* Encode cache info for CPUID[8000001D] */
+static void encode_cache_cpuid8000001d(CPUCacheInfo *cache, int nr_threads,
+                                uint32_t *eax, uint32_t *ebx,
+                                uint32_t *ecx, uint32_t *edx)
+{
+    assert(cache->size == cache->line_size * cache->associativity *
+                          cache->partitions * cache->sets);
+
+    *eax = CACHE_TYPE(cache->type) | CACHE_LEVEL(cache->level) |
+               (cache->self_init ? CACHE_SELF_INIT_LEVEL : 0);
+
+    /* L3 is shared among multiple cores */
+    if (cache->level == 3) {
+        *eax |= (NUM_SHARING_CACHE(nr_threads - 1) << 14);
+    } else {
+        *eax |= ((nr_threads - 1) << 14);
+    }
+
+    assert(cache->line_size > 0);
+    assert(cache->partitions > 0);
+    assert(cache->associativity > 0);
+    /* We don't implement fully-associative caches */
+    assert(cache->associativity < cache->sets);
+    *ebx = (cache->line_size - 1) |
+           ((cache->partitions - 1) << 12) |
+           ((cache->associativity - 1) << 22);
+
+    assert(cache->sets > 0);
+    *ecx = cache->sets - 1;
+
+    *edx = (cache->no_invd_sharing ? CACHE_NO_INVD_SHARING : 0) |
+           (cache->inclusive ? CACHE_INCLUSIVE : 0) |
+           (cache->complex_indexing ? CACHE_COMPLEX_IDX : 0);
+}
+
 /* Definitions of the hardcoded cache entries we expose: */
 
 /* L1 data cache: */
@@ -4006,6 +4049,55 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             *edx = 0;
         }
         break;
+    case 0x8000001D:
+        *eax = 0;
+        switch (count) {
+        case 0: /* L1 dcache info */
+            if (env->cache_info.valid && !cpu->legacy_cache) {
+                encode_cache_cpuid8000001d(&env->cache_info.l1d_cache,
+                                           cs->nr_threads,
+                                           eax, ebx, ecx, edx);
+            } else {
+                encode_cache_cpuid8000001d(&l1d_cache_amd, cs->nr_threads,
+                                           eax, ebx, ecx, edx);
+            }
+            break;
+        case 1: /* L1 icache info */
+            if (env->cache_info.valid && !cpu->legacy_cache) {
+                encode_cache_cpuid8000001d(&env->cache_info.l1i_cache,
+                                           cs->nr_threads,
+                                           eax, ebx, ecx, edx);
+            } else {
+                encode_cache_cpuid8000001d(&l1i_cache_amd,
+                                           cs->nr_threads,
+                                           eax, ebx, ecx, edx);
+            }
+            break;
+        case 2: /* L2 cache info */
+            if (env->cache_info.valid && !cpu->legacy_cache) {
+                encode_cache_cpuid8000001d(&env->cache_info.l2_cache,
+                                           cs->nr_threads,
+                                           eax, ebx, ecx, edx);
+            } else {
+                encode_cache_cpuid8000001d(&l2_cache_amd, cs->nr_threads,
+                                           eax, ebx, ecx, edx);
+            }
+            break;
+        case 3: /* L3 cache info */
+            if (env->cache_info.valid && !cpu->legacy_cache) {
+                encode_cache_cpuid8000001d(&env->cache_info.l3_cache,
+                                           cs->nr_threads,
+                                           eax, ebx, ecx, edx);
+            } else {
+                encode_cache_cpuid8000001d(&l3_cache, cs->nr_threads,
+                                           eax, ebx, ecx, edx);
+            }
+            break;
+        default: /* end of info */
+            *eax = *ebx = *ecx = *edx = 0;
+            break;
+        }
+        break;
     case 0xC0000000:
         *eax = env->cpuid_xlevel2;
         *ebx = 0;
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index d23fff1..d170d2a 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -937,9 +937,32 @@ int kvm_arch_init_vcpu(CPUState *cs)
         }
         c = &cpuid_data.entries[cpuid_i++];
 
-        c->function = i;
-        c->flags = 0;
-        cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
+        switch (i) {
+        case 0x8000001d:
+            /* Query for all AMD cache information leaves */
+            for (j = 0; ; j++) {
+                c->function = i;
+                c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+                c->index = j;
+                cpu_x86_cpuid(env, i, j, &c->eax, &c->ebx, &c->ecx, &c->edx);
+
+                if (c->eax == 0) {
+                    break;
+                }
+                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
+                    fprintf(stderr, "cpuid_data is full, no space for "
+                            "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
+                    abort();
+                }
+                c = &cpuid_data.entries[cpuid_i++];
+            }
+            break;
+        default:
+            c->function = i;
+            c->flags = 0;
+            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
+            break;
+        }
     }
 
     /* Call Centaur's CPUID instructions they are supported. */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v6 6/9] i386: Populate AMD Processor Cache Information for cpuid 0x8000001D
@ 2018-04-10 23:16   ` Babu Moger
  0 siblings, 0 replies; 24+ messages in thread
From: Babu Moger @ 2018-04-10 23:16 UTC (permalink / raw)
  To: mst, marcel, pbonzini, rth, ehabkost, mtosatti
  Cc: qemu-devel, kvm, kash, geoff, babu.moger

Add information for cpuid 0x8000001D leaf. Populate cache topology information
for different cache types(Data Cache, Instruction Cache, L2 and L3) supported
by 0x8000001D leaf. Please refer Processor Programming Reference (PPR) for AMD
Family 17h Model for more details.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
---
 target/i386/cpu.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 target/i386/kvm.c | 29 ++++++++++++++++--
 2 files changed, 118 insertions(+), 3 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 76b3bc9..63f2d31 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -307,6 +307,14 @@ static uint32_t encode_cache_cpuid80000005(CPUCacheInfo *cache)
                           a == ASSOC_FULL ? 0xF : \
                           0 /* invalid value */)
 
+/* Definitions used on CPUID Leaf 0x8000001D */
+/* Number of logical cores in a complex */
+#define CORES_IN_CMPLX  4
+/* Number of logical processors sharing cache */
+#define NUM_SHARING_CACHE(threads)   (threads ? \
+                         (((CORES_IN_CMPLX - 1) * 2) + 1)  : \
+                         (CORES_IN_CMPLX - 1))
+
 /*
  * Encode cache info for CPUID[0x80000006].ECX and CPUID[0x80000006].EDX
  * @l3 can be NULL.
@@ -336,6 +344,41 @@ static void encode_cache_cpuid80000006(CPUCacheInfo *l2,
     }
 }
 
+/* Encode cache info for CPUID[8000001D] */
+static void encode_cache_cpuid8000001d(CPUCacheInfo *cache, int nr_threads,
+                                uint32_t *eax, uint32_t *ebx,
+                                uint32_t *ecx, uint32_t *edx)
+{
+    assert(cache->size == cache->line_size * cache->associativity *
+                          cache->partitions * cache->sets);
+
+    *eax = CACHE_TYPE(cache->type) | CACHE_LEVEL(cache->level) |
+               (cache->self_init ? CACHE_SELF_INIT_LEVEL : 0);
+
+    /* L3 is shared among multiple cores */
+    if (cache->level == 3) {
+        *eax |= (NUM_SHARING_CACHE(nr_threads - 1) << 14);
+    } else {
+        *eax |= ((nr_threads - 1) << 14);
+    }
+
+    assert(cache->line_size > 0);
+    assert(cache->partitions > 0);
+    assert(cache->associativity > 0);
+    /* We don't implement fully-associative caches */
+    assert(cache->associativity < cache->sets);
+    *ebx = (cache->line_size - 1) |
+           ((cache->partitions - 1) << 12) |
+           ((cache->associativity - 1) << 22);
+
+    assert(cache->sets > 0);
+    *ecx = cache->sets - 1;
+
+    *edx = (cache->no_invd_sharing ? CACHE_NO_INVD_SHARING : 0) |
+           (cache->inclusive ? CACHE_INCLUSIVE : 0) |
+           (cache->complex_indexing ? CACHE_COMPLEX_IDX : 0);
+}
+
 /* Definitions of the hardcoded cache entries we expose: */
 
 /* L1 data cache: */
@@ -4006,6 +4049,55 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             *edx = 0;
         }
         break;
+    case 0x8000001D:
+        *eax = 0;
+        switch (count) {
+        case 0: /* L1 dcache info */
+            if (env->cache_info.valid && !cpu->legacy_cache) {
+                encode_cache_cpuid8000001d(&env->cache_info.l1d_cache,
+                                           cs->nr_threads,
+                                           eax, ebx, ecx, edx);
+            } else {
+                encode_cache_cpuid8000001d(&l1d_cache_amd, cs->nr_threads,
+                                           eax, ebx, ecx, edx);
+            }
+            break;
+        case 1: /* L1 icache info */
+            if (env->cache_info.valid && !cpu->legacy_cache) {
+                encode_cache_cpuid8000001d(&env->cache_info.l1i_cache,
+                                           cs->nr_threads,
+                                           eax, ebx, ecx, edx);
+            } else {
+                encode_cache_cpuid8000001d(&l1i_cache_amd,
+                                           cs->nr_threads,
+                                           eax, ebx, ecx, edx);
+            }
+            break;
+        case 2: /* L2 cache info */
+            if (env->cache_info.valid && !cpu->legacy_cache) {
+                encode_cache_cpuid8000001d(&env->cache_info.l2_cache,
+                                           cs->nr_threads,
+                                           eax, ebx, ecx, edx);
+            } else {
+                encode_cache_cpuid8000001d(&l2_cache_amd, cs->nr_threads,
+                                           eax, ebx, ecx, edx);
+            }
+            break;
+        case 3: /* L3 cache info */
+            if (env->cache_info.valid && !cpu->legacy_cache) {
+                encode_cache_cpuid8000001d(&env->cache_info.l3_cache,
+                                           cs->nr_threads,
+                                           eax, ebx, ecx, edx);
+            } else {
+                encode_cache_cpuid8000001d(&l3_cache, cs->nr_threads,
+                                           eax, ebx, ecx, edx);
+            }
+            break;
+        default: /* end of info */
+            *eax = *ebx = *ecx = *edx = 0;
+            break;
+        }
+        break;
     case 0xC0000000:
         *eax = env->cpuid_xlevel2;
         *ebx = 0;
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index d23fff1..d170d2a 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -937,9 +937,32 @@ int kvm_arch_init_vcpu(CPUState *cs)
         }
         c = &cpuid_data.entries[cpuid_i++];
 
-        c->function = i;
-        c->flags = 0;
-        cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
+        switch (i) {
+        case 0x8000001d:
+            /* Query for all AMD cache information leaves */
+            for (j = 0; ; j++) {
+                c->function = i;
+                c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+                c->index = j;
+                cpu_x86_cpuid(env, i, j, &c->eax, &c->ebx, &c->ecx, &c->edx);
+
+                if (c->eax == 0) {
+                    break;
+                }
+                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
+                    fprintf(stderr, "cpuid_data is full, no space for "
+                            "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
+                    abort();
+                }
+                c = &cpuid_data.entries[cpuid_i++];
+            }
+            break;
+        default:
+            c->function = i;
+            c->flags = 0;
+            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
+            break;
+        }
     }
 
     /* Call Centaur's CPUID instructions they are supported. */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 7/9] i386: Add support for CPUID_8000_001E for AMD
  2018-04-10 23:16 ` [Qemu-devel] " Babu Moger
@ 2018-04-10 23:16   ` Babu Moger
  -1 siblings, 0 replies; 24+ messages in thread
From: Babu Moger @ 2018-04-10 23:16 UTC (permalink / raw)
  To: mst, marcel, pbonzini, rth, ehabkost, mtosatti
  Cc: geoff, babu.moger, kash, qemu-devel, kvm

Populate threads/core_id/apic_ids/socket_id when CPUID_EXT3_TOPOEXT
feature is supported. This is required to support hyperthreading feature
on AMD CPUs. This is supported via CPUID_8000_001E extended functions.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
---
 target/i386/cpu.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 63f2d31..24b39c6 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -315,6 +315,12 @@ static uint32_t encode_cache_cpuid80000005(CPUCacheInfo *cache)
                          (((CORES_IN_CMPLX - 1) * 2) + 1)  : \
                          (CORES_IN_CMPLX - 1))
 
+/* Definitions used on CPUID Leaf 0x8000001E */
+#define EXTENDED_APIC_ID(threads, socket_id, core_id, thread_id) \
+                        ((threads) ? \
+                         ((socket_id << 6) | (core_id << 1) | thread_id) : \
+                         ((socket_id << 6) | core_id))
+
 /*
  * Encode cache info for CPUID[0x80000006].ECX and CPUID[0x80000006].EDX
  * @l3 can be NULL.
@@ -4098,6 +4104,14 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             break;
         }
         break;
+    case 0x8000001E:
+        assert(cpu->core_id <= 255);
+        *eax = EXTENDED_APIC_ID((cs->nr_threads - 1),
+               cpu->socket_id, cpu->core_id, cpu->thread_id);
+        *ebx = (cs->nr_threads - 1) << 8 | cpu->core_id;
+        *ecx = cpu->socket_id;
+        *edx = 0;
+        break;
     case 0xC0000000:
         *eax = env->cpuid_xlevel2;
         *ebx = 0;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v6 7/9] i386: Add support for CPUID_8000_001E for AMD
@ 2018-04-10 23:16   ` Babu Moger
  0 siblings, 0 replies; 24+ messages in thread
From: Babu Moger @ 2018-04-10 23:16 UTC (permalink / raw)
  To: mst, marcel, pbonzini, rth, ehabkost, mtosatti
  Cc: qemu-devel, kvm, kash, geoff, babu.moger

Populate threads/core_id/apic_ids/socket_id when CPUID_EXT3_TOPOEXT
feature is supported. This is required to support hyperthreading feature
on AMD CPUs. This is supported via CPUID_8000_001E extended functions.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
---
 target/i386/cpu.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 63f2d31..24b39c6 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -315,6 +315,12 @@ static uint32_t encode_cache_cpuid80000005(CPUCacheInfo *cache)
                          (((CORES_IN_CMPLX - 1) * 2) + 1)  : \
                          (CORES_IN_CMPLX - 1))
 
+/* Definitions used on CPUID Leaf 0x8000001E */
+#define EXTENDED_APIC_ID(threads, socket_id, core_id, thread_id) \
+                        ((threads) ? \
+                         ((socket_id << 6) | (core_id << 1) | thread_id) : \
+                         ((socket_id << 6) | core_id))
+
 /*
  * Encode cache info for CPUID[0x80000006].ECX and CPUID[0x80000006].EDX
  * @l3 can be NULL.
@@ -4098,6 +4104,14 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             break;
         }
         break;
+    case 0x8000001E:
+        assert(cpu->core_id <= 255);
+        *eax = EXTENDED_APIC_ID((cs->nr_threads - 1),
+               cpu->socket_id, cpu->core_id, cpu->thread_id);
+        *ebx = (cs->nr_threads - 1) << 8 | cpu->core_id;
+        *ecx = cpu->socket_id;
+        *edx = 0;
+        break;
     case 0xC0000000:
         *eax = env->cpuid_xlevel2;
         *ebx = 0;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 8/9] i386: Enable TOPOEXT feature on AMD EPYC CPU
  2018-04-10 23:16 ` [Qemu-devel] " Babu Moger
@ 2018-04-10 23:16   ` Babu Moger
  -1 siblings, 0 replies; 24+ messages in thread
From: Babu Moger @ 2018-04-10 23:16 UTC (permalink / raw)
  To: mst, marcel, pbonzini, rth, ehabkost, mtosatti
  Cc: geoff, babu.moger, kash, qemu-devel, kvm

Enable TOPOEXT feature on EPYC CPU. This is required to support
hyperthreading on VM guests. Also extend xlevel to 0x8000001E.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
---
 target/i386/cpu.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 24b39c6..bfe24b9 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -2327,7 +2327,8 @@ static X86CPUDefinition builtin_x86_defs[] = {
         .features[FEAT_8000_0001_ECX] =
             CPUID_EXT3_OSVW | CPUID_EXT3_3DNOWPREFETCH |
             CPUID_EXT3_MISALIGNSSE | CPUID_EXT3_SSE4A | CPUID_EXT3_ABM |
-            CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM,
+            CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM |
+            CPUID_EXT3_TOPOEXT,
         .features[FEAT_7_0_EBX] =
             CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_AVX2 |
             CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_RDSEED |
@@ -2419,7 +2420,8 @@ static X86CPUDefinition builtin_x86_defs[] = {
         .features[FEAT_8000_0001_ECX] =
             CPUID_EXT3_OSVW | CPUID_EXT3_3DNOWPREFETCH |
             CPUID_EXT3_MISALIGNSSE | CPUID_EXT3_SSE4A | CPUID_EXT3_ABM |
-            CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM,
+            CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM |
+            CPUID_EXT3_TOPOEXT,
         .features[FEAT_8000_0008_EBX] =
             CPUID_8000_0008_EBX_IBPB,
         .features[FEAT_7_0_EBX] =
@@ -4572,6 +4574,11 @@ static void x86_cpu_expand_features(X86CPU *cpu, Error **errp)
             x86_cpu_adjust_level(cpu, &env->cpuid_min_xlevel, 0x8000000A);
         }
 
+        /* TOPOEXT feature requires 0x8000001E */
+        if (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_TOPOEXT) {
+            x86_cpu_adjust_level(cpu, &env->cpuid_min_xlevel, 0x8000001E);
+        }
+
         /* SEV requires CPUID[0x8000001F] */
         if (sev_enabled()) {
             x86_cpu_adjust_level(cpu, &env->cpuid_min_xlevel, 0x8000001F);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v6 8/9] i386: Enable TOPOEXT feature on AMD EPYC CPU
@ 2018-04-10 23:16   ` Babu Moger
  0 siblings, 0 replies; 24+ messages in thread
From: Babu Moger @ 2018-04-10 23:16 UTC (permalink / raw)
  To: mst, marcel, pbonzini, rth, ehabkost, mtosatti
  Cc: qemu-devel, kvm, kash, geoff, babu.moger

Enable TOPOEXT feature on EPYC CPU. This is required to support
hyperthreading on VM guests. Also extend xlevel to 0x8000001E.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
---
 target/i386/cpu.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 24b39c6..bfe24b9 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -2327,7 +2327,8 @@ static X86CPUDefinition builtin_x86_defs[] = {
         .features[FEAT_8000_0001_ECX] =
             CPUID_EXT3_OSVW | CPUID_EXT3_3DNOWPREFETCH |
             CPUID_EXT3_MISALIGNSSE | CPUID_EXT3_SSE4A | CPUID_EXT3_ABM |
-            CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM,
+            CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM |
+            CPUID_EXT3_TOPOEXT,
         .features[FEAT_7_0_EBX] =
             CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_AVX2 |
             CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_RDSEED |
@@ -2419,7 +2420,8 @@ static X86CPUDefinition builtin_x86_defs[] = {
         .features[FEAT_8000_0001_ECX] =
             CPUID_EXT3_OSVW | CPUID_EXT3_3DNOWPREFETCH |
             CPUID_EXT3_MISALIGNSSE | CPUID_EXT3_SSE4A | CPUID_EXT3_ABM |
-            CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM,
+            CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM |
+            CPUID_EXT3_TOPOEXT,
         .features[FEAT_8000_0008_EBX] =
             CPUID_8000_0008_EBX_IBPB,
         .features[FEAT_7_0_EBX] =
@@ -4572,6 +4574,11 @@ static void x86_cpu_expand_features(X86CPU *cpu, Error **errp)
             x86_cpu_adjust_level(cpu, &env->cpuid_min_xlevel, 0x8000000A);
         }
 
+        /* TOPOEXT feature requires 0x8000001E */
+        if (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_TOPOEXT) {
+            x86_cpu_adjust_level(cpu, &env->cpuid_min_xlevel, 0x8000001E);
+        }
+
         /* SEV requires CPUID[0x8000001F] */
         if (sev_enabled()) {
             x86_cpu_adjust_level(cpu, &env->cpuid_min_xlevel, 0x8000001F);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 9/9] i386: Remove generic SMT thread check
  2018-04-10 23:16 ` [Qemu-devel] " Babu Moger
@ 2018-04-10 23:16   ` Babu Moger
  -1 siblings, 0 replies; 24+ messages in thread
From: Babu Moger @ 2018-04-10 23:16 UTC (permalink / raw)
  To: mst, marcel, pbonzini, rth, ehabkost, mtosatti
  Cc: geoff, babu.moger, kash, qemu-devel, kvm

Remove generic non-intel check while validating hyperthreading support.
Certain AMD CPUs can support hyperthreading now.

CPU family with TOPOEXT feature can support hyperthreading now.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
---
 target/i386/cpu.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index bfe24b9..077364a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -4822,17 +4822,20 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 
     qemu_init_vcpu(cs);
 
-    /* Only Intel CPUs support hyperthreading. Even though QEMU fixes this
-     * issue by adjusting CPUID_0000_0001_EBX and CPUID_8000_0008_ECX
-     * based on inputs (sockets,cores,threads), it is still better to gives
+    /* Most Intel and certain AMD CPUs support hyperthreading. Even though QEMU
+     * fixes this issue by adjusting CPUID_0000_0001_EBX and CPUID_8000_0008_ECX
+     * based on inputs (sockets,cores,threads), it is still better to give
      * users a warning.
      *
      * NOTE: the following code has to follow qemu_init_vcpu(). Otherwise
      * cs->nr_threads hasn't be populated yet and the checking is incorrect.
      */
-    if (!IS_INTEL_CPU(env) && cs->nr_threads > 1 && !ht_warned) {
-        error_report("AMD CPU doesn't support hyperthreading. Please configure"
-                     " -smp options properly.");
+     if (IS_AMD_CPU(env) &&
+         !(env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_TOPOEXT) &&
+         cs->nr_threads > 1 && !ht_warned) {
+            error_report("This family of AMD CPU doesn't support "
+                         "hyperthreading. Please configure -smp "
+                         "options properly.");
         ht_warned = true;
     }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v6 9/9] i386: Remove generic SMT thread check
@ 2018-04-10 23:16   ` Babu Moger
  0 siblings, 0 replies; 24+ messages in thread
From: Babu Moger @ 2018-04-10 23:16 UTC (permalink / raw)
  To: mst, marcel, pbonzini, rth, ehabkost, mtosatti
  Cc: qemu-devel, kvm, kash, geoff, babu.moger

Remove generic non-intel check while validating hyperthreading support.
Certain AMD CPUs can support hyperthreading now.

CPU family with TOPOEXT feature can support hyperthreading now.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
---
 target/i386/cpu.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index bfe24b9..077364a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -4822,17 +4822,20 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 
     qemu_init_vcpu(cs);
 
-    /* Only Intel CPUs support hyperthreading. Even though QEMU fixes this
-     * issue by adjusting CPUID_0000_0001_EBX and CPUID_8000_0008_ECX
-     * based on inputs (sockets,cores,threads), it is still better to gives
+    /* Most Intel and certain AMD CPUs support hyperthreading. Even though QEMU
+     * fixes this issue by adjusting CPUID_0000_0001_EBX and CPUID_8000_0008_ECX
+     * based on inputs (sockets,cores,threads), it is still better to give
      * users a warning.
      *
      * NOTE: the following code has to follow qemu_init_vcpu(). Otherwise
      * cs->nr_threads hasn't be populated yet and the checking is incorrect.
      */
-    if (!IS_INTEL_CPU(env) && cs->nr_threads > 1 && !ht_warned) {
-        error_report("AMD CPU doesn't support hyperthreading. Please configure"
-                     " -smp options properly.");
+     if (IS_AMD_CPU(env) &&
+         !(env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_TOPOEXT) &&
+         cs->nr_threads > 1 && !ht_warned) {
+            error_report("This family of AMD CPU doesn't support "
+                         "hyperthreading. Please configure -smp "
+                         "options properly.");
         ht_warned = true;
     }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 7/9] i386: Add support for CPUID_8000_001E for AMD
  2018-04-10 23:16   ` [Qemu-devel] " Babu Moger
@ 2018-05-14 21:12     ` Eduardo Habkost
  -1 siblings, 0 replies; 24+ messages in thread
From: Eduardo Habkost @ 2018-05-14 21:12 UTC (permalink / raw)
  To: Babu Moger
  Cc: geoff, kvm, mst, kash, mtosatti, qemu-devel, marcel, pbonzini, rth

On Tue, Apr 10, 2018 at 07:16:07PM -0400, Babu Moger wrote:
> Populate threads/core_id/apic_ids/socket_id when CPUID_EXT3_TOPOEXT
> feature is supported. This is required to support hyperthreading feature
> on AMD CPUs. This is supported via CPUID_8000_001E extended functions.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> Tested-by: Geoffrey McRae <geoff@hostfission.com>
> ---
>  target/i386/cpu.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 63f2d31..24b39c6 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -315,6 +315,12 @@ static uint32_t encode_cache_cpuid80000005(CPUCacheInfo *cache)
>                           (((CORES_IN_CMPLX - 1) * 2) + 1)  : \
>                           (CORES_IN_CMPLX - 1))
>  
> +/* Definitions used on CPUID Leaf 0x8000001E */
> +#define EXTENDED_APIC_ID(threads, socket_id, core_id, thread_id) \
> +                        ((threads) ? \
> +                         ((socket_id << 6) | (core_id << 1) | thread_id) : \
> +                         ((socket_id << 6) | core_id))

Using the AMD64 Architecture Programmer's Manual Volume 3 as
reference below.

The formula above assumes MNC = (2 ^ 6).

The current code in QEMU sets ApicIdCoreIdSize
(CPUID[0x80000008].ECX[bits 15:12]) to 0, which means MNC is
calculated using the legacy method:
  MNC = CPUID Fn8000_0008_ECX[NC] + 1.

The current code sets CPUID Fn8000_0008_ECX[NC] to:
  (cs->nr_cores * cs->nr_threads) - 1.

This means we cannot hardcode MNC, and must calculate it based on
nr_cores and nr_threads.

Probably it's a good idea to fill ApicIdCoreIdSize if TOPOEXT is
set, so we will know MNC will always be a power of 2 and the
formula here will be compatible with the existing APIC ID
calculation logic.

Note that we already have a comment at the top of topology.h:

 * This code should be compatible with AMD's "Extended Method" described at:
 *   AMD CPUID Specification (Publication #25481)
 *   Section 3: Multiple Core Calcuation
 * as long as:
 *  nr_threads is set to 1;
 *  OFFSET_IDX is assumed to be 0;
 *  CPUID Fn8000_0008_ECX[ApicIdCoreIdSize[3:0]] is set to apicid_core_width().

So you can already use cpu->apic_id here as long as
ApicIdCoreSize is set to apicid_core_width().  Probably
compatibility with AMD methods will be kept even if
nr_threads > 1, but I didn't confirm that.

Actually, I strongly advise you use cpu->apic_id here.  Otherwise
the Extended APIC ID seen by the guest here won't match the APIC
ID used everywhere else inside QEMU and KVM code.


> +
>  /*
>   * Encode cache info for CPUID[0x80000006].ECX and CPUID[0x80000006].EDX
>   * @l3 can be NULL.
> @@ -4098,6 +4104,14 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>              break;
>          }
>          break;
> +    case 0x8000001E:
> +        assert(cpu->core_id <= 255);
> +        *eax = EXTENDED_APIC_ID((cs->nr_threads - 1),
> +               cpu->socket_id, cpu->core_id, cpu->thread_id);
> +        *ebx = (cs->nr_threads - 1) << 8 | cpu->core_id;
> +        *ecx = cpu->socket_id;

Terminology is confusing here, so let's confirm this is really
what we want to do:
* ThreadsPerComputeUnit is set to nr_threads-1.  Looks correct.
* ComputeUnitId is being set to core_id.  Is this really what we
  want?
* NodesPerProcessor is being set to 0 (meaning 1 node per
  processor)
* NodeId is being set to socket_id.  Is this really right,
  considering that NodesPerProcessor is being set to 0?

If this is really what you intend to do, I'd like to see it
better documented, to avoid confusion in the future.

We could either add wrapper functions that make the logic more
explicit (e.g. x86_cpu_node_id(), x86_cpu_nodes_per_processor(),
etc.) or add comments explaining how the QEMU socket/core/thread
abstractions are being mapped to the AMD
socket/node/compute-unit/thread abstractions (also, how a
"Logical CCX L3 complex" is mapped into the QEMU abstractions,
here?)


> +        *edx = 0;
> +        break;
>      case 0xC0000000:
>          *eax = env->cpuid_xlevel2;
>          *ebx = 0;
> -- 
> 1.8.3.1
> 
> 

-- 
Eduardo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v6 7/9] i386: Add support for CPUID_8000_001E for AMD
@ 2018-05-14 21:12     ` Eduardo Habkost
  0 siblings, 0 replies; 24+ messages in thread
From: Eduardo Habkost @ 2018-05-14 21:12 UTC (permalink / raw)
  To: Babu Moger
  Cc: mst, marcel, pbonzini, rth, mtosatti, geoff, kash, qemu-devel, kvm

On Tue, Apr 10, 2018 at 07:16:07PM -0400, Babu Moger wrote:
> Populate threads/core_id/apic_ids/socket_id when CPUID_EXT3_TOPOEXT
> feature is supported. This is required to support hyperthreading feature
> on AMD CPUs. This is supported via CPUID_8000_001E extended functions.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> Tested-by: Geoffrey McRae <geoff@hostfission.com>
> ---
>  target/i386/cpu.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 63f2d31..24b39c6 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -315,6 +315,12 @@ static uint32_t encode_cache_cpuid80000005(CPUCacheInfo *cache)
>                           (((CORES_IN_CMPLX - 1) * 2) + 1)  : \
>                           (CORES_IN_CMPLX - 1))
>  
> +/* Definitions used on CPUID Leaf 0x8000001E */
> +#define EXTENDED_APIC_ID(threads, socket_id, core_id, thread_id) \
> +                        ((threads) ? \
> +                         ((socket_id << 6) | (core_id << 1) | thread_id) : \
> +                         ((socket_id << 6) | core_id))

Using the AMD64 Architecture Programmer's Manual Volume 3 as
reference below.

The formula above assumes MNC = (2 ^ 6).

The current code in QEMU sets ApicIdCoreIdSize
(CPUID[0x80000008].ECX[bits 15:12]) to 0, which means MNC is
calculated using the legacy method:
  MNC = CPUID Fn8000_0008_ECX[NC] + 1.

The current code sets CPUID Fn8000_0008_ECX[NC] to:
  (cs->nr_cores * cs->nr_threads) - 1.

This means we cannot hardcode MNC, and must calculate it based on
nr_cores and nr_threads.

Probably it's a good idea to fill ApicIdCoreIdSize if TOPOEXT is
set, so we will know MNC will always be a power of 2 and the
formula here will be compatible with the existing APIC ID
calculation logic.

Note that we already have a comment at the top of topology.h:

 * This code should be compatible with AMD's "Extended Method" described at:
 *   AMD CPUID Specification (Publication #25481)
 *   Section 3: Multiple Core Calcuation
 * as long as:
 *  nr_threads is set to 1;
 *  OFFSET_IDX is assumed to be 0;
 *  CPUID Fn8000_0008_ECX[ApicIdCoreIdSize[3:0]] is set to apicid_core_width().

So you can already use cpu->apic_id here as long as
ApicIdCoreSize is set to apicid_core_width().  Probably
compatibility with AMD methods will be kept even if
nr_threads > 1, but I didn't confirm that.

Actually, I strongly advise you use cpu->apic_id here.  Otherwise
the Extended APIC ID seen by the guest here won't match the APIC
ID used everywhere else inside QEMU and KVM code.


> +
>  /*
>   * Encode cache info for CPUID[0x80000006].ECX and CPUID[0x80000006].EDX
>   * @l3 can be NULL.
> @@ -4098,6 +4104,14 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>              break;
>          }
>          break;
> +    case 0x8000001E:
> +        assert(cpu->core_id <= 255);
> +        *eax = EXTENDED_APIC_ID((cs->nr_threads - 1),
> +               cpu->socket_id, cpu->core_id, cpu->thread_id);
> +        *ebx = (cs->nr_threads - 1) << 8 | cpu->core_id;
> +        *ecx = cpu->socket_id;

Terminology is confusing here, so let's confirm this is really
what we want to do:
* ThreadsPerComputeUnit is set to nr_threads-1.  Looks correct.
* ComputeUnitId is being set to core_id.  Is this really what we
  want?
* NodesPerProcessor is being set to 0 (meaning 1 node per
  processor)
* NodeId is being set to socket_id.  Is this really right,
  considering that NodesPerProcessor is being set to 0?

If this is really what you intend to do, I'd like to see it
better documented, to avoid confusion in the future.

We could either add wrapper functions that make the logic more
explicit (e.g. x86_cpu_node_id(), x86_cpu_nodes_per_processor(),
etc.) or add comments explaining how the QEMU socket/core/thread
abstractions are being mapped to the AMD
socket/node/compute-unit/thread abstractions (also, how a
"Logical CCX L3 complex" is mapped into the QEMU abstractions,
here?)


> +        *edx = 0;
> +        break;
>      case 0xC0000000:
>          *eax = env->cpuid_xlevel2;
>          *ebx = 0;
> -- 
> 1.8.3.1
> 
> 

-- 
Eduardo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 7/9] i386: Add support for CPUID_8000_001E for AMD
  2018-05-14 21:12     ` [Qemu-devel] " Eduardo Habkost
@ 2018-05-14 23:49       ` Moger, Babu
  -1 siblings, 0 replies; 24+ messages in thread
From: Moger, Babu @ 2018-05-14 23:49 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: geoff, kvm, mst, kash, mtosatti, qemu-devel, marcel, pbonzini, rth



> -----Original Message-----
> From: Eduardo Habkost [mailto:ehabkost@redhat.com]
> Sent: Monday, May 14, 2018 4:12 PM
> To: Moger, Babu <Babu.Moger@amd.com>
> Cc: mst@redhat.com; marcel@redhat.com; pbonzini@redhat.com;
> rth@twiddle.net; mtosatti@redhat.com; geoff@hostfission.com;
> kash@tripleback.net; qemu-devel@nongnu.org; kvm@vger.kernel.org
> Subject: Re: [Qemu-devel] [PATCH v6 7/9] i386: Add support for
> CPUID_8000_001E for AMD
> 
> On Tue, Apr 10, 2018 at 07:16:07PM -0400, Babu Moger wrote:
> > Populate threads/core_id/apic_ids/socket_id when CPUID_EXT3_TOPOEXT
> > feature is supported. This is required to support hyperthreading feature
> > on AMD CPUs. This is supported via CPUID_8000_001E extended functions.
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > Tested-by: Geoffrey McRae <geoff@hostfission.com>
> > ---
> >  target/i386/cpu.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index 63f2d31..24b39c6 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -315,6 +315,12 @@ static uint32_t
> encode_cache_cpuid80000005(CPUCacheInfo *cache)
> >                           (((CORES_IN_CMPLX - 1) * 2) + 1)  : \
> >                           (CORES_IN_CMPLX - 1))
> >
> > +/* Definitions used on CPUID Leaf 0x8000001E */
> > +#define EXTENDED_APIC_ID(threads, socket_id, core_id, thread_id) \
> > +                        ((threads) ? \
> > +                         ((socket_id << 6) | (core_id << 1) | thread_id) : \
> > +                         ((socket_id << 6) | core_id))
> 
> Using the AMD64 Architecture Programmer's Manual Volume 3 as
> reference below.
> 
> The formula above assumes MNC = (2 ^ 6).
> 
> The current code in QEMU sets ApicIdCoreIdSize
> (CPUID[0x80000008].ECX[bits 15:12]) to 0, which means MNC is
> calculated using the legacy method:
>   MNC = CPUID Fn8000_0008_ECX[NC] + 1.
> 
> The current code sets CPUID Fn8000_0008_ECX[NC] to:
>   (cs->nr_cores * cs->nr_threads) - 1.
> 
> This means we cannot hardcode MNC, and must calculate it based on
> nr_cores and nr_threads.
> 
> Probably it's a good idea to fill ApicIdCoreIdSize if TOPOEXT is
> set, so we will know MNC will always be a power of 2 and the
> formula here will be compatible with the existing APIC ID
> calculation logic.
> 
> Note that we already have a comment at the top of topology.h:
> 
>  * This code should be compatible with AMD's "Extended Method" described
> at:
>  *   AMD CPUID Specification (Publication #25481)
>  *   Section 3: Multiple Core Calcuation
>  * as long as:
>  *  nr_threads is set to 1;
>  *  OFFSET_IDX is assumed to be 0;
>  *  CPUID Fn8000_0008_ECX[ApicIdCoreIdSize[3:0]] is set to
> apicid_core_width().
> 
> So you can already use cpu->apic_id here as long as
> ApicIdCoreSize is set to apicid_core_width().  Probably
> compatibility with AMD methods will be kept even if
> nr_threads > 1, but I didn't confirm that.
> 
> Actually, I strongly advise you use cpu->apic_id here.  Otherwise

Yes. We should be able to use cpu->apic_id here. Thanks for pointing this out.
Will run more tests to confirm. Thanks 

> the Extended APIC ID seen by the guest here won't match the APIC
> ID used everywhere else inside QEMU and KVM code.
> 
> 
> > +
> >  /*
> >   * Encode cache info for CPUID[0x80000006].ECX and
> CPUID[0x80000006].EDX
> >   * @l3 can be NULL.
> > @@ -4098,6 +4104,14 @@ void cpu_x86_cpuid(CPUX86State *env,
> uint32_t index, uint32_t count,
> >              break;
> >          }
> >          break;
> > +    case 0x8000001E:
> > +        assert(cpu->core_id <= 255);
> > +        *eax = EXTENDED_APIC_ID((cs->nr_threads - 1),
> > +               cpu->socket_id, cpu->core_id, cpu->thread_id);
> > +        *ebx = (cs->nr_threads - 1) << 8 | cpu->core_id;
> > +        *ecx = cpu->socket_id;
> 
> Terminology is confusing here, so let's confirm this is really
> what we want to do:
> * ThreadsPerComputeUnit is set to nr_threads-1.  Looks correct.
> * ComputeUnitId is being set to core_id.  Is this really what we
>   want?
> * NodesPerProcessor is being set to 0 (meaning 1 node per
>   processor)
> * NodeId is being set to socket_id.  Is this really right,
>   considering that NodesPerProcessor is being set to 0?

Yes. You are right. 
ComputeUnitId, NodesPerProcessor and core_id is not set correctly. 
Let me study little bit and ask around here.  Will respond again.

> 
> If this is really what you intend to do, I'd like to see it
> better documented, to avoid confusion in the future.
> 
> We could either add wrapper functions that make the logic more
> explicit (e.g. x86_cpu_node_id(), x86_cpu_nodes_per_processor(),
> etc.) or add comments explaining how the QEMU socket/core/thread
> abstractions are being mapped to the AMD
> socket/node/compute-unit/thread abstractions (also, how a
> "Logical CCX L3 complex" is mapped into the QEMU abstractions,
> here?)
> 
> 
> > +        *edx = 0;
> > +        break;
> >      case 0xC0000000:
> >          *eax = env->cpuid_xlevel2;
> >          *ebx = 0;
> > --
> > 1.8.3.1
> >
> >
> 
> --
> Eduardo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v6 7/9] i386: Add support for CPUID_8000_001E for AMD
@ 2018-05-14 23:49       ` Moger, Babu
  0 siblings, 0 replies; 24+ messages in thread
From: Moger, Babu @ 2018-05-14 23:49 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: mst, marcel, pbonzini, rth, mtosatti, geoff, kash, qemu-devel, kvm



> -----Original Message-----
> From: Eduardo Habkost [mailto:ehabkost@redhat.com]
> Sent: Monday, May 14, 2018 4:12 PM
> To: Moger, Babu <Babu.Moger@amd.com>
> Cc: mst@redhat.com; marcel@redhat.com; pbonzini@redhat.com;
> rth@twiddle.net; mtosatti@redhat.com; geoff@hostfission.com;
> kash@tripleback.net; qemu-devel@nongnu.org; kvm@vger.kernel.org
> Subject: Re: [Qemu-devel] [PATCH v6 7/9] i386: Add support for
> CPUID_8000_001E for AMD
> 
> On Tue, Apr 10, 2018 at 07:16:07PM -0400, Babu Moger wrote:
> > Populate threads/core_id/apic_ids/socket_id when CPUID_EXT3_TOPOEXT
> > feature is supported. This is required to support hyperthreading feature
> > on AMD CPUs. This is supported via CPUID_8000_001E extended functions.
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > Tested-by: Geoffrey McRae <geoff@hostfission.com>
> > ---
> >  target/i386/cpu.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index 63f2d31..24b39c6 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -315,6 +315,12 @@ static uint32_t
> encode_cache_cpuid80000005(CPUCacheInfo *cache)
> >                           (((CORES_IN_CMPLX - 1) * 2) + 1)  : \
> >                           (CORES_IN_CMPLX - 1))
> >
> > +/* Definitions used on CPUID Leaf 0x8000001E */
> > +#define EXTENDED_APIC_ID(threads, socket_id, core_id, thread_id) \
> > +                        ((threads) ? \
> > +                         ((socket_id << 6) | (core_id << 1) | thread_id) : \
> > +                         ((socket_id << 6) | core_id))
> 
> Using the AMD64 Architecture Programmer's Manual Volume 3 as
> reference below.
> 
> The formula above assumes MNC = (2 ^ 6).
> 
> The current code in QEMU sets ApicIdCoreIdSize
> (CPUID[0x80000008].ECX[bits 15:12]) to 0, which means MNC is
> calculated using the legacy method:
>   MNC = CPUID Fn8000_0008_ECX[NC] + 1.
> 
> The current code sets CPUID Fn8000_0008_ECX[NC] to:
>   (cs->nr_cores * cs->nr_threads) - 1.
> 
> This means we cannot hardcode MNC, and must calculate it based on
> nr_cores and nr_threads.
> 
> Probably it's a good idea to fill ApicIdCoreIdSize if TOPOEXT is
> set, so we will know MNC will always be a power of 2 and the
> formula here will be compatible with the existing APIC ID
> calculation logic.
> 
> Note that we already have a comment at the top of topology.h:
> 
>  * This code should be compatible with AMD's "Extended Method" described
> at:
>  *   AMD CPUID Specification (Publication #25481)
>  *   Section 3: Multiple Core Calcuation
>  * as long as:
>  *  nr_threads is set to 1;
>  *  OFFSET_IDX is assumed to be 0;
>  *  CPUID Fn8000_0008_ECX[ApicIdCoreIdSize[3:0]] is set to
> apicid_core_width().
> 
> So you can already use cpu->apic_id here as long as
> ApicIdCoreSize is set to apicid_core_width().  Probably
> compatibility with AMD methods will be kept even if
> nr_threads > 1, but I didn't confirm that.
> 
> Actually, I strongly advise you use cpu->apic_id here.  Otherwise

Yes. We should be able to use cpu->apic_id here. Thanks for pointing this out.
Will run more tests to confirm. Thanks 

> the Extended APIC ID seen by the guest here won't match the APIC
> ID used everywhere else inside QEMU and KVM code.
> 
> 
> > +
> >  /*
> >   * Encode cache info for CPUID[0x80000006].ECX and
> CPUID[0x80000006].EDX
> >   * @l3 can be NULL.
> > @@ -4098,6 +4104,14 @@ void cpu_x86_cpuid(CPUX86State *env,
> uint32_t index, uint32_t count,
> >              break;
> >          }
> >          break;
> > +    case 0x8000001E:
> > +        assert(cpu->core_id <= 255);
> > +        *eax = EXTENDED_APIC_ID((cs->nr_threads - 1),
> > +               cpu->socket_id, cpu->core_id, cpu->thread_id);
> > +        *ebx = (cs->nr_threads - 1) << 8 | cpu->core_id;
> > +        *ecx = cpu->socket_id;
> 
> Terminology is confusing here, so let's confirm this is really
> what we want to do:
> * ThreadsPerComputeUnit is set to nr_threads-1.  Looks correct.
> * ComputeUnitId is being set to core_id.  Is this really what we
>   want?
> * NodesPerProcessor is being set to 0 (meaning 1 node per
>   processor)
> * NodeId is being set to socket_id.  Is this really right,
>   considering that NodesPerProcessor is being set to 0?

Yes. You are right. 
ComputeUnitId, NodesPerProcessor and core_id is not set correctly. 
Let me study little bit and ask around here.  Will respond again.

> 
> If this is really what you intend to do, I'd like to see it
> better documented, to avoid confusion in the future.
> 
> We could either add wrapper functions that make the logic more
> explicit (e.g. x86_cpu_node_id(), x86_cpu_nodes_per_processor(),
> etc.) or add comments explaining how the QEMU socket/core/thread
> abstractions are being mapped to the AMD
> socket/node/compute-unit/thread abstractions (also, how a
> "Logical CCX L3 complex" is mapped into the QEMU abstractions,
> here?)
> 
> 
> > +        *edx = 0;
> > +        break;
> >      case 0xC0000000:
> >          *eax = env->cpuid_xlevel2;
> >          *ebx = 0;
> > --
> > 1.8.3.1
> >
> >
> 
> --
> Eduardo

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-05-14 23:49 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-10 23:16 [PATCH v6 0/9] i386: Enable TOPOEXT to support hyperthreading on AMD CPU Babu Moger
2018-04-10 23:16 ` [Qemu-devel] " Babu Moger
2018-04-10 23:16 ` [PATCH v6 1/9] i386: Helpers to encode cache information consistently Babu Moger
2018-04-10 23:16   ` [Qemu-devel] " Babu Moger
2018-04-10 23:16 ` [PATCH v6 2/9] i386: Add cache information in X86CPUDefinition Babu Moger
2018-04-10 23:16   ` [Qemu-devel] " Babu Moger
2018-04-10 23:16 ` [PATCH v6 3/9] i386: Initialize cache information for EPYC family processors Babu Moger
2018-04-10 23:16   ` [Qemu-devel] " Babu Moger
2018-04-10 23:16 ` [PATCH v6 4/9] i386: Add new property to control cache info Babu Moger
2018-04-10 23:16   ` [Qemu-devel] " Babu Moger
2018-04-10 23:16 ` [PATCH v6 5/9] i386: Use the statically loaded cache definitions Babu Moger
2018-04-10 23:16   ` [Qemu-devel] " Babu Moger
2018-04-10 23:16 ` [PATCH v6 6/9] i386: Populate AMD Processor Cache Information for cpuid 0x8000001D Babu Moger
2018-04-10 23:16   ` [Qemu-devel] " Babu Moger
2018-04-10 23:16 ` [PATCH v6 7/9] i386: Add support for CPUID_8000_001E for AMD Babu Moger
2018-04-10 23:16   ` [Qemu-devel] " Babu Moger
2018-05-14 21:12   ` Eduardo Habkost
2018-05-14 21:12     ` [Qemu-devel] " Eduardo Habkost
2018-05-14 23:49     ` Moger, Babu
2018-05-14 23:49       ` [Qemu-devel] " Moger, Babu
2018-04-10 23:16 ` [PATCH v6 8/9] i386: Enable TOPOEXT feature on AMD EPYC CPU Babu Moger
2018-04-10 23:16   ` [Qemu-devel] " Babu Moger
2018-04-10 23:16 ` [PATCH v6 9/9] i386: Remove generic SMT thread check Babu Moger
2018-04-10 23:16   ` [Qemu-devel] " Babu Moger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.