* [PATCH] x86/AMD: also determine L3 cache size
@ 2021-04-16 13:20 Jan Beulich
  2021-04-16 14:21 ` Andrew Cooper
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2021-04-16 13:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

For Intel CPUs we record the L3 cache size, hence we should also do so
for AMD and the like.

While making these additions, also make sure (throughout the function)
that we don't needlessly overwrite prior values when the new value to be
stored is zero.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
I have to admit though that I'm not convinced the sole real use of the
field (in flush_area_local()) is a good one - flushing an entire L3's
worth of lines via CLFLUSH may not be more efficient than using WBINVD.
But I didn't measure it (yet).

--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -240,28 +240,41 @@ int get_model_name(struct cpuinfo_x86 *c
 
 void display_cacheinfo(struct cpuinfo_x86 *c)
 {
-	unsigned int dummy, ecx, edx, l2size;
+	unsigned int dummy, ecx, edx, size;
 
 	if (c->extended_cpuid_level >= 0x80000005) {
 		cpuid(0x80000005, &dummy, &dummy, &ecx, &edx);
-		if (opt_cpu_info)
-			printk("CPU: L1 I cache %dK (%d bytes/line),"
-			              " D cache %dK (%d bytes/line)\n",
-			       edx>>24, edx&0xFF, ecx>>24, ecx&0xFF);
-		c->x86_cache_size=(ecx>>24)+(edx>>24);	
+		if ((edx | ecx) >> 24) {
+			if (opt_cpu_info)
+				printk("CPU: L1 I cache %uK (%u bytes/line),"
+				              " D cache %uK (%u bytes/line)\n",
+				       edx >> 24, edx & 0xFF, ecx >> 24, ecx & 0xFF);
+			c->x86_cache_size = (ecx >> 24) + (edx >> 24);
+		}
 	}
 
 	if (c->extended_cpuid_level < 0x80000006)	/* Some chips just has a large L1. */
 		return;
 
-	ecx = cpuid_ecx(0x80000006);
-	l2size = ecx >> 16;
-	
-	c->x86_cache_size = l2size;
-
-	if (opt_cpu_info)
-		printk("CPU: L2 Cache: %dK (%d bytes/line)\n",
-		       l2size, ecx & 0xFF);
+	cpuid(0x80000006, &dummy, &dummy, &ecx, &edx);
+
+	size = ecx >> 16;
+	if (size) {
+		c->x86_cache_size = size;
+
+		if (opt_cpu_info)
+			printk("CPU: L2 Cache: %uK (%u bytes/line)\n",
+			       size, ecx & 0xFF);
+	}
+
+	size = edx >> 18;
+	if (size) {
+		c->x86_cache_size = size * 512;
+
+		if (opt_cpu_info)
+			printk("CPU: L3 Cache: %uM (%u bytes/line)\n",
+			       (size + (size & 1)) >> 1, edx & 0xFF);
+	}
 }
 
 static inline u32 _phys_pkg_id(u32 cpuid_apic, int index_msb)


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] x86/AMD: also determine L3 cache size
  2021-04-16 13:20 [PATCH] x86/AMD: also determine L3 cache size Jan Beulich
@ 2021-04-16 14:21 ` Andrew Cooper
  2021-04-16 14:27   ` Jan Beulich
  2021-04-29  9:21   ` Jan Beulich
  0 siblings, 2 replies; 5+ messages in thread
From: Andrew Cooper @ 2021-04-16 14:21 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 16/04/2021 14:20, Jan Beulich wrote:
> For Intel CPUs we record the L3 cache size, hence we should also do so
> for AMD and the like.
>
> While making these additions, also make sure (throughout the function)
> that we don't needlessly overwrite prior values when the new value to be
> stored is zero.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> I have to admit though that I'm not convinced the sole real use of the
> field (in flush_area_local()) is a good one - flushing an entire L3's
> worth of lines via CLFLUSH may not be more efficient than using WBINVD.
> But I didn't measure it (yet).

WBINVD always needs a broadcast IPI to work correctly.

CLFLUSH and friends let you do this from a single CPU, using cache
coherency to DTRT with the line, wherever it is.


Looking at that logic in flush_area_local(), I don't see how it can be
correct.  The WBINVD path is a decomposition inside the IPI, but in the
higher level helpers, I don't see how the "area too big, convert to
WBINVD" can be safe.

All users of FLUSH_CACHE are flush_all(), except two PCI
Passthrough-restricted cases. MMUEXT_FLUSH_CACHE_GLOBAL looks to be
safe, while vmx_do_resume() has very dubious reasoning, and is dead code
I think, because I'm not aware of a VT-x capable CPU without WBINVD-exiting.

~Andrew




* Re: [PATCH] x86/AMD: also determine L3 cache size
  2021-04-16 14:21 ` Andrew Cooper
@ 2021-04-16 14:27   ` Jan Beulich
  2021-04-29  9:21   ` Jan Beulich
  1 sibling, 0 replies; 5+ messages in thread
From: Jan Beulich @ 2021-04-16 14:27 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Roger Pau Monné, xen-devel

On 16.04.2021 16:21, Andrew Cooper wrote:
> On 16/04/2021 14:20, Jan Beulich wrote:
>> For Intel CPUs we record the L3 cache size, hence we should also do so
>> for AMD and the like.
>>
>> While making these additions, also make sure (throughout the function)
>> that we don't needlessly overwrite prior values when the new value to be
>> stored is zero.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> I have to admit though that I'm not convinced the sole real use of the
>> field (in flush_area_local()) is a good one - flushing an entire L3's
>> worth of lines via CLFLUSH may not be more efficient than using WBINVD.
>> But I didn't measure it (yet).
> 
> WBINVD always needs a broadcast IPI to work correctly.
> 
> CLFLUSH and friends let you do this from a single CPU, using cache
> coherency to DTRT with the line, wherever it is.
> 
> 
> Looking at that logic in flush_area_local(), I don't see how it can be
> correct.  The WBINVD path is a decomposition inside the IPI, but in the
> higher level helpers, I don't see how the "area too big, convert to
> WBINVD" can be safe.

Would you mind giving an example? I'm struggling to understand what
exactly you mean to point out.

Jan

> All users of FLUSH_CACHE are flush_all(), except two PCI
> Passthrough-restricted cases. MMUEXT_FLUSH_CACHE_GLOBAL looks to be
> safe, while vmx_do_resume() has very dubious reasoning, and is dead code
> I think, because I'm not aware of a VT-x capable CPU without WBINVD-exiting.
> 
> ~Andrew
> 




* Re: [PATCH] x86/AMD: also determine L3 cache size
  2021-04-16 14:21 ` Andrew Cooper
  2021-04-16 14:27   ` Jan Beulich
@ 2021-04-29  9:21   ` Jan Beulich
  2021-05-07  8:25     ` Ping: " Jan Beulich
  1 sibling, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2021-04-29  9:21 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Roger Pau Monné, xen-devel

On 16.04.2021 16:21, Andrew Cooper wrote:
> On 16/04/2021 14:20, Jan Beulich wrote:
>> For Intel CPUs we record the L3 cache size, hence we should also do so
>> for AMD and the like.
>>
>> While making these additions, also make sure (throughout the function)
>> that we don't needlessly overwrite prior values when the new value to be
>> stored is zero.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> I have to admit though that I'm not convinced the sole real use of the
>> field (in flush_area_local()) is a good one - flushing an entire L3's
>> worth of lines via CLFLUSH may not be more efficient than using WBINVD.
>> But I didn't measure it (yet).
> 
> WBINVD always needs a broadcast IPI to work correctly.
> 
> CLFLUSH and friends let you do this from a single CPU, using cache
> coherency to DTRT with the line, wherever it is.
> 
> 
> Looking at that logic in flush_area_local(), I don't see how it can be
> correct.  The WBINVD path is a decomposition inside the IPI, but in the
> higher level helpers, I don't see how the "area too big, convert to
> WBINVD" can be safe.
> 
> All users of FLUSH_CACHE are flush_all(), except two PCI
> Passthrough-restricted cases. MMUEXT_FLUSH_CACHE_GLOBAL looks to be
> safe, while vmx_do_resume() has very dubious reasoning, and is dead code
> I think, because I'm not aware of a VT-x capable CPU without WBINVD-exiting.

Besides my prior question on your reply, may I also ask what all of
this means for the patch itself? After all, so far you've been replying
only to the post-commit-message remark.

Jan



* Ping: [PATCH] x86/AMD: also determine L3 cache size
  2021-04-29  9:21   ` Jan Beulich
@ 2021-05-07  8:25     ` Jan Beulich
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Beulich @ 2021-05-07  8:25 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Roger Pau Monné, xen-devel

On 29.04.2021 11:21, Jan Beulich wrote:
> On 16.04.2021 16:21, Andrew Cooper wrote:
>> On 16/04/2021 14:20, Jan Beulich wrote:
>>> For Intel CPUs we record the L3 cache size, hence we should also do so
>>> for AMD and the like.
>>>
>>> While making these additions, also make sure (throughout the function)
>>> that we don't needlessly overwrite prior values when the new value to be
>>> stored is zero.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> I have to admit though that I'm not convinced the sole real use of the
>>> field (in flush_area_local()) is a good one - flushing an entire L3's
>>> worth of lines via CLFLUSH may not be more efficient than using WBINVD.
>>> But I didn't measure it (yet).
>>
>> WBINVD always needs a broadcast IPI to work correctly.
>>
>> CLFLUSH and friends let you do this from a single CPU, using cache
>> coherency to DTRT with the line, wherever it is.
>>
>>
>> Looking at that logic in flush_area_local(), I don't see how it can be
>> correct.  The WBINVD path is a decomposition inside the IPI, but in the
>> higher level helpers, I don't see how the "area too big, convert to
>> WBINVD" can be safe.
>>
>> All users of FLUSH_CACHE are flush_all(), except two PCI
>> Passthrough-restricted cases. MMUEXT_FLUSH_CACHE_GLOBAL looks to be
>> safe, while vmx_do_resume() has very dubious reasoning, and is dead code
>> I think, because I'm not aware of a VT-x capable CPU without WBINVD-exiting.
> 
> Besides my prior question on your reply, may I also ask what all of
> this means for the patch itself? After all, so far you've been replying
> only to the post-commit-message remark.

As with the other patch I just pinged again: unless I hear back on the
patch itself by then, I intend to commit this the week after next, if
need be without any acks.

Jan


