linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86, fpu: correct XSAVE xstate size calculation
@ 2015-07-28 17:21 Dave Hansen
  2015-08-05 10:32 ` Ingo Molnar
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2015-07-28 17:21 UTC (permalink / raw)
  To: dave; +Cc: dave.hansen, mingo, linux-kernel, bp, fenghua.yu, hpa, x86


From: Dave Hansen <dave.hansen@linux.intel.com>

Note: our xsaves support is currently broken and disabled.  This
patch does not fix it, but it is an incremental improvement.  It
might be useful to someone backporting the entire set of XSAVES
patches at some point, but it should not be backported alone.

There are currently two xsave buffer formats: standard and
compacted.  The standard format is waht 'XSAVE' and 'XSAVEOPT'
produce while 'XSAVES' and 'XSAVEC' produce a compacted-formet
buffer.  (The kernel never uses XSAVEC)

But, the XSAVES buffer *ALSO* contains "system state components"
which are never saved by a plain XSAVE.  So, XSAVES has two
things that might make its buffer differently-sized from an
XSAVE-produced one.

The current code assumes that an XSAVES buffer's size is simply
the sum of the sizes of the (user) states which are supported.
This seems to work in most cases, but it is not consistent with
what the SDM says, and it breaks if we 'align' a component in the
buffer.  The calculation is also unnecessary work since the CPU
*tells* us the size of the buffer directly.

This patch just reads the size of the buffer right out of the
CPUID leaf instead of trying to derive it.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
---

 b/arch/x86/kernel/fpu/xstate.c |   36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff -puN arch/x86/kernel/fpu/xstate.c~fix-xstate_size-calculation arch/x86/kernel/fpu/xstate.c
--- a/arch/x86/kernel/fpu/xstate.c~fix-xstate_size-calculation	2015-07-24 09:50:36.418385438 -0700
+++ b/arch/x86/kernel/fpu/xstate.c	2015-07-27 11:02:13.305376883 -0700
@@ -292,24 +292,40 @@ static void __init setup_init_fpu_buf(vo
 
 /*
  * Calculate total size of enabled xstates in XCR0/xfeatures_mask.
+ *
+ * Note the SDM's wording here.  "sub-function 0" only enumerates
+ * the size of the *user* states.  If we use it to size a buffer
+ * that we use 'XSAVES' on, we could potentially overflow the
+ * buffer because 'XSAVES' saves system states too.
+ *
+ * Note that we do not currently set any bits on IA32_XSS so
+ * 'XCR0 | IA32_XSS == XCR0' for now.
  */
 static void __init init_xstate_size(void)
 {
 	unsigned int eax, ebx, ecx, edx;
-	int i;
 
 	if (!cpu_has_xsaves) {
+		/*
+		 * - CPUID function 0DH, sub-function 0:
+		 *    EBX enumerates the size (in bytes) required by
+		 *    the XSAVE instruction for an XSAVE area
+		 *    containing all the *user* state components
+		 *    corresponding to bits currently set in XCR0.
+		 */
 		cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
 		xstate_size = ebx;
-		return;
-	}
-
-	xstate_size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
-	for (i = 2; i < 64; i++) {
-		if (test_bit(i, (unsigned long *)&xfeatures_mask)) {
-			cpuid_count(XSTATE_CPUID, i, &eax, &ebx, &ecx, &edx);
-			xstate_size += eax;
-		}
+	} else {
+		/*
+		 * - CPUID function 0DH, sub-function 1:
+		 *    EBX enumerates the size (in bytes) required by
+		 *    the XSAVES instruction for an XSAVE area
+		 *    containing all the state components
+		 *    corresponding to bits currently set in
+		 *    XCR0 | IA32_XSS.
+		 */
+		cpuid_count(XSTATE_CPUID, 1, &eax, &ebx, &ecx, &edx);
+		xstate_size = ebx;
 	}
 }
 
_

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86, fpu: correct XSAVE xstate size calculation
  2015-07-28 17:21 [PATCH] x86, fpu: correct XSAVE xstate size calculation Dave Hansen
@ 2015-08-05 10:32 ` Ingo Molnar
  2015-08-05 14:34   ` Dave Hansen
  0 siblings, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2015-08-05 10:32 UTC (permalink / raw)
  To: Dave Hansen
  Cc: dave.hansen, linux-kernel, bp, fenghua.yu, hpa, x86,
	Thomas Gleixner, Peter Zijlstra, Linus Torvalds, Andy Lutomirski,
	Denys Vlasenko


* Dave Hansen <dave@sr71.net> wrote:

> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> Note: our xsaves support is currently broken and disabled.  This
> patch does not fix it, but it is an incremental improvement.  It
> might be useful to someone backporting the entire set of XSAVES
> patches at some point, but it should not be backported alone.
> 
> There are currently two xsave buffer formats: standard and
> compacted.  The standard format is waht 'XSAVE' and 'XSAVEOPT'
> produce while 'XSAVES' and 'XSAVEC' produce a compacted-formet
> buffer.  (The kernel never uses XSAVEC)
> 
> But, the XSAVES buffer *ALSO* contains "system state components"
> which are never saved by a plain XSAVE.  So, XSAVES has two
> things that might make its buffer differently-sized from an
> XSAVE-produced one.
> 
> The current code assumes that an XSAVES buffer's size is simply
> the sum of the sizes of the (user) states which are supported.
> This seems to work in most cases, but it is not consistent with
> what the SDM says, and it breaks if we 'align' a component in the
> buffer.  The calculation is also unnecessary work since the CPU
> *tells* us the size of the buffer directly.
> 
> This patch just reads the size of the buffer right out of the
> CPUID leaf instead of trying to derive it.

So how will we know where to find which field, if we cannot even do a size 
calculation?

I realize that the calculation and what CPUID gives us should match, but it's not 
really good for the kernel to not know the precise layout of a critical task 
context data structure ...

So can we turn this into 'double check the CPUID size and print a warning on 
mismatch' kind of boot time sanity check? Preferably for all XSAVE* data formats 
we can run into. I'd be fine with applying such a patch ahead of enabling 
compaction again.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86, fpu: correct XSAVE xstate size calculation
  2015-08-05 10:32 ` Ingo Molnar
@ 2015-08-05 14:34   ` Dave Hansen
  2015-08-06  7:15     ` Ingo Molnar
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2015-08-05 14:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: dave.hansen, linux-kernel, bp, fenghua.yu, hpa, x86,
	Thomas Gleixner, Peter Zijlstra, Linus Torvalds, Andy Lutomirski,
	Denys Vlasenko

On 08/05/2015 03:32 AM, Ingo Molnar wrote:
> * Dave Hansen <dave@sr71.net> wrote:
>> From: Dave Hansen <dave.hansen@linux.intel.com>
>>
>> Note: our xsaves support is currently broken and disabled.  This
>> patch does not fix it, but it is an incremental improvement.  It
>> might be useful to someone backporting the entire set of XSAVES
>> patches at some point, but it should not be backported alone.
>>
>> There are currently two xsave buffer formats: standard and
>> compacted.  The standard format is waht 'XSAVE' and 'XSAVEOPT'
>> produce while 'XSAVES' and 'XSAVEC' produce a compacted-formet
>> buffer.  (The kernel never uses XSAVEC)
>>
>> But, the XSAVES buffer *ALSO* contains "system state components"
>> which are never saved by a plain XSAVE.  So, XSAVES has two
>> things that might make its buffer differently-sized from an
>> XSAVE-produced one.
>>
>> The current code assumes that an XSAVES buffer's size is simply
>> the sum of the sizes of the (user) states which are supported.
>> This seems to work in most cases, but it is not consistent with
>> what the SDM says, and it breaks if we 'align' a component in the
>> buffer.  The calculation is also unnecessary work since the CPU
>> *tells* us the size of the buffer directly.
>>
>> This patch just reads the size of the buffer right out of the
>> CPUID leaf instead of trying to derive it.
> 
> So how will we know where to find which field, if we cannot even do a size 
> calculation?

setup_xstate_features() still populates xstate_offsets[] which tells us
where to find each field.  This patch does not change that.

> I realize that the calculation and what CPUID gives us should match, but it's not 
> really good for the kernel to not know the precise layout of a critical task 
> context data structure ...

There is no architectural guarantee that the sum of xstate sizes will be
the same as what comes out of that CPUID leaf.  It would be nice, but
it's not architectural and I've run in to platforms where that
assumption does not hold.

> So can we turn this into 'double check the CPUID size and print a warning on 
> mismatch' kind of boot time sanity check? Preferably for all XSAVE* data formats 
> we can run into. I'd be fine with applying such a patch ahead of enabling 
> compaction again.

I don't think that is sufficient.

There are 4 reasons to apply this patch that I can think of:
1. There is no architectural guarantee that the calculation (sum of
   xstate sizes) will match what CPUID gives us as the size of the
   buffer.  I've seen this in practice.
2. The alignment bit indicates that there is space used in the buffer
   which is not part of a state component.  The current code does not
   take that in to account.
3. The code is currently asking for the size of an XSAVE-produced
   buffer.  The code will be wrong the moment we switch to XSAVES
   because XSAVES saves more things than XSAVE and uses more space.
4. It makes the code smaller and simpler, especially if you consider
   what would happen if we added "real" alignment support.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86, fpu: correct XSAVE xstate size calculation
  2015-08-05 14:34   ` Dave Hansen
@ 2015-08-06  7:15     ` Ingo Molnar
       [not found]       ` <CA+55aFxzOj-Ee=DN-_3CMeDeYVsmvmmgoxd3hp4MpRSp+og7AQ@mail.gmail.com>
  2015-08-06 17:19       ` Dave Hansen
  0 siblings, 2 replies; 13+ messages in thread
From: Ingo Molnar @ 2015-08-06  7:15 UTC (permalink / raw)
  To: Dave Hansen
  Cc: dave.hansen, linux-kernel, bp, fenghua.yu, hpa, x86,
	Thomas Gleixner, Peter Zijlstra, Linus Torvalds, Andy Lutomirski,
	Denys Vlasenko


* Dave Hansen <dave@sr71.net> wrote:

> > I realize that the calculation and what CPUID gives us should match, but it's 
> > not really good for the kernel to not know the precise layout of a critical 
> > task context data structure ...
> 
> There is no architectural guarantee that the sum of xstate sizes will be the 
> same as what comes out of that CPUID leaf.  It would be nice, but it's not 
> architectural and I've run in to platforms where that assumption does not hold.

WHY?

What sense does it make to have a blob we don't know the exact layout of? How will 
debuggers or user-space in general be able to print (and change) the register 
values if they don't know the layout?

If 'compacted' format means "binary blob only the CPU can decode, not the kernel" 
then our answer is "uhm, no, thank you, we'll use standard format instead" ...

And no, "it's not Intel architectural" is a stupid and somewhat circular argument 
IMHO: the kernel always knew how to decompose CPU context dumps and you'll have to 
come up with a damn better reason to break that than pointing at some text in an 
Intel document.

> > So can we turn this into 'double check the CPUID size and print a warning on 
> > mismatch' kind of boot time sanity check? Preferably for all XSAVE* data 
> > formats we can run into. I'd be fine with applying such a patch ahead of 
> > enabling compaction again.
> 
> I don't think that is sufficient.
> 
> There are 4 reasons to apply this patch that I can think of:
>    1. There is no architectural guarantee that the calculation (sum of
>    xstate sizes) will match what CPUID gives us as the size of the
>    buffer.  I've seen this in practice.

So the context layout and structure on such CPUs has to be mapped and properly 
taken into account in the size calculation. How can GDB or any other (kernel) 
debugger display (and change) individual fields reliably if the layout is not 
known?

> 2. The alignment bit indicates that there is space used in the buffer
>    which is not part of a state component.  The current code does not
>    take that in to account.

Then it has to be taken it into account - just like user-space has to take it into 
account if it wants to display (and change) individual fields...

> 3. The code is currently asking for the size of an XSAVE-produced
>    buffer.  The code will be wrong the moment we switch to XSAVES
>    because XSAVES saves more things than XSAVE and uses more space.

This will have to be fixed before we move to compacted format.

> 4. It makes the code smaller and simpler, especially if you consider
>    what would happen if we added "real" alignment support.

What would happen?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86, fpu: correct XSAVE xstate size calculation
       [not found]       ` <CA+55aFxzOj-Ee=DN-_3CMeDeYVsmvmmgoxd3hp4MpRSp+og7AQ@mail.gmail.com>
@ 2015-08-06  8:27         ` Ingo Molnar
  2015-08-06  8:29           ` Ingo Molnar
  2015-08-06 14:56           ` Dave Hansen
  0 siblings, 2 replies; 13+ messages in thread
From: Ingo Molnar @ 2015-08-06  8:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Anvin, Denys Vlasenko, Thomas Gleixner, linux-kernel,
	Andy Lutomirski, bp, Peter Zijlstra, fenghua.yu, x86,
	dave.hansen, Dave Hansen


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Aug 6, 2015 10:15 AM, "Ingo Molnar" <mingo@kernel.org> wrote:
> >
> > What sense does it make to have a blob we don't know the exact layout of? How 
> > will debuggers or user-space in general be able to print (and change) the 
> > register values if they don't know the layout?
> 
> The usage model is that you only use this for saving and restoring state.
> 
> If you look at the state, you restore the state and then you look at the 
> registers. You never look at the blob itself.

So we are relying on the saved structure already in a couple of cases, such as MPX 
exception handling:

        /*
         * We need to look at BNDSTATUS to resolve this exception.
         * A NULL here might mean that it is in its 'init state',
         * which is all zeros which indicates MPX was not
         * responsible for the exception.
         */
        bndcsr = get_xsave_field_ptr(XSTATE_BNDCSR);
        if (!bndcsr)
                goto exit_trap;

        trace_bounds_exception_mpx(bndcsr);

get_xsave_field_ptr() very much knows about the structure.

Currently the hardware enumerates to us the following details (simplified, omitted 
legacies):

   offset0, size0
   offset1, size1
   offset2, size2
   ...
   offsetN, sizeN

but the alignment of the final boundary of the xsave area is not given.

So as long as the limitation is that the final pair: offsetN + sizeN might not 
extend to the true end of the save area due to the end of the XSAVE area being 
extended to natural cache line boundary (or more) - I'm fine with that, it's not 
important to being able to read it, and it's OK for the CPU to have padding areas 
it doesn't write to but might need to read from.

But if the claim is that we don't know and shouldn't know about the structure of 
these blobs, I think that's generally a bad idea, even if in the normal case we 
don't touch the blobs and just pass them through to user-space.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86, fpu: correct XSAVE xstate size calculation
  2015-08-06  8:27         ` Ingo Molnar
@ 2015-08-06  8:29           ` Ingo Molnar
  2015-08-06 14:56           ` Dave Hansen
  1 sibling, 0 replies; 13+ messages in thread
From: Ingo Molnar @ 2015-08-06  8:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Anvin, Denys Vlasenko, Thomas Gleixner, linux-kernel,
	Andy Lutomirski, bp, Peter Zijlstra, fenghua.yu, x86,
	dave.hansen, Dave Hansen


* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > On Aug 6, 2015 10:15 AM, "Ingo Molnar" <mingo@kernel.org> wrote:
> > >
> > > What sense does it make to have a blob we don't know the exact layout of? How 
> > > will debuggers or user-space in general be able to print (and change) the 
> > > register values if they don't know the layout?
> > 
> > The usage model is that you only use this for saving and restoring state.
> > 
> > If you look at the state, you restore the state and then you look at the 
> > registers. You never look at the blob itself.
> 
> So we are relying on the saved structure already in a couple of cases, such as MPX 
> exception handling:
> 
>         /*
>          * We need to look at BNDSTATUS to resolve this exception.
>          * A NULL here might mean that it is in its 'init state',
>          * which is all zeros which indicates MPX was not
>          * responsible for the exception.
>          */
>         bndcsr = get_xsave_field_ptr(XSTATE_BNDCSR);
>         if (!bndcsr)
>                 goto exit_trap;
> 
>         trace_bounds_exception_mpx(bndcsr);
> 
> get_xsave_field_ptr() very much knows about the structure.

Correction:

  get_xsave_field_ptr() users very much know about the structure.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86, fpu: correct XSAVE xstate size calculation
  2015-08-06  8:27         ` Ingo Molnar
  2015-08-06  8:29           ` Ingo Molnar
@ 2015-08-06 14:56           ` Dave Hansen
  2015-08-06 16:03             ` Dave Hansen
  2015-08-08  9:15             ` Ingo Molnar
  1 sibling, 2 replies; 13+ messages in thread
From: Dave Hansen @ 2015-08-06 14:56 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds
  Cc: Peter Anvin, Denys Vlasenko, Thomas Gleixner, linux-kernel,
	Andy Lutomirski, bp, Peter Zijlstra, fenghua.yu, x86,
	dave.hansen

I think we have three options.  Here's some rough pseudo-ish-code to 
sketch them out.

/* Option 1, what we have today */
	/* 
	 * This breaks if offset[i]+size[i] != offset[i+1]
	 * or if alignment is in play.  Silly hardware breaks
	 * this today.
	 */
	for (i = 0; i < nr_xstates; i++) {
		if (!enabled_xstate(i))
			continue;
		total_blob_size += xstate_sizes[i];
	}

/* Option 2: search for the end of the last state, probably works, Ingo likes? */
	for (i = 0; i < nr_xstates; i++) {
		if (cpu_has_xsaves && !enabled_xstate(i))
			continue;
		end_of_state = xstate_offsets[i] + xstate_sizes[i];
		if (xstate_is_aligned[i]) /* currently not implemented */
			end_of_state = ALIGN(end_of_state, 64);
		if (end_of_state > total_blob_size)
			total_blob_size = end_of_state;
	}
	/* align unconditionally, maybe??? */
	total_blob_size = ALIGN(total_blob_size, 64);

	/* Double check our obviously bug-free math with what the CPU says */
	if (!cpu_has_xsaves)
		cpuid(0xD0, 0, &check_total_blob_size, ...);
	else
		cpuid(0xD0, 1, &check_total_blob_size, ...);

	WARN_ON(check_total_blob_size != total_blob_size);


/* Option 3, trust the CPU (what Dave's patch does) */
	if (!cpu_has_xsaves)
		cpuid(0xD0, 0, &total_blob_size, ...);
	else
		cpuid(0xD0, 1, &total_blob_size, ...);

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86, fpu: correct XSAVE xstate size calculation
  2015-08-06 14:56           ` Dave Hansen
@ 2015-08-06 16:03             ` Dave Hansen
  2015-08-08  9:15             ` Ingo Molnar
  1 sibling, 0 replies; 13+ messages in thread
From: Dave Hansen @ 2015-08-06 16:03 UTC (permalink / raw)
  To: Dave Hansen, Ingo Molnar, Linus Torvalds
  Cc: Peter Anvin, Denys Vlasenko, Thomas Gleixner, linux-kernel,
	Andy Lutomirski, bp, Peter Zijlstra, fenghua.yu, x86

Here's a correction about what we do today, and it's an important point.
 The code I was ripping out in the patch was *really* just for the
XSAVES/compacted case, does *NOT* do anything today since we've disabled
XSAVES.

So perhaps the title here should be:

	[PATCH] x86, fpu: correct XSAVES xstate size calculation

/* Option 1, what we have today */

	if (!cpu_has_xsaves) {
		cpuid(0xD0, 0, &total_blob_size, ...);
		return;
	}
	/*
	 * This breaks if offset[i]+size[i] != offset[i+1]
	 * or if alignment is in play.  Silly hardware breaks
	 * this today.
	 */
	for (i = 0; i < nr_xstates; i++) {
		if (!enabled_xstate(i))
			continue;
		total_blob_size += xstate_sizes[i];
	}


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86, fpu: correct XSAVE xstate size calculation
  2015-08-06  7:15     ` Ingo Molnar
       [not found]       ` <CA+55aFxzOj-Ee=DN-_3CMeDeYVsmvmmgoxd3hp4MpRSp+og7AQ@mail.gmail.com>
@ 2015-08-06 17:19       ` Dave Hansen
  2015-08-08  9:06         ` Ingo Molnar
  1 sibling, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2015-08-06 17:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: dave.hansen, linux-kernel, bp, fenghua.yu, hpa, x86,
	Thomas Gleixner, Peter Zijlstra, Linus Torvalds, Andy Lutomirski,
	Denys Vlasenko

Just to be clear: the current code is OK and correct for non-compacted
buffers.  Since we currently disable the compacted buffers, this patch
has no effect on current kernels.

This patch fixes the (currently unused) calculation for sizing the
compacted-format buffer.  I can either send it now, or try to make sure
it gets picked up by whoever goes back and re-implents
XSAVES/compact-format support.

On 08/06/2015 12:15 AM, Ingo Molnar wrote:
> * Dave Hansen <dave@sr71.net> wrote:
>>> I realize that the calculation and what CPUID gives us should match, but it's 
>>> not really good for the kernel to not know the precise layout of a critical 
>>> task context data structure ...
>>
>> There is no architectural guarantee that the sum of xstate sizes will be the 
>> same as what comes out of that CPUID leaf.  It would be nice, but it's not 
>> architectural and I've run in to platforms where that assumption does not hold.
> 
> WHY?

>From a real dmesg:

[    0.000000] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
[    0.000000] x86/fpu: xstate_offset[3]: 03c0, xstate_sizes[3]: 0040
[    0.000000] x86/fpu: xstate_offset[4]: 0400, xstate_sizes[4]: 0040
...

Note: 0x240 + 0x100 != 0x3c0.

> What sense does it make to have a blob we don't know the exact layout of? How will 
> debuggers or user-space in general be able to print (and change) the register 
> values if they don't know the layout?

Ingo, we know the layout.  We know where every component is.  We know
how big each component is.  This patch does not change the fact that we
calculate and store that.

The *ONLY* thing it does it not derive the total (compacted) buffer size
from that layout since we have another simpler way of getting it.

In fact, it makes the compacted format size calculation work in an
analogous way to the non-compacted one that works today!

> If 'compacted' format means "binary blob only the CPU can decode, not the kernel" 
> then our answer is "uhm, no, thank you, we'll use standard format instead" ...

Nobody is saying that.  We need to read its contents (like with MPX),
and the CPU tells us everything we need in order to decode it.

> And no, "it's not Intel architectural" is a stupid and somewhat circular argument 
> IMHO: the kernel always knew how to decompose CPU context dumps and you'll have to 
> come up with a damn better reason to break that than pointing at some text in an 
> Intel document.

setup_xstate_features() still populates xstate_offsets[] which tells us
where to find each field.  This patch does not change that.

This only changes how we size the (compacted) buffer, not how we
decompose it.

>>> So can we turn this into 'double check the CPUID size and print a warning on 
>>> mismatch' kind of boot time sanity check? Preferably for all XSAVE* data 
>>> formats we can run into. I'd be fine with applying such a patch ahead of 
>>> enabling compaction again.
>>
>> I don't think that is sufficient.
>>
>> There are 4 reasons to apply this patch that I can think of:
>>    1. There is no architectural guarantee that the calculation (sum of
>>    xstate sizes) will match what CPUID gives us as the size of the
>>    buffer.  I've seen this in practice.
> 
> So the context layout and structure on such CPUs has to be mapped and properly 
> taken into account in the size calculation. How can GDB or any other (kernel) 
> debugger display (and change) individual fields reliably if the layout is not 
> known?

Anything wanting to decode the buffer needs to read the format from
CPUID and decode it that way.  GDB from what I hear actually does it
wrong and there are folks looking to fix it.

>> 2. The alignment bit indicates that there is space used in the buffer
>>    which is not part of a state component.  The current code does not
>>    take that in to account.
> 
> Then it has to be taken it into account - just like user-space has to take it into 
> account if it wants to display (and change) individual fields...

This isn't the end of the world (it's just a few more lines and cpuid
calls), but it is unnecessary work.

>> 3. The code is currently asking for the size of an XSAVE-produced
>>    buffer.  The code will be wrong the moment we switch to XSAVES
>>    because XSAVES saves more things than XSAVE and uses more space.
> 
> This will have to be fixed before we move to compacted format.

This patch is intended to help us move in the direction of re-enabling
the compacted format.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86, fpu: correct XSAVE xstate size calculation
  2015-08-06 17:19       ` Dave Hansen
@ 2015-08-08  9:06         ` Ingo Molnar
  2015-08-10 21:14           ` Dave Hansen
  0 siblings, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2015-08-08  9:06 UTC (permalink / raw)
  To: Dave Hansen
  Cc: dave.hansen, linux-kernel, bp, fenghua.yu, hpa, x86,
	Thomas Gleixner, Peter Zijlstra, Linus Torvalds, Andy Lutomirski,
	Denys Vlasenko


* Dave Hansen <dave@sr71.net> wrote:

> Just to be clear: the current code is OK and correct for non-compacted
> buffers.  Since we currently disable the compacted buffers, this patch
> has no effect on current kernels.

Absolutely, this was my assumption as well.

> This patch fixes the (currently unused) calculation for sizing the
> compacted-format buffer.  I can either send it now, or try to make sure
> it gets picked up by whoever goes back and re-implents
> XSAVES/compact-format support.
> 
> On 08/06/2015 12:15 AM, Ingo Molnar wrote:
> > * Dave Hansen <dave@sr71.net> wrote:
> >>> I realize that the calculation and what CPUID gives us should match, but it's 
> >>> not really good for the kernel to not know the precise layout of a critical 
> >>> task context data structure ...
> >>
> >> There is no architectural guarantee that the sum of xstate sizes will be the 
> >> same as what comes out of that CPUID leaf.  It would be nice, but it's not 
> >> architectural and I've run in to platforms where that assumption does not hold.
> > 
> > WHY?
> 
> From a real dmesg:
> 
> [    0.000000] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
> [    0.000000] x86/fpu: xstate_offset[3]: 03c0, xstate_sizes[3]: 0040
> [    0.000000] x86/fpu: xstate_offset[4]: 0400, xstate_sizes[4]: 0040
> ...
> 
> Note: 0x240 + 0x100 != 0x3c0.

This kind of alignment related offset padding is indeed harmless.

> 
> > What sense does it make to have a blob we don't know the exact layout of? How will 
> > debuggers or user-space in general be able to print (and change) the register 
> > values if they don't know the layout?
> 
> Ingo, we know the layout.  We know where every component is.  We know
> how big each component is.  This patch does not change the fact that we
> calculate and store that.

The patch you submitted blindly trusts the CPU, and I'm uneasy about that for 
multiple reasons. We can and should do better than that, while still flexibly 
making use of all CPU capabilities that are offered.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86, fpu: correct XSAVE xstate size calculation
  2015-08-06 14:56           ` Dave Hansen
  2015-08-06 16:03             ` Dave Hansen
@ 2015-08-08  9:15             ` Ingo Molnar
  1 sibling, 0 replies; 13+ messages in thread
From: Ingo Molnar @ 2015-08-08  9:15 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Linus Torvalds, Peter Anvin, Denys Vlasenko, Thomas Gleixner,
	linux-kernel, Andy Lutomirski, bp, Peter Zijlstra, fenghua.yu,
	x86, dave.hansen


* Dave Hansen <dave@sr71.net> wrote:

> I think we have three options.  Here's some rough pseudo-ish-code to 
> sketch them out.

> /* Option 2: search for the end of the last state, probably works, Ingo likes? */
> 	for (i = 0; i < nr_xstates; i++) {
> 		if (cpu_has_xsaves && !enabled_xstate(i))
> 			continue;
> 		end_of_state = xstate_offsets[i] + xstate_sizes[i];
> 		if (xstate_is_aligned[i]) /* currently not implemented */
> 			end_of_state = ALIGN(end_of_state, 64);
> 		if (end_of_state > total_blob_size)
> 			total_blob_size = end_of_state;
> 	}
> 	/* align unconditionally, maybe??? */
> 	total_blob_size = ALIGN(total_blob_size, 64);
> 
> 	/* Double check our obviously bug-free math with what the CPU says */
> 	if (!cpu_has_xsaves)
> 		cpuid(0xD0, 0, &check_total_blob_size, ...);
> 	else
> 		cpuid(0xD0, 1, &check_total_blob_size, ...);
> 
> 	WARN_ON(check_total_blob_size != total_blob_size);

Yes, this is quite close to what I'd like to see.

So I'd do the following things as well:

 - For each xstate feature we enable, define a data structure on fpu/types.h, and
   double check xstate_size against the sizeof() of that structure. Most of the
   current state components are properly declared, but for example AVX512 is
   missing. This documents the features nicely, and also makes it easy for KGDB 
   and tracers to print out the fields - even if in the normal codepaths we rely 
   on the CPU to handle this structure.

   (This is the most important detail I am and was worried about all along.)

 - If our calculations and that of the CPU's mismatch, pick the CPU's variant
   instead.

 - Use WARN_ONCE() instead of WARN_ON() to make life easier on 100+ core 
   prototypes.

 - Remove the LWP bits, I don't think we'll ever support it, it's a broken
   PMU framework.

 - Detail: at least with AVX-1024 (which I'm sure we'll see one day), the natural
   alignment of vector registers will rise from 64 bytes to 128 bytes. At that 
   point the beginning of the AVX-1024 buffer will likely be two cachelines 
   aligned and there might be more padding at the end of the AVX-512 area.
   Make sure the alignment checking code handles this right.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86, fpu: correct XSAVE xstate size calculation
  2015-08-08  9:06         ` Ingo Molnar
@ 2015-08-10 21:14           ` Dave Hansen
  2015-08-22 13:21             ` Ingo Molnar
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2015-08-10 21:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: dave.hansen, linux-kernel, bp, fenghua.yu, hpa, x86,
	Thomas Gleixner, Peter Zijlstra, Linus Torvalds, Andy Lutomirski,
	Denys Vlasenko

On 08/08/2015 02:06 AM, Ingo Molnar wrote:
>>> What sense does it make to have a blob we don't know the exact layout of? How will 
>>> > > debuggers or user-space in general be able to print (and change) the register 
>>> > > values if they don't know the layout?
>> > 
>> > Ingo, we know the layout.  We know where every component is.  We know
>> > how big each component is.  This patch does not change the fact that we
>> > calculate and store that.
> The patch you submitted blindly trusts the CPU, and I'm uneasy about that for 
> multiple reasons. We can and should do better than that, while still flexibly 
> making use of all CPU capabilities that are offered.

Yes, it blindly trusts the CPU.  This is precisely* what the *existing*
code has done since commit dc1e35c6e95 got merged in 2008.  Do you have
some specific concern with the compact format that makes you want to
stop blindly trusting the CPU after 7 years?

I know what you want now (I've coded up half of it already), but I've
not got the foggiest idea why other than pure paranoia.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86, fpu: correct XSAVE xstate size calculation
  2015-08-10 21:14           ` Dave Hansen
@ 2015-08-22 13:21             ` Ingo Molnar
  0 siblings, 0 replies; 13+ messages in thread
From: Ingo Molnar @ 2015-08-22 13:21 UTC (permalink / raw)
  To: Dave Hansen
  Cc: dave.hansen, linux-kernel, bp, fenghua.yu, hpa, x86,
	Thomas Gleixner, Peter Zijlstra, Linus Torvalds, Andy Lutomirski,
	Denys Vlasenko


* Dave Hansen <dave@sr71.net> wrote:

> On 08/08/2015 02:06 AM, Ingo Molnar wrote:
> >>> What sense does it make to have a blob we don't know the exact layout of? How will 
> >>> > > debuggers or user-space in general be able to print (and change) the register 
> >>> > > values if they don't know the layout?
> >> > 
> >> > Ingo, we know the layout.  We know where every component is.  We know
> >> > how big each component is.  This patch does not change the fact that we
> >> > calculate and store that.
> > The patch you submitted blindly trusts the CPU, and I'm uneasy about that for 
> > multiple reasons. We can and should do better than that, while still flexibly 
> > making use of all CPU capabilities that are offered.
> 
> Yes, it blindly trusts the CPU.  This is precisely* what the *existing*
> code has done since commit dc1e35c6e95 got merged in 2008.  Do you have
> some specific concern with the compact format that makes you want to
> stop blindly trusting the CPU after 7 years?

Yes, the fact that 'compact format' probably never worked well and we had to 
revert use of it.

> I know what you want now (I've coded up half of it already), but I've not got 
> the foggiest idea why other than pure paranoia.

There were multiple bugs in this code so some amount of paranoia is justified.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2015-08-22 13:21 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-28 17:21 [PATCH] x86, fpu: correct XSAVE xstate size calculation Dave Hansen
2015-08-05 10:32 ` Ingo Molnar
2015-08-05 14:34   ` Dave Hansen
2015-08-06  7:15     ` Ingo Molnar
     [not found]       ` <CA+55aFxzOj-Ee=DN-_3CMeDeYVsmvmmgoxd3hp4MpRSp+og7AQ@mail.gmail.com>
2015-08-06  8:27         ` Ingo Molnar
2015-08-06  8:29           ` Ingo Molnar
2015-08-06 14:56           ` Dave Hansen
2015-08-06 16:03             ` Dave Hansen
2015-08-08  9:15             ` Ingo Molnar
2015-08-06 17:19       ` Dave Hansen
2015-08-08  9:06         ` Ingo Molnar
2015-08-10 21:14           ` Dave Hansen
2015-08-22 13:21             ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).