linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Bae, Chang Seok" <chang.seok.bae@intel.com>
To: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>, x86-ml <x86@kernel.org>,
	"Brown, Len" <len.brown@intel.com>,
	"Hansen, Dave" <dave.hansen@intel.com>,
	"Liu, Jing2" <jing2.liu@intel.com>,
	"Shankar, Ravi V" <ravi.v.shankar@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 07/21] x86/fpu/xstate: Introduce helpers to manage dynamic xstate buffers
Date: Wed, 27 Jan 2021 01:23:57 +0000	[thread overview]
Message-ID: <80003059-987E-4FFA-8F9D-6A480192BE5D@intel.com> (raw)
In-Reply-To: <20210126201746.GB9662@zn.tnic>

On Jan 26, 2021, at 12:17, Borislav Petkov <bp@suse.de> wrote:
> On Wed, Dec 23, 2020 at 07:57:03AM -0800, Chang S. Bae wrote:
>> 
>> +	/*
>> +	 * @state_mask:
>> +	 *
>> +	 * The state component bitmap. It indicates the saved xstate in
>> +	 * either @state or @state_ptr. The map value starts to be aligned
>> +	 * with @state and then with @state_ptr once it is in use.
> 
> Are you trying to say here that the mask describes the state saved in
> @state initially and then, when the task is switched to dynamic state,
> it denotes the state in ->state_ptr?

Yes, it is. I will take your sentence in the comment. Thank you.

>> +	 */
>> +	u64				state_mask;
>> +
>> +	/*
>> +	 * @state_ptr:
>> +	 *
>> +	 * Copy of all extended register states, in a dynamically allocated
>> +	 * buffer. When a task is using extended features, the register state
>> +	 * is always the most current. This state copy is more recent than
>> +	 * @state. If the task context-switches away, they get saved here,
>> +	 * representing the xstate.
> 
> Calling it a copy here is confusing - you wanna say that when dynamic
> states get used, the state in state_ptr supercedes and invalidates the
> state in @state. AFAIU, at least.

True, it looks better here too.

>> +DEFINE_EVENT(x86_fpu, x86_fpu_xstate_alloc_failed,
>> +	TP_PROTO(struct fpu *fpu),
>> +	TP_ARGS(fpu)
>> +);
>> +
> 
> Huh, what's that for?

This tracepoint can point to the allocation failure even with the NMI handling
failure message only. (You can also check the comment below at the call site.)

>> /*
>>  * Although we spell it out in here, the Processor Trace
>> @@ -71,6 +73,7 @@ static unsigned int xstate_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] =
>> static unsigned int xstate_sizes[XFEATURE_MAX]   = { [ 0 ... XFEATURE_MAX - 1] = -1};
>> static unsigned int xstate_comp_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = -1};
>> static unsigned int xstate_supervisor_only_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = -1};
>> +static bool xstate_aligns[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = false};
> 
> What's that for?

The xstate buffer may expand on the fly. The size has to be correctly
calculated if needed. CPUID provides essential information for the
calculation. Instead of reading CPUID repeatedly, store them -- the offset and
size are already stored here. The 64B alignment looks to be missing, so added
here.

>> +	/*
>> +	 * Calculate the size by summing up each state together, since no known
>> +	 * size found with the xstate buffer format out of the given mask.
>> +	 */
> 
> I barely can imagine what that comment is trying to tell me...

How about:
    “With the given mask, no relevant size is found so far. So, calculate it by
     summing up each state size."

>> +/* The watched threshold size of dynamically allocated xstate buffer */
> 
> Watched?

Maybe: 
    "When the buffer is more than this size, the current mechanism is
     potentially marginal to support the allocations."

>> +#define XSTATE_BUFFER_MAX_BYTES		(64 * 1024)
> 
> What's that thing for when we have fpu_kernel_xstate_max_size too?

The threshold size is what the current mechanism can comfortably allocate
(maybe at most). The warning is left when the buffer size goes beyond the 
threshold. Then, we may need to consider a better allocation mechanism.

>> static int __init init_xstate_size(void)
>> {
>> 	/* Recompute the context size for enabled features: */
>> @@ -779,6 +830,14 @@ static int __init init_xstate_size(void)
>> 	if (!is_supported_xstate_size(fpu_kernel_xstate_min_size))
>> 		return -EINVAL;
>> 
>> +	/*
>> +	 * When allocating buffers larger than the threshold, a more sophisticated
>> +	 * mechanism might be considerable.
>> +	 */
>> +	if (fpu_kernel_xstate_max_size > XSTATE_BUFFER_MAX_BYTES)
>> +		pr_warn("x86/fpu: xstate buffer too large (%u > %u)\n",
>> +			fpu_kernel_xstate_max_size, XSTATE_BUFFER_MAX_BYTES);
> 
> So why doesn't this return an error?

Although a warning is given, vmalloc() may manage to allocate this size. So,
it was not considered a hard hit yet. vmalloc() failure will return an error
later.

>> 	/*
>> 	 * User space is always in standard format.
>> 	 */
>> @@ -869,6 +928,9 @@ void __init fpu__init_system_xstate(void)
>> 	if (err)
>> 		goto out_disable;
>> 
>> +	/* Make sure init_task does not include the dynamic user states */
>> +	current->thread.fpu.state_mask = (xfeatures_mask_all & ~xfeatures_mask_user_dynamic);
> 
> xfeatures_mask_user_dynamic just got set to 0 a couple of lines above...

Well, it will have some values when the piece in place to support the dynamic
user state. PATCH13 has this change there:
 
+	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
+		u64 feature_mask = BIT_ULL(i);
+
+		if (!(xfeatures_mask_user() & feature_mask))
+			continue;
+
+		if (xfeature_disable_supported(i))
+			xfeatures_mask_user_dynamic |= feature_mask;
+	}

>> +/*
>> + * Allocate an xstate buffer with the size calculated based on 'mask'.
>> + *
>> + * The allocation mechanism does not shrink or reclaim the buffer.
>> + */
>> +int alloc_xstate_buffer(struct fpu *fpu, u64 mask)
>> +{
>> +	union fpregs_state *state_ptr;
>> +	unsigned int oldsz, newsz;
>> +	u64 state_mask;
>> +
>> +	state_mask = fpu->state_mask | mask;
>> +
>> +	oldsz = get_xstate_size(fpu->state_mask);
>> +	newsz = get_xstate_size(state_mask);
>> +
>> +	if (oldsz >= newsz)
>> +		return 0;
>> +
>> +	if (newsz > fpu_kernel_xstate_max_size) {
>> +		pr_warn_once("x86/fpu: xstate buffer too large (%u > %u bytes)\n",
>> +			     newsz, fpu_kernel_xstate_max_size);
>> +		XSTATE_WARN_ON(1);
>> +		return 0;
> 
> return 0?!? On an error?!?

Okay, the first question is whether this is an error. Well, with such too-much
size though, the buffer can still store the states. So, give a warning at
least. Perhaps, a similar case is when the calculated size is unmatched with
the CPUID-provided [3]. We give a warning, not an error there, maybe assuming
the calculated is larger.

But if it should be considered an error, maybe return -EINVAL.

>> +	}
>> +
>> +	/* We need 64B aligned pointer, but vmalloc() returns a page-aligned address. */
> 
> So this comment is useless, basically...

Okay, removed.

>> +	state_ptr = vmalloc(newsz);
>> +	if (!state_ptr) {
>> +		trace_x86_fpu_xstate_alloc_failed(fpu);
> 
> WTH is that tracepoint here for?

While it returns an error, this function can be on the path of NMI handling.
Then, likely only with the “unexpected #NM exception” message. So, logging a
tracepoint can provide evidence of the allocation failure in that case.

The comments on v1 [1][2] were received as such change.

>> +		return -ENOMEM;
>> +	}
>> +
>> +	memset(state_ptr, 0, newsz);
> 
> So vzalloc() above?

Yes, I think it is better to use vzalloc() here.

> I must be missing something here but where's the logic that decides
> between the static and dynamic buffer? Later patches?
> 
> I have to admit I've yet to see how the "switching" between static and
> dynamic state happens…

PATCH9 introduces a wrapper that determines which to take. It simply returns
state_ptr when not a null pointer. So, the logic is to use the dynamic buffer
when available.

Thanks,
Chang

[1] https://lore.kernel.org/lkml/69721125-4e1c-ca9c-ff59-8e1331933e6c@intel.com/#t
[2] https://lore.kernel.org/lkml/20201014104148.GD2628@hirez.programming.kicks-ass.net/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/fpu/xstate.c#n657

  reply	other threads:[~2021-01-27  3:53 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-23 15:56 [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
2020-12-23 15:56 ` [PATCH v3 01/21] x86/fpu/xstate: Modify initialization helper to handle both static and dynamic buffers Chang S. Bae
2021-01-15 12:40   ` Borislav Petkov
2020-12-23 15:56 ` [PATCH v3 02/21] x86/fpu/xstate: Modify state copy helpers " Chang S. Bae
2021-01-15 12:50   ` Borislav Petkov
2021-01-19 18:50     ` Bae, Chang Seok
2021-01-20 20:53       ` Borislav Petkov
2021-01-20 21:12         ` Bae, Chang Seok
2020-12-23 15:56 ` [PATCH v3 03/21] x86/fpu/xstate: Modify address finders " Chang S. Bae
2021-01-15 13:06   ` Borislav Petkov
2020-12-23 15:57 ` [PATCH v3 04/21] x86/fpu/xstate: Modify context switch helpers " Chang S. Bae
2021-01-15 13:18   ` Borislav Petkov
2021-01-19 18:49     ` Bae, Chang Seok
2020-12-23 15:57 ` [PATCH v3 05/21] x86/fpu/xstate: Add a new variable to indicate dynamic user states Chang S. Bae
2021-01-15 13:39   ` Borislav Petkov
2021-01-15 19:47     ` Bae, Chang Seok
2021-01-19 15:57       ` Borislav Petkov
2021-01-19 18:57         ` Bae, Chang Seok
2021-01-22 10:56           ` Borislav Petkov
2021-01-27  1:23             ` Bae, Chang Seok
2020-12-23 15:57 ` [PATCH v3 06/21] x86/fpu/xstate: Calculate and remember dynamic xstate buffer sizes Chang S. Bae
2021-01-22 11:44   ` Borislav Petkov
2021-01-27  1:23     ` Bae, Chang Seok
2021-01-27  9:38       ` Borislav Petkov
2021-02-03  2:54         ` Bae, Chang Seok
2020-12-23 15:57 ` [PATCH v3 07/21] x86/fpu/xstate: Introduce helpers to manage dynamic xstate buffers Chang S. Bae
2021-01-26 20:17   ` Borislav Petkov
2021-01-27  1:23     ` Bae, Chang Seok [this message]
2021-01-27 10:41       ` Borislav Petkov
2021-02-03  4:10         ` Bae, Chang Seok
2021-02-04 13:10           ` Borislav Petkov
2021-02-03  4:10     ` Bae, Chang Seok
2020-12-23 15:57 ` [PATCH v3 08/21] x86/fpu/xstate: Define the scope of the initial xstate data Chang S. Bae
2021-02-08 12:33   ` Borislav Petkov
2021-02-08 18:53     ` Bae, Chang Seok
2021-02-09 12:49       ` Borislav Petkov
2021-02-09 15:38         ` Bae, Chang Seok
2020-12-23 15:57 ` [PATCH v3 09/21] x86/fpu/xstate: Introduce wrapper functions to organize xstate buffer access Chang S. Bae
2021-02-08 12:33   ` Borislav Petkov
2021-02-09 15:50     ` Bae, Chang Seok
2020-12-23 15:57 ` [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate Chang S. Bae
2021-01-07  8:41   ` Liu, Jing2
2021-01-07 18:40     ` Bae, Chang Seok
2021-01-12  2:52       ` Liu, Jing2
2021-01-15  4:59         ` Bae, Chang Seok
2021-01-15  5:45           ` Liu, Jing2
2021-02-08 12:33   ` Borislav Petkov
2021-02-09 15:48     ` Bae, Chang Seok
2020-12-23 15:57 ` [PATCH v3 11/21] x86/fpu/xstate: Update xstate buffer address finder " Chang S. Bae
2021-02-19 15:00   ` Borislav Petkov
2021-02-19 19:19     ` Bae, Chang Seok
2020-12-23 15:57 ` [PATCH v3 12/21] x86/fpu/xstate: Update xstate context copy function to support dynamic buffer Chang S. Bae
2020-12-23 15:57 ` [PATCH v3 13/21] x86/fpu/xstate: Expand dynamic context switch buffer on first use Chang S. Bae
2020-12-23 15:57 ` [PATCH v3 14/21] x86/fpu/xstate: Support ptracer-induced xstate buffer expansion Chang S. Bae
2020-12-23 15:57 ` [PATCH v3 15/21] x86/fpu/xstate: Extend the table to map xstate components with features Chang S. Bae
2020-12-23 15:57 ` [PATCH v3 16/21] x86/cpufeatures/amx: Enumerate Advanced Matrix Extension (AMX) feature bits Chang S. Bae
2020-12-23 15:57 ` [PATCH v3 17/21] x86/fpu/amx: Define AMX state components and have it used for boot-time checks Chang S. Bae
2020-12-23 15:57 ` [PATCH v3 18/21] x86/fpu/amx: Enable the AMX feature in 64-bit mode Chang S. Bae
2020-12-23 15:57 ` [PATCH v3 19/21] selftest/x86/amx: Include test cases for the AMX state management Chang S. Bae
2020-12-23 15:57 ` [PATCH v3 20/21] x86/fpu/xstate: Support dynamic user state in the signal handling path Chang S. Bae
2020-12-23 15:57 ` [PATCH v3 21/21] x86/fpu/xstate: Introduce boot-parameters to control some state component support Chang S. Bae
2020-12-23 18:37   ` Randy Dunlap
2021-01-14 21:31     ` Bae, Chang Seok
2021-01-14 21:31 ` [PATCH v3 00/21] x86: Support Intel Advanced Matrix Extensions Bae, Chang Seok

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=80003059-987E-4FFA-8F9D-6A480192BE5D@intel.com \
    --to=chang.seok.bae@intel.com \
    --cc=bp@suse.de \
    --cc=dave.hansen@intel.com \
    --cc=jing2.liu@intel.com \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=ravi.v.shankar@intel.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).