From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.1 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0BE8EC433E6 for ; Tue, 9 Feb 2021 22:19:09 +0000 (UTC) Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A9A6764E4F for ; Tue, 9 Feb 2021 22:19:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A9A6764E4F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Type:Cc: List-Subscribe:List-Help:List-Post:List-Archive:List-Unsubscribe:List-Id: In-Reply-To:MIME-Version:References:Message-ID:Subject:To:From:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=29IASU0hNXO+fgLKx+QLWUdDklwCadWDomZAKyKJcMI=; b=tmem1OJ0Qk/iKQ6gtC1SIEYsL tuBqt7APtP9Q3BSbjtHfqM2bjsvGFOSnhgNSGGD8Twy9UUCU/7QQkxTJMFJYUY8LQqE/bS0yiR3hb 1AcSZ1GuEXw7iuwcVfDWsxRfdYkKYrzlrII3o6w7AFU0SsYN8TTKNxhNvYFwgPYldg+9xQJatEPdi aNtzg6o/+UFUaZOoE7iEGWwAj4/9gBlz3HmqIFDyKSzD+3mfp50/aqyFH+q5Pe5pUVE1DndtCDOsu fyIYCMP2UhutFD6Sf1debFwNkwDNKLyasU6vWUou5DhgeiOPhyZAbIciBBgRRwI/x9TxD4bPieNvs cBDmA30LA==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1l9bK0-0007wb-OE; Tue, 09 Feb 2021 22:17:24 +0000 Received: from mail.kernel.org ([198.145.29.99]) by merlin.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1l9bJw-0007vP-QK for linux-arm-kernel@lists.infradead.org; Tue, 09 Feb 2021 22:17:22 +0000 Received: by mail.kernel.org (Postfix) with ESMTPSA id 2A61164DF4; Tue, 9 Feb 2021 22:17:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1612909039; bh=ZhGSfYyQtST7KGpJixks+Lg42YDSfeAB2cRVvoBLLro=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=k2cte+pPIFvkkJIiEVtMwRoSqrLECJp15hJjtv9f3VLIB1XT9nHZtfWbuYbmHarJ0 ENIKRRNH+9LK5uqzU/2mJqwk6DNcOUUCbD3xKtZI1Y1DLzY387gPWXgLu1MdMYkCy9 nUV/jqERPuyTmv6fOnMwLjRZn95uSx2CBPJ9+Qhvi6rGSj/MrOUigbvq0vK8Q/E2SF s2F7E+Tp4XLKuc6sSnUSrMIITUaekgksgPSQ3yP7kc63BVeF55OLztP/zWGMhNOkn2 7+qppWRUw52MI/sqItRorfwELXJI8nbCP8oEfoyJ9X8sxe7Ykxnb/JtqXrXLi7iB/y /dHnPVYuSDxpg== Date: Tue, 9 Feb 2021 22:16:27 +0000 From: Mark Brown To: Dave Martin Subject: Re: [PATCH v7 1/2] arm64/sve: Split TIF_SVE into separate execute and register state flags Message-ID: <20210209221627.GB4916@sirena.org.uk> References: <20210201122901.11331-1-broonie@kernel.org> <20210201122901.11331-2-broonie@kernel.org> <20210209175943.GH21837@arm.com> MIME-Version: 1.0 In-Reply-To: <20210209175943.GH21837@arm.com> X-Cookie: Put your trust in those who are worthy. User-Agent: Mutt/1.10.1 (2018-07-13) X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20210209_171721_074897_4C572738 X-CRM114-Status: GOOD ( 65.82 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Julien Grall , Julien Grall , Catalin Marinas , Zhang Lei , Will Deacon , linux-arm-kernel@lists.infradead.org, Daniel Kiss Content-Type: multipart/mixed; boundary="===============5606235031401819871==" Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org --===============5606235031401819871== Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="xXmbgvnjoT4axfJE" Content-Disposition: inline --xXmbgvnjoT4axfJE Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Feb 09, 2021 at 05:59:46PM +0000, Dave Martin wrote: > I'm wondering if TIF_SVE_FULL_REGS is actually conflating two things > here: > a) whether the SVE regs can be discarded > b) how the FPSIMD/SVE regs are encoded in thread_struct. The way I was thinking about this was that at any point where we can discard the SVE register state we should actually do so - I wasn't considering a state where we could potentially discard them but didn't actually want to do so. > When implementing a trivial policy for discarding the SVE regs, we > discard the regs at the earliest opportunity, so (a) and (b) are not > very distinct. But if we try to be cleverer later on then this would > break down. If we separate the two meanings from the outset, would > it help to steer any future policy stuff in a more maintainable > direction. > This might mean having two flags instead of one. I think if we want to track potential actions here then rather than tracking a further "potentially discardable" state we should be tracking the things that let us decide to discard the state such as being in a syscall, it seems likely that things that allow us to have such a state could also have similar implications for other areas of the ABI at some point and that we'd be able to get more usage out of that tracking than something which only applies to SVE. > > static void __sve_free(struct task_struct *task) > > { > > + /* SVE context will be zeroed when allocated. */ > > + clear_tsk_thread_flag(task, TIF_SVE_FULL_REGS); > Can you elaborate on why this is here? > I'm not sure this is the right place to clear flags. This code is > really just for freeing the buffer, along with whatever checks are > necessary to confirm that this is safe / sane to do. (i.e., > TIF_SVE(_EXEC) set.). It felt like it lined up with the assignment of sve_state to NULL immediately below. It could equally go a level up in sve_free() I think, the logic in splitting the two free functions isn't super clear =66rom the code TBH - it seems to be due to fpsimd_release_task() and wanting to avoid the state check on SVE_EXEC in the existing code which I guess people thought was valuable. > > + * * TIF_SVE_EXEC set: > > + * > > + * The task can execute SVE instructions while in userspace without > > + * trapping to the kernel. Storage of Z0-Z31 (incorporating Vn in > > + * bits[0-127]) is determined by TIF_SVE_FULL_REGS. > > * > > * task->thread.sve_state must point to a valid buffer at least > > * sve_state_size(task) bytes in size. > > * > > - * During any syscall, the kernel may optionally clear TIF_SVE and > > - * discard the vector state except for the FPSIMD subset. > > - * > > - * * TIF_SVE clear: > > + * During any syscall the ABI allows the kernel to discard the > > + * vector state other than the FPSIMD subset. When this is done > > + * both TIF_SVE_EXEC and TIF_SVE_FULL_REGS will be cleared. > Can this occur if TIF_SVE_FULL_REGS is initially set? I thought > !TIF_SVE_FULL_REGS our way of telling ourselves that we can discard the > SVE state at all... I'm not sure what you mean here - what exactly is the "this" you're referring to, and what do you mean by "initially set"? The intent of the above is to document the fact that when we do a syscall we both disable SVE execution (so an access trap is generated) and discard the SVE state. As I said in the other mail and the earlier part of this comment is intended to say the intent of !TIF_SVE_FULL_REGS is that the state for the SVE registers (if required) should be taken from the FPSIMD registers, wherever they are stored at the time the SVE state is needed. This documentation is attempting to cover the intended behaviour after further changes to allow the two flags to be set separately, it didn't seem sensible to try to document the behaviour when they're set in lock step since otherwise there will be questions about what the flags are supposed to mean separately when they're being introduced. >=20 > > + * * TIF_SVE_FULL_REGS is not set: > > * > > * When stored, FPSIMD registers V0-V31 are encoded in > > * task->thread.uw.fpsimd_state; bits [max : 128] for each of Z0-Z3= 1 are > > @@ -265,12 +287,38 @@ static void sve_free(struct task_struct *task) > > * view. For hygiene purposes, the kernel zeroes them on next use, > > * but userspace is discouraged from relying on this. > > * > > - * task->thread.sve_state does not need to be non-NULL, valid or any > > - * particular size: it must not be dereferenced. > > + * On entry to the kernel other than from a syscall the kernel must > > + * set TIF_SVE_FULL_REGS and save the full register state. > Only if TIF_SVE_EXEC is set though? Yes, I'll clarify this. > I had thought that it is more natural to set TIF_SVE_FULL_REGS in > preparation for entering userspace. This does mean that if we are > preempted between fpsimd_restore_current_state() and exception return to > userspace, then we would have to save the full regs unnecessarily. This > can only happen during a small window though, so it should be rare very > unless the system is already thrashing. This is the situation prior to > your series IIUC. We need to set TIF_SVE_FULL_REGS on entry to the kernel so that if we come to save the register state we know we need to save the full SVE state (which userspace may have changed) and so that if we return to userspace without saving we know if we need to zero the bits of SVE state that are not shared with the FPSIMD state. It is true that there is a race after doing a conversion from FPSIMD in the exit path where we might end up saving the full state again, as you say that's not changed here and is pretty thin. =20 > Alternatively, we set TIF_SVE_FULL_REGS when entering the kernel on a > non-syscall path with TIF_SVE_EXEC set - but that adds a small overhead > to every kernel entry and feels slightly messier. The goal is to avoid that overhead in practice by ensuring that we never leave the kernel without having both flags set, changing the check on entry to the inverse condition. I'll explicitly call that out in the comment. > I wonder if it would help to describe TIF_SVE_FULL_REGS orthogonally to > TIF_SVE_EXEC. > Although we don't intend to implement the combination !TIF_SVE_EXEC && > TIF_SVE_FULL_REGS, it does have a logically consistent meaning (SVE > disabled for userspace, but Vn nonetheless stored in sve_state, laid out > according to thread->sve_vl). My worry with doing that was that as it's not a meaningful state it would create confusion regarding how we could get into that situation and what it would actually mean. > When flipping the flags, there may be a hazard where this combination > temporarily appears, but we could hide that in a helper that runs under > get_cpu_fpsimd_context() -- just like we already do for various other > things. > (We could avoid this by adding atomic thread_flags manipulators that > flip multiple flags, but that's overkill just for this.) How about explicitly saying it's not meaningful and should never be allowed to happen? To me it's just undefined behaviour and anything that is expedient is fine. > > - if (system_supports_sve() && test_thread_flag(TIF_SVE)) > > - sve_load_state(sve_pffr(¤t->thread), > > - ¤t->thread.uw.fpsimd_state.fpsr, > > - sve_vq_from_vl(current->thread.sve_vl) - 1); > > - else > > - fpsimd_load_state(¤t->thread.uw.fpsimd_state); > > + if (test_thread_flag(TIF_SVE_EXEC)) { > Was system_supports_sve() dropped on purpose? And why? There were concerns on earlier versions of the series about the system_supports_sve() checks being redundant, mainly raised by new checks being added in this pattern - if we have TIF_SVE and no SVE hardware we've got a pretty bad bug so I was cutting them out. I had started going through and doing that but it looks like I got distracted somewhere before actually finishing it and noting it in the revision list. > > + if (test_and_set_thread_flag(TIF_SVE_FULL_REGS)) > > + sve_load_state(sve_pffr(¤t->thread), > > + ¤t->thread.uw.fpsimd_state.fpsr, > > + vl); > > + else > > + sve_load_from_fpsimd_state(¤t->thread.uw.fpsimd_state, > > + vl); > As observed above, there could be an argument for setting > TIF_SVE_FULL_REGS here, since the only reason to actually load the regs > is in preparation for entering userspace, which can freely dirty all the > SVE bits undetected since trapping for EL0 will be disabled due to > TIF_SVE_EXEC being est. Yes, we could do that - I guess it's OK to squash into this change. > > + fpsimd_load_state(¤t->thread.uw.fpsimd_state); > Maybe this could be more readable, as well as reducing indentation on > the more complex branch of the if, as: > if (!system_supports_sve || !test_thread_flag(TIF_SVE_EXEC)) { > fpsimd_load_state(¤t->thread.uw.fpsimd_state); >=20 > return; > } > /* handle TIF_SVE_EXEC case */ Sure. > > @@ -318,6 +385,7 @@ static void fpsimd_save(void) > > return; > > } > > =20 > > + set_thread_flag(TIF_SVE_FULL_REGS); > Why do this here? What if we're in a syscall? There's no problem if we're in a syscall since we'll have already cleared TIF_SVE_EXEC but yes, we shouldn't do this here - I think it's just left over from my initial pass through where I was trying to explictly handle both flags at once which just ended up being too tortuous. > > @@ -627,8 +695,9 @@ int sve_set_vector_length(struct task_struct *task, > > } > > =20 > > fpsimd_flush_task_state(task); > > - if (test_and_clear_tsk_thread_flag(task, TIF_SVE)) > > + if (test_and_clear_tsk_thread_flag(task, TIF_SVE_FULL_REGS)) > > sve_to_fpsimd(task); > Won't this mean that we dereference thread->sve_state even if > TIF_SVE_EXEC was clear? Yes, potentially if we let the flag be set. I'll add an extra check here and in other similar places you mentioned. > For the ptrace case, I think we can probably can hit this, IIUC. > It might not apply to the prctl() case if TIF_SVE_EXEC was already > cleared during syscall entry (?), but as observed above this still > assumes that the SVE regs are discarded at the earliest opportinuty, > which might not be true in future. I would hope that if the SVE registers become actually invalid that'd also involve clearing TIF_SVE_FULL_REGS but yeah. > > @@ -952,8 +1022,9 @@ void do_sve_acc(unsigned int esr, struct pt_regs *= regs) > > fpsimd_flush_task_state(current); > > =20 > > fpsimd_to_sve(current); > Hmmm, there's a latent bug upstream here: if the WARN() fires then > sve_state is not safe to dereference. But we already did. > So perhaps this should have been something like: > if (!WARN_ON(test_and_set_thread_flag(TIF_SVE))) > fpsimd_to_sve(); > This might make sense as a separate Fixes patch to precede the series. Yes, that's definitely a separate fix I think. > > @@ -1181,7 +1261,7 @@ void fpsimd_update_current_state(struct user_fpsi= md_state const *state) > > get_cpu_fpsimd_context(); > > =20 > > current->thread.uw.fpsimd_state =3D *state; > > - if (system_supports_sve() && test_thread_flag(TIF_SVE)) > > + if (test_thread_flag(TIF_SVE_FULL_REGS)) > Unintentionally dropped system_supports_sve()? As above discussion on previous revisions suggested removing these - I actually spotted a few more that I'd missed here. > > @@ -241,7 +242,7 @@ static int preserve_sve_context(struct sve_context = __user *ctx) > > BUILD_BUG_ON(sizeof(ctx->__reserved) !=3D sizeof(reserved)); > > err |=3D __copy_to_user(&ctx->__reserved, reserved, sizeof(reserved)); > > =20 > > - if (vq) { > > + if (vq && test_thread_flag(TIF_SVE_FULL_REGS)) { >=20 > Hmm, in theory we should be able to mark the regs as discardable once > they have been saved in the frame, though we never did that in the > past. Since we never zeroed the extra bits on signal handler entry > anyway, we could actually skip the zeroing for this specific case. Any > change of this sort should go in a separate patch though. Definitely a separate change, and probably sensible to split it from this series. > > @@ -587,7 +590,7 @@ static int setup_sigframe_layout(struct rt_sigframe= _user_layout *user, > > if (system_supports_sve()) { > > unsigned int vq =3D 0; > > =20 > > - if (add_all || test_thread_flag(TIF_SVE)) { > > + if (add_all || test_thread_flag(TIF_SVE_EXEC)) { >=20 > Doesn't this need to match the SVE reg dumping condition in > preserve_sve_context()? Posssibly, yeah. > > --- a/arch/arm64/kernel/syscall.c > > +++ b/arch/arm64/kernel/syscall.c > > @@ -186,7 +186,8 @@ static inline void sve_user_discard(void) > > if (!system_supports_sve()) > > return; > > - clear_thread_flag(TIF_SVE); > > + clear_thread_flag(TIF_SVE_EXEC); > > + clear_thread_flag(TIF_SVE_FULL_REGS); > (Assuming this will be refined in the next patch.) No - could you elaborate on what refinement you were expecting here? > > - if (test_thread_flag(TIF_SVE)) > > + if (test_thread_flag(TIF_SVE_EXEC)) > > vcpu->arch.flags |=3D KVM_ARM64_HOST_SVE_IN_USE; > Here we're setting TIF_SVE(_EXEC) to describe the guest context, while > stashing off the host's TIF_SVE(_EXEC) flag so we can get it back later. > Don't we need to do a similar things for TIF_SVE_FULL_REGS though? Yes, I think this needs redoing... --xXmbgvnjoT4axfJE Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmAjCboACgkQJNaLcl1U h9B3Ngf/QY4VkedhOmox1+Ige+FZ5ggVFneoATBn3FrJT+QZ2QXprYBEUrXWlmYl TtYrqYB6eZ9+r6wxs9kC1fZ1/aTigQS5O7Wy7U3N8txmLjHxv+4eAuvkx72KsQVQ FcLO9fwQglQwYjHTurhhApA9yMaLrpsSzcB7557CM6Yyu/XiosnRPK3LBX4AjdYY WBbWetmD23XFpfeW73BKDAVf7/ylvav6nBSHPhVuvzChz8toCcm+RzOn80DkxUag YkN6G0YsIF23uPGuTPD2Pp1EcvFCwQ0hWBWJAMDGUHetwEl1P6coY+dp7a6/XZNk A2EHubCCMd1HKb30upeb5P7FEdFJRQ== =0HMI -----END PGP SIGNATURE----- --xXmbgvnjoT4axfJE-- --===============5606235031401819871== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel --===============5606235031401819871==--