From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1CB63C5519F for ; Wed, 18 Nov 2020 12:52:41 +0000 (UTC) Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7A19024180 for ; Wed, 18 Nov 2020 12:52:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="hGWVP23H" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7A19024180 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=arm.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References:Message-ID: Subject:To:From:Date:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=xRM0vktEOOtjZ6mNRqiReh20YRuCKuwqRxVO9TgcgN8=; b=hGWVP23HA2RhCLjMs18WlWFD+ h5GlrUdE7Z8bWgkAZNdms2rp6LPvgJDJanV9D9LsIG/ZRmTnWzpivxhemgUF3TPDjc3KrYhjy6Frv fP4ETfSM7+gA8kN689+lo3GvuQ4lOcXAXEYpUaYv627LNURFlS4eK9S6Dku7ZL4zhmT84G9pSBoj3 8p+dYWkL6Pazh5suFlBXt9rgjTyTAiDXt2ze+H2EZl2no/QZd1exfBicDdRZqzZC87uKLBJjUjJ03 pPVXUd6vbXLqBSFGfDy/aNJ41CQt5/isArETN1+X6lxZkaeUFONV7sycgKRySdZg7tDP/wySp1J8F 7YaR4FeSQ==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1kfMwT-0003R1-LH; Wed, 18 Nov 2020 12:52:09 +0000 Received: from foss.arm.com ([217.140.110.172]) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1kfMwQ-0003Qe-8J for linux-arm-kernel@lists.infradead.org; Wed, 18 Nov 2020 12:52:07 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 7D57E101E; Wed, 18 Nov 2020 04:52:01 -0800 (PST) Received: from arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 4DFB33F70D; Wed, 18 Nov 2020 04:52:00 -0800 (PST) Date: Wed, 18 Nov 2020 12:51:56 +0000 From: Dave Martin To: Mark Brown Subject: Re: [PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return Message-ID: <20201118125155.GE6882@arm.com> References: <20201106193553.22946-1-broonie@kernel.org> <20201106193553.22946-2-broonie@kernel.org> <20201113184855.GF3212@gaia> <20201113201328.GG4828@sirena.org.uk> <20201116175938.GA6882@arm.com> <20201116194551.GG4739@sirena.org.uk> <20201117181618.GC6882@arm.com> <20201117230338.GI5142@sirena.org.uk> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20201117230338.GI5142@sirena.org.uk> User-Agent: Mutt/1.5.23 (2014-03-12) X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20201118_075206_438798_11C47121 X-CRM114-Status: GOOD ( 70.53 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Julien Grall , Catalin Marinas , Zhang Lei , Julien Grall , Will Deacon , linux-arm-kernel@lists.infradead.org, Daniel Kiss Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Tue, Nov 17, 2020 at 11:03:38PM +0000, Mark Brown wrote: > On Tue, Nov 17, 2020 at 06:16:20PM +0000, Dave Martin wrote: > > > I think that thinking of TIF_SVE_NEEDS_FLUSH as a distinct toggle may > > not help helping here -- it feels more like an override or a special > > case. > > Well, we need a flag somewhere - it's a question of where we put it. > > > I also wonder whether increasing the amount of commenting is actually a > > trap here: there's a risk of piling new confusion on old. The actual > > problems might be that there is too much commenting already, not too > > little... > > It's an interesting balance. Having worked with the code it's > definitely overwhelmingly more commented than the rest of the > architecture, I think because it *is* a bit fiddly and has even fewer > people who've looked at it in detail than the rest of it. This can make > it look much more complicated than it actually appears to be but on the > other hand it may be that other bits of the code should be more like the > FP code rather than the other way around. > > > Taking a step back, this is how I think it was all supposed to work, > > prior to this series. This time, I'll try to describe the flags > > separately rather than enumerating all the combinations (which I think > > may have been a misstep): > > I tend to agree that thinking about the combinations when you don't have > to makes it harder - in particular with TIF_FOREIGN_FPSTATE. > > > At this level of abstraction, these two concepts are entirely > > independent: > > > !SVE SVE > > +---------------+---------------+ > > !FFP | Track: FPSIMD | Track: SVE | > > | Where: regs | Where: regs | > > +---------------+---------------+ > > FFP | Track: FPSIMD | Track: SVE | > > | Where: memory | Where: memory | > > +---------------+---------------+ > > Yes. > > > I think where we get in a tangle is that TIF_SVE conflates mechanism > > (whether the kernel tracks the full SVE state) with policy (whether > > userspace cares about the full SVE state). At present, we can't > > describe the situation where we track SVE state but userspace doesn't > > care, so we either have to do nothing at all and just leave SVE on all > > the time, or we have to disable SVE at the first > > opportunity. > > I think this is the current conflation that is causing issues but I > wouldn't state it quite like that, I don't know that *care* is quite the > right word for what userspace is doing here. There's two pieces, > there's the SVE state in the registers and there's the ability to > execute SVE instructions without trapping. When userspace executes a > SVE instruction we enable execution and also start tracking the register > state which we currently track in the single TIF_SVE flag. When > userspace does a syscall the extra register state becomes undefined > (realistically we'll probably have to continue resetting it to zero as With regard to "don't care", I used this wording what userspace is doing is saying "I don't need what's in the SVE regs, feel free to throw it away if it helps." Do you see anything wrong with this desciption? Maybe there's a subtlety here I've forgotten. "Don't care" seemed a good fit for this concept, but I'm not attached to this form or words per se. (Note, while userspace should in theory tolerate the SVE regs becoming garbage at syscall exit, as soon as userspace executes any instructions there might be the need to zero bits VL-1:128 of one or more Z-regs, since that's what the SVE programmer's model says happens when writing a V-reg. With no FPSIMD trap, there's no way to be 100% certain this is not needed. So, when taking an SVE trap after entering userspace we would have to zero the high bits of the Z-regs. The P-regs and FFR could theoretically remain garbage -- but it's cleaner and more testable to zero them. We're paying the penalty of a trap anyway in this scenario. We could trap FPSIMD too of source, but userspace uses FPSIMD too often for that to be sane thing to do.) > we currently do) and we currently don't the ability to track the ability > to execute the instructions separately to discarding that state so we > just disable both at once. This means that the syscall overhead is > amplified for any task that mixes SVE and syscalls. Sure, though I think the difference between your model and mine is just semantics. If enter userspace with SVE untrapped, then we have no choice but to track the full SVE regs from that point, so the trap control and state tracking are still not really independent concepts. If you prefer to think of TIF_SVE as purely a trap control and define another flag for "has full SVE state" then I don't have a problem with that; there is a choice of ways to slice up the multiple meanings TIF_SVE currently has. In these patches TIF_SVE_NEEDS_FLUSH means "SVE untrapped, but don't track full SVE state", which is arguably confusing when we have TIF_SVE clear (suggesting trapping). > > > So, what we need here is a way (i.e., a new state) to indicate that > > userspace doesn't care about the extra SVE state. Obviously this not > > orthogonal to TIF_SVE: if SVE state is not tracked, then there is > > no state _to_ care about. > > > As I understand it, this is what the TIF_SVE_NEEDS_FLUSH flag > > describes. It separates policy (userspace's policy in this case -- > > i.e., whether it needs the full SVE state) from mechanism (whether the > > That's approximately the roles, yes. The implementation is currently > different in that both SVE flags are separately wrapping in the > execution permission. Ack > > kernel keeps track of the full SVE state). Perhaps we can come up with > > another name, such as TIF_SVE_MAY_DISCARD. > > Naming is definitely a thing here - _MAY_DISCARD definitely doesn't seem > right. Feel free to suggest ;) > > Whatever we call this new intermediate state, we have a something like > > this. > > > > !SVE ??? SVE > > +---------------+---------------+---------------+ > > | Need: FPSIMD | Need: FPSIMD | Need: SVE | > > !FFP | Track: FPSIMD | Track: SVE | Track: SVE | > > | Where: regs | Where: regs | Where: regs | > > +---------------+---------------+---------------+ > > | Need: FPSIMD | Need: FPSIMD | Need: SVE | > > FFP | Track: FPSIMD | Track: SVE | Track: SVE | > > | Where: memory | Where: memory | Where: memory | > > +---------------+---------------+---------------+ > > > > (FFP = TIF_FOREIGN_FPSTATE, SVE = TIF_SVE) > > I think I get what you mean here but I'm finding the "need" a bit > confusing here and in the following discussion. I think instead of need > and track it might be clearer to say valid registers and execute - it's > which registers are have defined data in them and if userspace can > execute SVE instructions without trapping. So, "need" means "I need this state preserved across paths through the kernel". But I think your concept is equally valid, and less abstract (which may well a good thing.) > > However, the middle column is a bit special: we can't run in userspace > > in this state, since once userspace touches the regs we no longer know > > whether it cares about the data in them. Also, it's a bit pointless > > representing this state in memory, since the main point of having it is > > to _avoid_ dumping to memory. And we do need to do a bit of work to > > sanitise the state -- unless we always do it on the syscall entry path. > > While we can't be in userspace in the middle column I'm not sure it's > entirely true to say that we can't usefully represent things here in > memory if we think about the execute part of things as well. If we > schedule another task we can still track if we want to have SVE > instruction execution on return to userspace even if the only data we're > storing is the FPSIMD subset, avoiding a second SVE trap on the first > SVE instruction after a syscall while also storing less data in memory. Sure, we could have an additional state here. I'm not convinced it's useful, or not expressible in another way, but if it keeps the middle column more orthogonal from TIF_FOREIGN_FPSTATE and so makes the code simpler to understand, then that could be a good thing. We'd need to see what it looks like, I guess. > As noted in the cover letter for the series we're probably going to find > we want to reset the execute permission intermittently on syscall > workloads in order to ensure that tasks that execute SVE infrequently > don't always have SVE enabled but I think we want to sort out what we're > doing with the basic case first. Ack, though whether that knowledge is represented by the middle column directly or we maintain some additional bookkeeping data somewhere seems a moot point. > > Effort needed > > to load regs > > !SVE ??? SVE > > +---------------+ +---------------+ > > | Need: FPSIMD | | Need: SVE | none > > !FFP | Track: FPSIMD +---------------+ Track: SVE | ^ > > | Where: regs | N: FPSIMD | Where: regs | : > > +---------------+ T: undecided +---------------+ some > > | Need: FPSIMD | W:regs+cleanup| Need: SVE | : > > FFP | Track: FPSIMD +---------------+ Track: SVE | v > > | Where: memory | | Where: memory | full reload > > +---------------+ +---------------+ > > ... > > > And finally, on sched-out (or any other fpsimd_save() event): > > > +---------------+ +---------------+ > > | | | | > > | : +---------------+ : | > > | : | | : | > > +--:------------+ +------------:--+ > > | : <======= [*] =======> : | > > | V +---------------+ V | > > | | | | > > +---------------+ +---------------+ > > > ...at least in threory. > > I agree up to this final schedule case since here we'd either be taking > a decision to force userspace to trap on the next SVE instruction or to > store and reload full SVE state in order to avoid that neither of which > seems ideal. With the current patch if TIF_SVE_NEEDS_FLUSH is set we > only save the FPSIMD state and then restore it and switch to TIF_SVE > when scheduling the task again - this is masked a bit in the patch since > it does not update the existing code in fpsimd_save() as TIF_SVE is not > set when setting TIF_SVE_NEEDS_FLUSH so there's no change in that path. I think I missed this point. It doesn't sound quite right to me: how will we ever turn SVE persistently off for a task? I'll need to go take another look at the actual code, since I may be making some wrong assumptions about what it's doing. I had thought that the purpose of this patch was to improve the low- overhead paths -- so with no context switch we can spin between the kernel and userspace without paying the code of extra traps of dumping/ reloading the regs as we do now. Once we enter a high-overhead path, we might as well be brutal. If we save the full SVE regs, and defer the decision on disabling SVE until the next sched-in then that's probably reasonable: we could leave SVE enabled if the regs turn out not to need reloading (as when resuming the task after temporarily running some kernel thread -- we reach do_notify_resume() with TIF_FOREIGN_FPSTATE clear). If so, the bottom- middle box does seem to describe the state of that task while scheduled out. > We could potentially use this as a heuristic to decide that we want to > drop SVE on a syscall that schedules rather than after some number of > syscalls (which we don't currently do but was mentioned as future work > in the cover letter and has precedents on other architectures) but that > doesn't feel great, the syscall counter feels more robust. Maybe I just misunderstood what the code was doing here. My own view is that we should disable SVE on sched-out (or possibly at fpsimd_restore_current_state()) rather than doing anything more clever just yet -- because these options remain close to the current behaviour while still addressing the excessive trapping issue that prompted this change. Then we could improve the decision-making logic (adding a counter or whatever) in subsequent patches. > > While TIF_FOREIGN_FPSTATE and TIF_SVE remain orthogonal to each other, > > this new state is not orthogonal to either. I don't think that's a bug, > > providing that we understand its role. Since it's a 5th state, we are > > going to burn a flag on it, but this doesn't mean that it needs to (or > > even should) have a meaning that's fully independent of the others. > > Rather it may be useful precisely because it's not independent -- it > > decribes a situation which is currently an overlap between the other > > states. > > > I'm pretty sure this is more or less what the proposed patches do, > > but I'll review again with this picture in mind. > > It differs in the handling of scheduling while in a syscall but > otherwise yes, I think that's pretty much it. > > With that in mind and thinking on the subthread with Catalin if we > restructure the SVE flags such that we have separate SVE_EXEC and > SVE_REGS flags (bad but hopefully comprehensible names short enough for > drawing a table) for the execute and storage permissions we end up with: > > !SVE/SVE_REGS SVE_EXEC SVE_REGS+SVE_EXEC > +---------------+---------------+---------------+ > | Valid: FPSIMD | Valid: FPSIMD | Valid: SVE | > !FFP | Trap: Yes | Trap: No | Trap: No | > | Where: regs | Where: regs | Where: regs | > +---------------+---------------+---------------+ > | Valid: FPSIMD | Valid: FPSIMD | Valid: SVE | > FFP | Trap: Yes | Trap: No | Trap: No | > | Where: memory | Where: memory | Where: memory | > +---------------+---------------+---------------+ > > (the first column being either no SVE flags set or only SVE_REGS set.) > All entries should go to one of the !FFP cases. Entry from SVE traps or > syscalls with SVE_EXEC set should go to SVE_EXEC, other entries with > SVE_EXEC already set should go to SVE_REGS+SVE_EXEC and all other > entries should go to !SVE/SVE_REGS. > > Currently TIF_SVE maps onto SVE_REGS+SVE_EXEC and TIF_SVE_NO_FLUSH maps > onto SVE_EXEC. Agreed. I don't see a big problem with this, and it removes the weirdness where we turn off the flag that appears to mean ("SVE enabled") when in the middle column. I still think it's a bit of a moot point, since we still have to resolve the middle states to a different state before we can safely enter userspace -- but that's not so different from the TIF_FOREIGN_FPSTATE case. > > Note: Making things symmetric > > ----------------------------- > > > An alternative behaviour for syscalls would be: > > > TIF_SVE > > !TIF_SVE ,-------------------------. > > ,--------------------------. > > > > +---------------+ +---------------+ > > | | | | > > | +---------------+ | > > | ======> <====== | > > +---------------+ +---------------+ > > | | | | > > | +---------------+ | > > | | | | > > +---------------+ +---------------+ > > > This shouldn't change the underlying behaviour other than way certain > > if() conditions would be expressed. This model may make it clearer > > that the TIF_SVE versus !TIF_SVE decision doesn't occur until moving > > out of the middle state. This gives a natural place to turn SVE on > > speculatively for example, when a syscall occurs with !TIF_SVE. > > (Whether this is useful is we could ever make an accurate enough guess > > is another matter... but this still might make the code a bit easier > > to conceptualise.) > > I worry that this might make it harder to follow in that you'd have to > understand why we're considering enabling SVE for tasks that never tried > to do anything with it. The decision on where to go from the middle box can be anything we like. In the actual implementation, the obvious thing to do is to always go back to the left-hand box if TIF_SVE is clear. The middle box just indicates the _potential_ for a decision. But we don't always have to use that opportunity. You could imagine logic that occasionally disables TIF_SVE in order to gather stats on the tasks's use of SVE, but where if we don't consider the threshold of evicence for disabling SVE persistently to be met yet then we turn SVE back on regardless for now. Of couse, since this flexibility is probably not all that useful in practice, it doesn't necessarily have to be expressed in the code. The picture is more of an abstraction. HTH Cheers ---Dave _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel