Date: Wed, 13 May 2015 23:54:47 +0100 (BST)
From: "Maciej W. Rozycki"
To: Richard Henderson
Cc: Yongbok Kim, peter.maydell@linaro.org, Leon Alrae,
    qemu-devel@nongnu.org, afaerber@suse.de
Subject: Re: [Qemu-devel] [PATCH v3 2/2] target-mips: Misaligned memory
    accesses for MSA
In-Reply-To: <5553C3C2.9070101@twiddle.net>
References: <1431531457-17127-1-git-send-email-yongbok.kim@imgtec.com>
    <1431531457-17127-3-git-send-email-yongbok.kim@imgtec.com>
    <5553A5C4.6030902@twiddle.net> <5553ACF2.7050708@twiddle.net>
    <5553BB40.7050706@imgtec.com> <5553C3C2.9070101@twiddle.net>

On Wed, 13 May 2015, Richard Henderson wrote:

> >> I believe the problem is that MSA vector register's size is 16-bytes
> >> (this DATA_SIZE isn't supported in softmmu_template) and MSA load/store
> >> is supposed to be atomic.
> >
> > Not really AFAICT. Here's what the specification says[1]:
> >
> > "The vector load instruction is atomic at the element level with no
> > guaranteed ordering among elements, i.e.
> > each element load is an atomic
> > operation issued in no particular order with respect to the element's
> > vector position."
> >
> > and[2]:
> >
> > "The vector store instruction is atomic at the element level with no
> > guaranteed ordering among elements, i.e. each element store is an atomic
> > operation issued in no particular order with respect to the element's
> > vector position."
> >
> > so you only need to get atomic up to 8 bytes (with LD.D and ST.D, less
> > with the narrower vector elements), and that looks supported to me.
>
> There's "atomic" in the transactional sense, and then there's "atomic" in the
> visibility to other actors on the bus sense.
>
> Presumably Leon is talking about the first, wherein we must ensure all writes
> to both pages must succeed. Which just means making sure that both pages are
> present and writable before modifying any memory.

I don't think we have to. The specification is a bit unclear, I must
admit, and it also defines the details of vector load and store operations
as implementation dependent, so there's no further clarification.

However, any unaligned loads or stores that cross a data-bus-width boundary
require two bus cycles to complete and therefore by definition are not
atomic in the visibility to other actors on the bus sense.

Therefore the only atomicity sense that can be considered here is, I
believe, transactional, on the per-element basis, as this is what the
specification refers to. Then the exact semantics of loads and stores are
left up to the implementer, so for example ST.H can be implemented as 2
doubleword-store transactions, or 4 word-store transactions (that wouldn't
be allowed with ST.D), or 8 halfword-store transactions (that wouldn't be
allowed with ST.W), but not 16 byte-store transactions (that would be
allowed with ST.B).
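
The splitting rule in the ST.H example above can be written down as a small
check. This is only a sketch of that reading, not anything taken from the
specification or from QEMU: the names msa_split_allowed() and
msa_split_count() are made up for illustration, and the 8-byte upper limit
is an assumption based on the doubleword being the widest transaction
mentioned above.

```c
#include <stdbool.h>

/* Illustrative constants, not taken from QEMU or the MSA specification. */
enum {
    VEC_BYTES = 16,     /* size of one MSA vector register */
    MAX_XFER_BYTES = 8  /* assumption: doubleword is the widest transaction */
};

/*
 * Hypothetical helper: may a 16-byte vector store whose elements are
 * elem_bytes wide be carried out as transactions of xfer_bytes each?
 * Under the per-element reading a transaction must never be narrower
 * than one element, otherwise a single element would be split.
 */
bool msa_split_allowed(unsigned elem_bytes, unsigned xfer_bytes)
{
    bool pow2 = xfer_bytes != 0 && (xfer_bytes & (xfer_bytes - 1)) == 0;

    return pow2 && xfer_bytes >= elem_bytes && xfer_bytes <= MAX_XFER_BYTES;
}

/* Number of transactions needed to cover the whole vector. */
unsigned msa_split_count(unsigned xfer_bytes)
{
    return VEC_BYTES / xfer_bytes;
}
```

With these definitions msa_split_allowed(2, 4) holds for ST.H using
word-sized stores (msa_split_count(4) gives 4 of them), while
msa_split_allowed(2, 1) does not, matching the byte-store case that only
ST.B permits.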

Consequently I believe only individual vector element writes (or reads,
for that matter) are required to either successfully complete or completely
back out, and a TLB, an address error or a bus error exception (or perhaps
even a hardware interrupt exception) happening in the middle of a vector
load or store instruction may observe the destination vector register or
memory respectively partially updated with elements already transferred
(but not an individual element partially transferred).

That would be consistent with what happens with the other multi-word
transfer instructions I mentioned when they get interrupted on the way
(yes, they do allow hardware interrupts to break them too) and likely
easier to implement as well.

That's just my interpretation though. Perhaps the specification needs a
further clarification.

  Maciej
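
For illustration, the partially-updated outcome described above can be
modelled with a toy memory. Everything here is hypothetical (struct toy_mem,
toy_store_elem() and toy_vector_store() are not QEMU interfaces), the fault
is simulated by a simple address limit, and the elements are stored in
ascending order purely for simplicity, since the specification guarantees no
particular order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 16 };

/*
 * Hypothetical stand-in for a memory system: any store reaching an
 * address at or beyond fault_at "raises an exception" (returns false).
 */
struct toy_mem {
    uint8_t bytes[64];
    size_t fault_at;
};

/* One element store: either all elem_bytes are written or none are. */
bool toy_store_elem(struct toy_mem *m, size_t addr,
                    const uint8_t *src, size_t elem_bytes)
{
    size_t end = addr + elem_bytes;

    if (end > m->fault_at || end > sizeof(m->bytes)) {
        return false;   /* element backs out completely */
    }
    memcpy(&m->bytes[addr], src, elem_bytes);
    return true;
}

/*
 * Vector store done element by element.  Returns the number of elements
 * committed; on a fault, earlier elements stay in memory.
 */
size_t toy_vector_store(struct toy_mem *m, size_t addr,
                        const uint8_t vec[VEC_BYTES], size_t elem_bytes)
{
    size_t n = VEC_BYTES / elem_bytes;

    for (size_t i = 0; i < n; i++) {
        if (!toy_store_elem(m, addr + i * elem_bytes,
                            vec + i * elem_bytes, elem_bytes)) {
            return i;   /* stop at the first faulting element */
        }
    }
    return n;
}
```

Storing word elements at address 3 with fault_at set to 13 commits the
first two elements (bytes 3 to 10) and leaves bytes 11 and 12 untouched,
even though they lie below the limit, because the third element as a whole
cannot complete.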