Date: Sun, 24 Jan 2021 15:21:27 +0000
From: Russell King - ARM Linux admin
To: Ard Biesheuvel
Subject: Re: [PATCH] ARM: decompressor: cover BSS in cache clean and reorder with MMU disable on v7
Message-ID: <20210124152127.GA1551@shell.armlinux.org.uk>
References: <20210122152012.30075-1-ardb@kernel.org> <20210122161312.GS1551@shell.armlinux.org.uk>
Cc: Marc Zyngier, Linux ARM

On Sun, Jan 24, 2021 at 02:35:31PM +0100, Ard Biesheuvel wrote:
> So what I think is happening is the following:
>
> In v5.7 and before, the set/way operations trap into KVM, which sets
> another trap bit to ensure that a second trap occurs the next time the
> MMU is disabled. So if any cachelines are allocated after the call to
> cache_clean_flush(), they will be invalidated again when KVM
> invalidates the VM's entire IPA space.
>
> According to DDI0406C.d paragraph B3.2.1, it is implementation defined
> whether non-cacheable accesses that occur with MMU/caches disabled may
> hit in the data cache.
>
> So after v5.7, without set/way instructions being issued, the second
> trap is never set, and so the only cache clean+invalidate that occurs
> is the one that the decompressor performs itself; the one that KVM
> does on the guest's behalf at cache_off() time is omitted. This
> results in clean cachelines being allocated that shadow the
> mini-stack, which are hit by the non-cacheable accesses that occur
> before the kernel proper enables the MMU again.
>
> Reordering the clean+invalidate with the MMU/cache disabling prevents
> the issue, as disabling the MMU and caches first disables any mappings
> that the cache could perform speculative linefills from, and so the
> mini-stack memory accesses cannot be served from the cache.
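[ The ordering argument above can be illustrated with a toy model -
  this is plain Python, not ARM code, and the addresses/values are
  illustrative only: ]

```python
# Toy model of the ordering issue: a speculative linefill can only
# happen while a cacheable mapping exists (MMU enabled), so doing the
# clean+invalidate *after* disabling the MMU closes the window in
# which a clean line can creep back in over the mini-stack.

class ToyCache:
    def __init__(self):
        self.lines = {}          # addr -> value held in a clean line
        self.mmu_enabled = True

    def clean_invalidate(self):
        self.lines.clear()       # push everything out, empty the cache

    def speculative_linefill(self, addr, ram):
        # Linefills need a cacheable mapping, i.e. the MMU enabled.
        if self.mmu_enabled:
            self.lines[addr] = ram[addr]

    def disable_mmu(self):
        self.mmu_enabled = False

def line_shadows_ministack(clean_before_mmu_off, ram):
    c = ToyCache()
    if clean_before_mmu_off:
        c.clean_invalidate()                      # old order: C+I first...
        c.speculative_linefill(0x40e69420, ram)   # ...line can creep back
        c.disable_mmu()
    else:
        c.disable_mmu()                           # new order: MMU off first
        c.clean_invalidate()                      # nothing left to fill from
        c.speculative_linefill(0x40e69420, ram)
    return 0x40e69420 in c.lines

ram = {0x40e69420: 0x40003000}
print(line_shadows_ministack(True, ram))    # old order -> True
print(line_shadows_ministack(False, ram))   # reordered -> False
```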
This may be part of the story, but it doesn't explain all of the
observed behaviour. First, some background...

We have three levels of cache on the Armada 8040: the two levels
inside the A72 clusters, as designed by Arm Ltd, and a third level
designed by Marvell which is common to all CPUs and is an exclusive
cache. This means that if the higher levels of cache contain a cache
line, the L3 cache will not.

Next, consider the state leading up to this point inside the guest:

- the decompressor code has been copied, overlapping the BSS and the
  mini-stack.
- the decompressor code and data have been cleaned and invalidated
  using the by-MVA instructions. This should push the data out to DDR.
- the decompressor has run, writing a large amount of data (that being
  the decompressed kernel image.)

At the precise point where we write to the mini-stack, the data cache
and MMU are both turned off, but the instruction cache is left
enabled.

The action around the mini-stack involves writing the following hex
values to the mini-stack, located at 0x40e69420 - note its alignment:

  ffffffff 48000000 09000401 40003000 00000000 4820071d 40008090

Immediately after writing, the values read back have been observed,
when incorrect, to be (these are a couple of examples):

  ffffffff 48000000 09000401 ee020f30 ee030f10 e3a00903 ee050f30  (1)
  ffffffff 48000000 09000401 ee020f30 00000000 4820071d 40008090  (2)

and after v1_invalidate_l1, it always seems to be:

  ee060f37 e3a00080 ee020f10 ee020f30 ee030f10 e3a00903 ee050f30

v1_invalidate_l1 operates by issuing set/way instructions that target
only the L1 cache - its purpose is to initialise the at-reset
undefined state of the L1 cache. These invalidates must not target
lower level caches, since those may contain valid data from other
CPUs already brought up in the system.
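[ For anyone following along, a sketch of why a set/way invalidate can
  be restricted to L1: the DCISW operand carries the cache level in
  bits [3:1]. The geometry below (2-way, 64-byte lines) is purely
  illustrative - real code reads it from CCSIDR - and the helper name
  is made up: ]

```python
# Hypothetical sketch of the set/way operand encoding used by DCISW:
# way in the top log2(ways) bits, set shifted by log2(line size),
# (level - 1) in bits [3:1].  With level = 1 the level field is zero,
# so the invalidate can never be steered at L2 or the Marvell L3.

def dcisw_operand(set_idx, way, level, line_bytes=64, ways=2):
    l = line_bytes.bit_length() - 1     # log2(line size) -> set shift
    a = ways.bit_length() - 1           # log2(ways)
    way_shift = 32 - a                  # ways occupy the top bits
    return (way << way_shift) | (set_idx << l) | ((level - 1) << 1)

op = dcisw_operand(set_idx=5, way=1, level=1)
print(hex(op))              # way 1 in bit 31, set 5 at bit 6
print((op >> 1) & 0x7)      # level field: 0 means L1 only
```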
To be absolutely clear about these two observed cases:

case 1:
  write: ffffffff 48000000 09000401 40003000 00000000 4820071d 40008090
  read : ffffffff 48000000 09000401 ee020f30 ee030f10 e3a00903 ee050f30
  read : ee060f37 e3a00080 ee020f10 ee020f30 ee030f10 e3a00903 ee050f30

case 2:
  write: ffffffff 48000000 09000401 40003000 00000000 4820071d 40008090
  read : ffffffff 48000000 09000401 ee020f30 00000000 4820071d 40008090
  read : ee060f37 e3a00080 ee020f10 ee020f30 ee030f10 e3a00903 ee050f30

If we look at the captured data above, there are a few things to note:

1) the point at which we read back wrong data is part way through a
   cache line.
2) case 2 shows only one value wrong initially, mid-way through the
   stack.
3) after v1_invalidate_l1, all of the data seems to be incorrect.
   This could be a result of the actions of v1_invalidate_l1, or
   merely due to time passing and pressure from other system activity
   evicting lines from the various levels of cache.

Considering your theory that there are clean cache lines overlapping
the mini-stack, and that non-cacheable accesses hit those cache
lines: the stmia write should hit those cache lines and mark them
dirty. The subsequent read-back should also hit those cache lines,
and return consistent data. If the cache lines are evicted back to
RAM, a read will not hit any cache lines, and should still return the
data that was written. Therefore, we should not be seeing any effects
at all, and the data should be consistent. This does not fit the
observations.

If we consider an alternative theory - that there are clean cache
lines overlapping the mini-stack, and non-cacheable accesses do not
hit those cache lines - then the stmia write bypasses the caches and
hits the RAM directly, and reads would also fetch from the RAM.
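[ A minimal model of the two theories - deliberately simplified, not
  hardware-accurate - showing that each one, taken alone, predicts a
  consistent read-back, which is exactly why neither fits the observed
  corruption: ]

```python
# Both theories start from a clean line shadowing the mini-stack; the
# write is the stmia, the read is the immediate read-back.

def theory_nc_hits_cache(ram, addr, value):
    cache = {addr: ram[addr]}   # clean line shadowing the stack
    cache[addr] = value         # NC write hits the line, dirties it
    return cache[addr]          # NC read hits the same line

def theory_nc_bypasses_cache(ram, addr, value):
    cache = {addr: ram[addr]}   # clean line exists but is never consulted
    ram[addr] = value           # NC write goes straight to RAM
    return ram[addr]            # NC read fetches from RAM

addr, written = 0x40e69420, 0x40003000
for theory in (theory_nc_hits_cache, theory_nc_bypasses_cache):
    got = theory({addr: 0xee020f30}, addr, written)
    assert got == written       # consistent read-back either way
print("both theories predict consistent data")
```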
The only way in this case that we would see the data change is if the
cache line were in fact dirty, and it gets written back to RAM between
our non-cacheable write and a subsequent non-cacheable read. This also
does not fit the observations, particularly case (2) highlighted
above, where only _one_ value was seen to be incorrect.

There is another theory along these lines, though: the L1 and L2 have
differing behaviour from the L3 for non-cacheable accesses, and when a
clean cache line is discarded from L1/L2, it is placed in L3. For
example, non-cacheable accesses may bypass L1 and L2 but not L3. Now
we have a _possibility_ of explaining this behaviour. Initially, L1/L2
contain a clean cache line overlapping this area. Accesses initially
bypass the clean cache line, until it gets evicted into L3, where
accesses hit it instead. When it gets evicted from L3, as it was
clean, it doesn't get written back, and we see the in-DDR data.

The reverse could also be true - L1/L2 could be hit by an uncached
access but not L3 - and I'd suggest similar effects would be possible.
However, this does not fully explain case (2).

So, I don't think we have a full and proper idea of what is really
behind this.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!