From: Linus Walleij
Date: Sun, 4 Oct 2020 22:50:28 +0200
Subject: Re: [PATCH 1/6 v14] ARM: Handle a device tree in lowmem
To: Ard Biesheuvel
Cc: Florian Fainelli, Arnd Bergmann, Abbott Liu, Russell King,
    Mike Rapoport, Andrey Ryabinin, Linux ARM

On Fri, Oct 2, 2020 at 1:01 PM Ard Biesheuvel wrote:
> On Thu, 1 Oct 2020 at 17:22, Linus Walleij wrote:

> OK, so if I am understanding this correctly, the root problem is that
> the kernel unmaps the memory that the attached DTB resides in, right?

Yups, or, well, in some places the kernel knows that the DT is there so
it sets up two 1MB sections over it, and then it assumes "no-one is ever
gonna touch those two section mappings, OK".

It's this code from head.S:

        /*
         * Then map boot params address in r2 if specified.
         * We map 2 sections in case the ATAGs/DTB crosses a section boundary.
         */
        mov     r0, r2, lsr #SECTION_SHIFT
        movs    r0, r0, lsl #SECTION_SHIFT
        subne   r3, r0, r8
        addne   r3, r3, #PAGE_OFFSET
        addne   r3, r4, r3, lsr #(SECTION_SHIFT - PMD_ORDER)
        orrne   r6, r7, r0
        strne   r6, [r3], #1 << PMD_ORDER
        addne   r6, r6, #1 << SECTION_SHIFT
        strne   r6, [r3]

Then these two blocks end up in lowmem, and as the kernel clears the
PMD:s in prepare_page_table() in mm/mmu.c, it clears the two PMDs
covering the device tree under some circumstances.

[Ard]
> So how is it possible that the kernel does not fit in the first
> memblock? Doesn't that mean that we are using memory that is not
> available to begin with?
>
> Do you have a dump of the memory layout on the platform in question?
>
> How do 0xc3000000/0xc3200000 map onto physical addresses, and how are
> those described?

The kernel per se actually fits inside the first memblock, but the
attached DTB does not: it ends up above the *uncompressed* kernel, so
there is always a gap between the end of the kernel text and the DTB.
See this illustration:
https://dflund.se/~triad/images/decompress-5.jpg

This is how it looks with all my funky debug code (that I am in the
process of merging by the way, so it is easier to debug things like
this):

This platform boots using fastboot:

# fastboot --base 40200000 --cmdline "console=ttyMSM0,115200,n8" boot zImage

So we load the zImage at 0x40200000 which is where the physical memory
starts on this platform because the modem is using 0x0-0x41ffffff.

Then I get these debug prints:

DTB:0x410A09C0 (0x000051B5)
C:0x402080C0-0x410A5C20->0x42217100-0x430B4C60
DTB:0x430AFA00 (0x00005242)

This means the DTB is first found in physical memory at 0x410A09C0,
then we relocate the kernel from 0x402080C0 to 0x42217100 to make space
for the uncompressed kernel above it, and after that the DTB is found
located at 0x430AFA00. This is using an attached device tree, which is
why the DTB is right after the compressed kernel in memory.

Uncompressing Linux... done, booting the kernel.
MMU enable: 0x40300000->0x40300000
Kernel RAM: 0x40000000->0xc0000000
Kernel RAM: 0x40100000->0xc0100000
Kernel RAM: 0x40200000->0xc0200000
(...)
Kernel RAM: 0x42500000->0xc2500000
Kernel RAM: 0x42600000->0xc2600000
ATAG/DTB : 0x43000000->0xc3000000
ATAG/DTB : 0x43100000->0xc3100000

First is the 1:1 mapping for the MMU enable code, then all the kernel
text segment mappings. Then the DTB is mapped in using the linear kernel
map, and as you can see it ends up at 0xc30AFA00, so we map the 2 MB at
0x43000000 and 0x43100000.

That is close to the kernel... so indeed:

Clear PMDs from 0x00000000 to 0xb6e00000
Clear PMDs from 0xbf000000 to 0xbf000000
Clear PMDs from 0xbf000000 to 0xc0000000
Memblock[0].base: 0x40200000 size: 0x02c00000, end: 0x42e00000
Clear PMDs from 0xc2e00000 to 0xe0800000 (lowmem)
ATAGs/DTB found in lowmem, skip clearing PMD @0xc3000000
ATAGs/DTB found in lowmem, skip clearing PMD @0xc3200000

Ooops. (The last two messages come from this patch, the rest are debug
prints I added.)

The first memblock is 44 MB and the linear map ends at 0xc2e00000, which
covers the whole uncompressed kernel, but then we come to this code in
prepare_page_table():

        /*
         * Clear out all the kernel space mappings, except for the first
         * memory bank, up to the vmalloc region.
         */
        for (addr = __phys_to_virt(end);
             addr < VMALLOC_START; addr += PMD_SIZE)
                pmd_clear(pmd_off_k(addr));

This clears out all mappings from the end of the first memblock to
VMALLOC_START, including the PMD used by the DTB. Ooops.

> If that memory is not actually available to begin with, this fix
> papers over the problem, and we should be fixing this in the
> decompressor instead.

The kernel actually fits in the first memblock, but then it clears the
lowmem right below itself, and by that point the MMU has figured out
that there is another memblock below it that it will happily use for
lowmem, and it just goes ahead and wipes that. The code has no awareness
that there might be a DTB there.
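
The arithmetic behind the prints above can be cross-checked with a small
stand-alone C sketch. This is only an illustration, not kernel code:
struct boot_layout and the helpers relocated_dtb() and
dtb_mapping_cleared() are hypothetical names made up for this mail, and
the sketch assumes a plain linear map (virt = phys - phys_offset +
PAGE_OFFSET) and the two 1 MB sections that head.S maps over the DTB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_OFFSET   0xc0000000u   /* start of the linear map, as in the prints */
#define SECTION_SIZE  0x00100000u   /* one 1 MB ARM section */

/* Hypothetical description of one boot; not a kernel structure. */
struct boot_layout {
    uint32_t phys_offset;     /* physical address mapped at PAGE_OFFSET */
    uint32_t memblock0_end;   /* physical end of the first memblock */
    uint32_t vmalloc_start;   /* virtual end of the range being cleared */
};

/* Linear-map translation, assumed to be a fixed offset. */
static uint32_t phys_to_virt(const struct boot_layout *l, uint32_t phys)
{
    assert(phys >= l->phys_offset);
    return phys - l->phys_offset + PAGE_OFFSET;
}

/* The attached DTB moves by the same distance as the relocated zImage. */
static uint32_t relocated_dtb(uint32_t dtb_phys, uint32_t old_start,
                              uint32_t new_start)
{
    return dtb_phys + (new_start - old_start);
}

/*
 * True if the two sections mapped over the DTB intersect the range
 * [phys_to_virt(memblock0_end), vmalloc_start) that the clearing loop
 * in prepare_page_table() walks.
 */
static bool dtb_mapping_cleared(const struct boot_layout *l, uint32_t dtb_phys)
{
    uint32_t clear_start = phys_to_virt(l, l->memblock0_end);
    uint32_t map_start = phys_to_virt(l, dtb_phys) & ~(SECTION_SIZE - 1);
    uint32_t map_end = map_start + 2 * SECTION_SIZE;

    return map_start < l->vmalloc_start && map_end > clear_start;
}
```

Fed with the numbers from the prints (physical offset 0x40000000, first
memblock ending at 0x42e00000, VMALLOC_START at 0xe0800000),
relocated_dtb(0x410A09C0, 0x402080C0, 0x42217100) gives 0x430AFA00, and
dtb_mapping_cleared() reports that the DTB sections starting at
0xc3000000 sit inside the range that gets cleared.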

The decompressor seems to have always been blissfully ignorant about
what happens with the attached DTB after it has pushed it a bit upwards
in memory (if relocating). It just passes the location in r2 in
accordance with the boot specification. To me it seems more like
something the kernel proper should handle.

I don't think that loading the DTB separately to some high address, as
advocated by many people, is much better. It can create the same problem
if loaded in the wrong place, and it can possibly be placed in other
dangerous areas like inside the VMALLOC area, where it can get its
mappings destroyed at any instant. (We don't check for that either.)
To me it seems like it was always like this and people have just been
trial-and-erroring by putting the DTB at some address until it works.

The way attached DTB works is that the entire compressed kernel and the
attached DTB usually end up inside the first memblock as well, so the
initialization of the kernel lowmem will not wipe its two PMDs.

The problem with that approach is that if your kernel suddenly gets
bigger - like when enabling KASan - then this can occur, especially with
attached device trees.

I can mitigate the problem for example by just loading the kernel at
0x50000000 instead:

DTB:0x50EA09C0 (0x000051B5)
C:0x500080C0-0x50EA5C20->0x52217100-0x530B4C60
DTB:0x530AFA00 (0x00005242)
Uncompressing Linux... done, booting the kernel.
MMU enable: 0x50300000->0x50300000
Kernel RAM: 0x50000000->0xc0000000
Kernel RAM: 0x50100000->0xc0100000
Kernel RAM: 0x50200000->0xc0200000
(...)
Kernel RAM: 0x52500000->0xc2500000
Kernel RAM: 0x52600000->0xc2600000
ATAG/DTB : 0x53000000->0xc3000000
ATAG/DTB : 0x53100000->0xc3100000
(...)
Clear PMDs from 0x00000000 to 0xb6e00000
Clear PMDs from 0xbf000000 to 0xbf000000
Clear PMDs from 0xbf000000 to 0xc0000000
Memblock end is above arm_lowmem_limit (0x60000000)
Memblock[0].base: 0x50000000 size: 0x10000000, end: 0x60000000
Clear PMDs from 0xd0000000 to 0xd0800000 (lowmem)

The DTB is at physical 0x530AFA00 which is virtual 0xC30AFA00, and since
the first memblock now is bigger, the kernel and the DTB fit inside it
and the DTB mapping is between the kernel and the lowmem mappings, so
the PMDs will not be cleared.

I think "most" cases work for people because their first memblock is big
like this (256 MB) and their kernel is relatively small compared to
that, so the uncompressed and compressed kernel and the attached DTB
all fit inside it.

In my case the first memblock was "just" 44 MB and that was just small
enough to trigger this.

So to summarize:

- The kernel decompressor just moves the kernel and the attached DTB
  upward in memory so it will not be overwritten by the decompressor.

- This will sometimes make part of the compressed kernel and DTB end up
  above the first memblock.

- This works because there is more memory above the first memblock, in
  an adjacent memblock. (The decompressor has always assumed as much.)

- The kernel then puts the lowmem mappings from the end of the first
  memblock to VMALLOC_START and clears the PMDs.

- If the DTB has been pushed up to lowmem, the two PMDs over the DTB
  will be cleared, resulting in a crash.

Maybe I should just put all of this into the commit message so people
can see the mess :D

Yours,
Linus Walleij

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel