From: Catalin Marinas <catalin.marinas@arm.com>
To: Guanghui Feng <guanghuifeng@linux.alibaba.com>
Cc: will@kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, baolin.wang@linux.alibaba.com
Subject: Re: [PATCH RFC v1] arm64: mm: change mem_map to use block/section mapping with crashkernel
Date: Wed, 13 Apr 2022 17:53:38 +0100
Message-ID: <YlcAEo3lpKJg8HJf@arm.com>
In-Reply-To: <1649754476-8713-1-git-send-email-guanghuifeng@linux.alibaba.com>

On Tue, Apr 12, 2022 at 05:07:56PM +0800, Guanghui Feng wrote:
> There are many changes and discussions:
> commit 031495635b46
> commit 1a8e1cef7603
> commit 8424ecdde7df
> commit 0a30c53573b0
> commit 2687275a5843
>
> When using a DMA/DMA32 zone and crashkernel, with rodata=full and kfence
> disabled, mem_map will use non block/section mappings (because crashkernel
> requires shrinking the region at page granularity). But this degrades
> performance when doing large contiguous memory accesses in the kernel
> (memcpy/memmove, etc).
>
> This patch first does block/section mapping for mem_map and reserves the
> crashkernel memory. It then walks the pagetable to split block/section
> mappings into non block/section mappings [only] for the crashkernel memory.
> With this optimization we see roughly a 10-20% improvement in memory access
> and a conspicuous reduction in CPU dTLB misses on some platforms.

Do you actually have some real-world use-cases where this improvement
matters? I don't deny that large memcpy over the kernel linear map may be
slightly faster, but where does this really matter?
> +static void init_crashkernel_pmd(pud_t *pudp, unsigned long addr,
> +				 unsigned long end, phys_addr_t phys,
> +				 pgprot_t prot,
> +				 phys_addr_t (*pgtable_alloc)(int), int flags)
> +{
> +	phys_addr_t map_offset;
> +	unsigned long next;
> +	pmd_t *pmdp;
> +	pmdval_t pmdval;
> +
> +	pmdp = pmd_offset(pudp, addr);
> +	do {
> +		next = pmd_addr_end(addr, end);
> +		if (!pmd_none(*pmdp) && pmd_sect(*pmdp)) {
> +			phys_addr_t pte_phys = pgtable_alloc(PAGE_SHIFT);
> +			pmd_clear(pmdp);
> +			pmdval = PMD_TYPE_TABLE | PMD_TABLE_UXN;
> +			if (flags & NO_EXEC_MAPPINGS)
> +				pmdval |= PMD_TABLE_PXN;
> +			__pmd_populate(pmdp, pte_phys, pmdval);
> +			flush_tlb_kernel_range(addr, addr + PAGE_SIZE);

The architecture requires us to do a break-before-make here, so
pmd_clear(), TLBI, __pmd_populate() - in this order. And that's where it
gets tricky: if the kernel happens to access this pmd range while it is
unmapped, you'd get a translation fault.

-- 
Catalin
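[Editorial note: the break-before-make ordering Catalin describes could be sketched as below. This is an illustrative fragment only, not the actual patch or a proposed fix; `pmdp`, `pte_phys` and `pmdval` come from the quoted code, and the PMD-sized flush range is an assumption about what a correct version would need rather than something stated in the thread. It also does not address the concurrent-access problem Catalin raises.]

```c
/* Break-before-make sketch (illustrative, kernel context assumed):
 * 1. Break: clear the live block entry so no new translations
 *    can be cached for it. */
pmd_clear(pmdp);

/* 2. TLBI: invalidate cached translations for the old block mapping
 *    before installing the replacement. Note the quoted patch flushed
 *    only PAGE_SIZE; a section mapping presumably needs the whole
 *    PMD-sized range covered. */
flush_tlb_kernel_range(addr & PMD_MASK, (addr & PMD_MASK) + PMD_SIZE);

/* 3. Make: only now populate the table entry pointing at the new
 *    pte-level table. */
__pmd_populate(pmdp, pte_phys, pmdval);

/* Between steps 1 and 3 the range is unmapped: any kernel access to it
 * takes a translation fault, which is the tricky part of doing this on
 * the live linear map. */
```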