linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Zhaoyang Huang <huangzhaoyang@gmail.com>
To: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Nicholas Piggin <npiggin@gmail.com>,
	Mike Rapoport <rppt@kernel.org>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Jonathan Marek <jonathan@marek.ca>,
	Zhaoyang Huang <zhaoyang.huang@unisoc.com>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH] arch: arm64: introduce RODATA_FULL_USE_PTE_CONT
Date: Tue, 23 Nov 2021 14:48:20 +0800	[thread overview]
Message-ID: <CAGWkznECKd0QZOF2SO__+9N8Tbe=WC2A-5_8Lxb31M6EXaoAGg@mail.gmail.com> (raw)
In-Reply-To: <CAMj1kXHBBE0U7Td8r+AyxXJXo6ciNa79CF8zwVDvH5d5h3DAJQ@mail.gmail.com>

ok. Could the following change be helpful? I think it is safe as
nobody else but the allocator be aware of this memory block and will
keep the attributes integrated while keeping the pages.

diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index c50664b5b72c..0a9da8295d53 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -236,3 +236,45 @@ bool kernel_page_present(struct page *page)
        ptep = pte_offset_kernel(pmdp, addr);
        return pte_valid(READ_ONCE(*ptep));
 }
+
+void arch_alloc_page(struct page *page, int order)
+{
+       unsigned long addr;
+       unsigned long cont_pte_low_bound;
+
+       if (!rodata_full)
+               return;
+       addr = (u64)page_address(page);
+       if ((order >= 4) && (addr & ~CONT_PTE_MASK) == 0) {
+               order -= 4;
+               do {
+                       cont_pte_low_bound = addr & CONT_PTE_MASK;
+                       __change_memory_common(cont_pte_low_bound,
+                                       (~CONT_PTE_MASK + 1),
__pgprot(PTE_CONT), __pgprot(0));
+                       addr = (u64)page_address(page);
+                       page += 4;
+                       order--;
+               }while (order >= 0);
+       }
+}
+
+void arch_free_page(struct page *page, int order)
+{
+       unsigned long addr;
+       unsigned long cont_pte_low_bound;
+
+       if (!rodata_full)
+               return;
+       addr = (u64)page_address(page);
+       if ((order >= 4) && (addr & ~CONT_PTE_MASK) == 0) {
+               order -= 4;
+               do {
+                       cont_pte_low_bound = addr & CONT_PTE_MASK;
+                       __change_memory_common(cont_pte_low_bound,
+                                       (~CONT_PTE_MASK + 1),
__pgprot(0), __pgprot(PTE_CONT));
+                       addr = (u64)page_address(page);
+                       page += 4;
+                       order--;
+               }while (order >= 0);
+       }
+}

On Tue, Nov 23, 2021 at 2:16 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Mon, 22 Nov 2021 at 12:25, Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> >
> > On Mon, Nov 22, 2021 at 5:23 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > On Mon, 22 Nov 2021 at 10:17, Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> > > >
> > > > On Mon, Nov 22, 2021 at 5:04 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > >
> > > > > On Mon, 22 Nov 2021 at 10:00, Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Nov 22, 2021 at 4:52 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > > > >
> > > > > > > On Mon, 22 Nov 2021 at 06:29, Huangzhaoyang <huangzhaoyang@gmail.com> wrote:
> > > > > > > >
> > > > > > > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > > > > > >
> > > > > > > > Kernel linear mapping will be split to the smallest granularity when
> > > > > > > > RODATA_FULL applied, which could lead to TLB pressure. Introduce a method
> > > > > > > > to apply PTE_CONT on pte.
> > > > > > > >
> > > > > > > > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > > > > >
> > > > > > > How would this lead to TLB pressure, and how does the use of
> > > > > > > contiguous mappings mitigate that? The linear mapping of the kernel is
> > > > > > > rarely used, as all normal accesses to it go via the vmalloc region,
> > > > > > > so in which case would TLB entries be allocated for this region in a
> > > > > > > way that could cause a measurable performance impact?
> > > > > > In fact, the patch is about to use PTE_CONT *OUT OF* the range of
> > > > > > kernel text.
> > > > >
> > > > > OK, I had missed that, apologies.
> > > > >
> > > > > > Would you please have a look at the code. It apply
> > > > > > PTE_CONT during map_mem and then clear it when load_module change the
> > > > > > corresponding linear mapping to the area it use in vmalloc area.
> > > > >
> > > > > You cannot change the PTE_CONT attribute on live mappings.
> > > > Is it an inner constraint from ASIC perspective? PTE_CONT is different
> > > > to other attributes?
> > >
> > > The PTE_CONT attributes on live translation entries must always be in
> > > sync in all entries covering the same contiguous group, or you might
> > > cause TLB conflict aborts.
> > >
> > > > >
> > > > > > >
> > > > > > >
> > > > > > > > ---
> > > > > > > >  arch/arm64/Kconfig       |  9 +++++++++
> > > > > > > >  arch/arm64/mm/mmu.c      | 10 ++++++++--
> > > > > > > >  arch/arm64/mm/pageattr.c |  9 +++++++++
> > > > > > > >  3 files changed, 26 insertions(+), 2 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > > > > > > index fee914c..3f8fbf0 100644
> > > > > > > > --- a/arch/arm64/Kconfig
> > > > > > > > +++ b/arch/arm64/Kconfig
> > > > > > > > @@ -1198,6 +1198,15 @@ config RODATA_FULL_DEFAULT_ENABLED
> > > > > > > >           This requires the linear region to be mapped down to pages,
> > > > > > > >           which may adversely affect performance in some cases.
> > > > > > > >
> > > > > > > > +config RODATA_FULL_USE_PTE_CONT
> > > > > > > > +       bool "Apply PTE_CONT when RODATA_FULL_DEFAULT_ENABLED enabled"
> > > > > > > > +       depends on RODATA_FULL_DEFAULT_ENABLED
> > > > > > > > +       default y
> > > > > > > > +       help
> > > > > > > > +         Apply PTE_CONT on linear mapping as much as we can when
> > > > > > > > +         RODATA_FULL_DEFAULT_ENABLED enabled which could decrease the
> > > > > > > > +         impaction on performance by small pte granularity.
> > > > > > > > +
> > > > > > > >  config ARM64_SW_TTBR0_PAN
> > > > > > > >         bool "Emulate Privileged Access Never using TTBR0_EL1 switching"
> > > > > > > >         help
> > > > > > > > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > > > > > > > index cfd9deb..8017b17 100644
> > > > > > > > --- a/arch/arm64/mm/mmu.c
> > > > > > > > +++ b/arch/arm64/mm/mmu.c
> > > > > > > > @@ -124,15 +124,21 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
> > > > > > > >          * The following mapping attributes may be updated in live
> > > > > > > >          * kernel mappings without the need for break-before-make.
> > > > > > > >          */
> > > > > > > > +#ifndef CONFIG_RODATA_FULL_USE_PTE_CONT
> > > > > > > >         pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG;
> > > > > > > > +#else
> > > > > > > > +       pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG | PTE_CONT;
> > > > > > > > +#endif
> > > > >
> > > > > What justifies this change? Why is it ok in certain cases to update
> > > > > the PTE_CONT attribute without BBM?
> > > > BBM?
> > > > I am not sure if it is ok here, but I just find nobody else change
> > > > PTE_CONT so far other than this commit.
> > >
> > > No it is not ok. BBM == break before make, which means you need to
> > > install invalid entries for the entire contiguous group before you can
> > > change the PTE_CONT attributes. As other CPUs may be accessing the
> > > same region, there is no safe way to do this, which is why we don't
> > > permit PTE_CONT to be changed at all.
> > ok, got it. But from practical scenario perspective, could we assume
> > that the corresponding linear map of the vmalloc area would NOT be
> > accessed at all.
> > If it does so, could we just clear PTE_VALID on the
> > linear pte and leave PTE_CONT on the others' pte within the same
> > region. The linear mapping could be reestablished when vmalloc_free.
>
> No. Such an invalid entry would conflict with the other PTE_CONT ones
> in the same contiguous group.
>
>
> > >
> > >
> > > > >
> > > > > > > >
> > > > > > > >         /* creating or taking down mappings is always safe */
> > > > > > > >         if (old == 0 || new == 0)
> > > > > > > >                 return true;
> > > > > > > >
> > > > > > > >         /* live contiguous mappings may not be manipulated at all */
> > > > > > > > -       if ((old | new) & PTE_CONT)
> > > > > > > > +#ifndef CONFIG_RODATA_FULL_USE_PTE_CONT
> > > > > > > > +       if ((old | new) & PTE_CONT)
> > > > > > > >                 return false;
> > > > > > > > +#endif
> > > > > > > >
> > > > > > > >         /* Transitioning from Non-Global to Global is unsafe */
> > > > > > > >         if (old & ~new & PTE_NG)
> > > > > > > > @@ -206,7 +212,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
> > > > > > > >
> > > > > > > >                 /* use a contiguous mapping if the range is suitably aligned */
> > > > > > > >                 if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
> > > > > > > > -                   (flags & NO_CONT_MAPPINGS) == 0)
> > > > > > > > +                   (IS_ENABLED(CONFIG_RODATA_FULL_USE_PTE_CONT) || (flags & NO_CONT_MAPPINGS) == 0))
> > > > > > > >                         __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
> > > > > > > >
> > > > > > > >                 init_pte(pmdp, addr, next, phys, __prot);
> > > > > > > > diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> > > > > > > > index a3bacd7..88a87eb 100644
> > > > > > > > --- a/arch/arm64/mm/pageattr.c
> > > > > > > > +++ b/arch/arm64/mm/pageattr.c
> > > > > > > > @@ -99,6 +99,15 @@ static int change_memory_common(unsigned long addr, int numpages,
> > > > > > > >         if (rodata_full && (pgprot_val(set_mask) == PTE_RDONLY ||
> > > > > > > >                             pgprot_val(clear_mask) == PTE_RDONLY)) {
> > > > > > > >                 for (i = 0; i < area->nr_pages; i++) {
> > > > > > > > +#ifdef CONFIG_RODATA_FULL_USE_PTE_CONT
> > > > > > > > +                       unsigned long cont_pte_low_bound;
> > > > > > > > +                       unsigned long addr;
> > > > > > > > +
> > > > > > > > +                       addr = (u64)page_address(area->pages[i]);
> > > > > > > > +                       cont_pte_low_bound = addr & CONT_PTE_MASK;
> > > > > > > > +                       __change_memory_common(cont_pte_low_bound,
> > > > > > > > +                                              (~CONT_PTE_MASK + 1), __pgprot(0) , __pgprot(PTE_CONT));
> > > > > > > > +#endif
> > > > > > > >                         __change_memory_common((u64)page_address(area->pages[i]),
> > > > > > > >                                                PAGE_SIZE, set_mask, clear_mask);
> > > > > > > >                 }
> > > > > > > > --
> > > > > > > > 1.9.1
> > > > > > > >

      reply	other threads:[~2021-11-23  6:48 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-22  5:28 [RFC PATCH] arch: arm64: introduce RODATA_FULL_USE_PTE_CONT Huangzhaoyang
2021-11-22  8:52 ` Ard Biesheuvel
2021-11-22  9:00   ` Zhaoyang Huang
2021-11-22  9:04     ` Ard Biesheuvel
2021-11-22  9:17       ` Zhaoyang Huang
2021-11-22  9:23         ` Ard Biesheuvel
2021-11-22 11:24           ` Zhaoyang Huang
2021-11-22 18:16             ` Ard Biesheuvel
2021-11-23  6:48               ` Zhaoyang Huang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGWkznECKd0QZOF2SO__+9N8Tbe=WC2A-5_8Lxb31M6EXaoAGg@mail.gmail.com' \
    --to=huangzhaoyang@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=ardb@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=jonathan@marek.ca \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@gmail.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=rppt@kernel.org \
    --cc=will@kernel.org \
    --cc=zhaoyang.huang@unisoc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).