* [PATCH] arm64: mm: Create gigabyte kernel logical mappings where possible
@ 2014-04-30 11:36 Steve Capper
  2014-04-30 18:11 ` Arnd Bergmann
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Steve Capper @ 2014-04-30 11:36 UTC (permalink / raw)
  To: linux-arm-kernel

We have the capability to map 1GB level 1 blocks when using a 4K
granule.

This patch adjusts the create_mapping logic s.t. when mapping physical
memory on boot, we attempt to use a 1GB block if both the VA and PA
start and end are 1GB aligned. This both reduces the levels of lookup
required to resolve a kernel logical address, as well as reduces TLB
pressure on cores that support 1GB TLB entries.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
Hello,
This patch has been tested on the FastModel for 4K and 64K pages.
Also, this has been tested with Jungseok's 4 level patch.

I put in the explicit check for PAGE_SHIFT, as I am anticipating a
three level 64KB configuration at some point.

With two level 64K, a PUD is equivalent to a PMD which is equivalent to
a PGD, and these are all level 2 descriptors.

Under three level 64K, a PUD would be equivalent to a PGD which would
be a level 1 descriptor thus may not be a block.

Comments/critique/testers welcome.

Cheers,
-- 
Steve
---
 arch/arm64/mm/mmu.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 4d29332..867e979 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -234,7 +234,20 @@ static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
 	pud = pud_offset(pgd, addr);
 	do {
 		next = pud_addr_end(addr, end);
-		alloc_init_pmd(pud, addr, next, phys);
+
+		/*
+		 * For 4K granule only, attempt to put down a 1GB block
+		 */
+		if ((PAGE_SHIFT == 12) &&
+			((addr | next | phys) & ~PUD_MASK) == 0) {
+			pud_t old_pud = *pud;
+			set_pud(pud, __pud(phys | prot_sect_kernel));
+
+			if (!pud_none(old_pud))
+				flush_tlb_all();
+		} else {
+			alloc_init_pmd(pud, addr, next, phys);
+		}
 		phys += next - addr;
 	} while (pud++, addr = next, addr != end);
 }
-- 
1.8.1.4
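
A minimal userspace sketch of the alignment test in the hunk above, assuming a 4K granule where a level 1 (PUD) entry covers 1GB. The SKETCH_* macros and can_use_1gb_block() are hypothetical names used for illustration only; they are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for a 4K granule: one level 1 (PUD) entry covers 1GB. */
#define SKETCH_PUD_SHIFT 30
#define SKETCH_PUD_SIZE  (UINT64_C(1) << SKETCH_PUD_SHIFT)
#define SKETCH_PUD_MASK  (~(SKETCH_PUD_SIZE - 1))

/*
 * Hypothetical helper: true when the VA range [addr, next) and the PA start
 * are all 1GB aligned, mirroring ((addr | next | phys) & ~PUD_MASK) == 0.
 * OR-ing the three values lets a single mask test catch any stray low bit.
 */
static bool can_use_1gb_block(uint64_t addr, uint64_t next, uint64_t phys)
{
	assert(addr < next);	/* the sketch only handles non-empty ranges */
	return ((addr | next | phys) & ~SKETCH_PUD_MASK) == 0;
}
```

With these assumptions, a 1GB-aligned VA range backed by a PA that is offset by 2MB fails the test and would fall through to the alloc_init_pmd() path.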

* [PATCH] arm64: mm: Create gigabyte kernel logical mappings where possible
  2014-04-30 11:36 [PATCH] arm64: mm: Create gigabyte kernel logical mappings where possible Steve Capper
@ 2014-04-30 18:11 ` Arnd Bergmann
  2014-05-01  8:54   ` Steve Capper
  2014-05-02  1:03 ` Jungseok Lee
  2014-05-02  8:51 ` Catalin Marinas
  2 siblings, 1 reply; 9+ messages in thread
From: Arnd Bergmann @ 2014-04-30 18:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Wednesday 30 April 2014 12:36:22 Steve Capper wrote:
> We have the capability to map 1GB level 1 blocks when using a 4K
> granule.
> 
> This patch adjusts the create_mapping logic s.t. when mapping physical
> memory on boot, we attempt to use a 1GB block if both the VA and PA
> start and end are 1GB aligned. This both reduces the levels of lookup
> required to resolve a kernel logical address, as well as reduces TLB
> pressure on cores that support 1GB TLB entries.
> 
> Signed-off-by: Steve Capper <steve.capper@linaro.org>
> ---
> Hello,
> This patch has been tested on the FastModel for 4K and 64K pages.
> Also, this has been tested with Jungseok's 4 level patch.
> 
> I put in the explicit check for PAGE_SHIFT, as I am anticipating a
> three level 64KB configuration at some point.
> 
> With two level 64K, a PUD is equivalent to a PMD which is equivalent to
> a PGD, and these are all level 2 descriptors.
> 
> Under three level 64K, a PUD would be equivalent to a PGD which would
> be a level 1 descriptor thus may not be a block.
> 
> Comments/critique/testers welcome.

It seems like a great idea. I have to admit that I don't understand
the existing code, but what are the page sizes used here?

Does the code always use the largest possible page size, or does
it just use either small pages or 1G pages?

In combination with the contiguous page hint, we should be able
to theoretically support 4KB/64KB/2M/32M/1G/16G TLBs in any
combination for boot-time mappings on a 4K page size kernel,
or 64KB/1M/512M/8G on a 64KB page size kernel.

	Arnd

* [PATCH] arm64: mm: Create gigabyte kernel logical mappings where possible
  2014-04-30 18:11 ` Arnd Bergmann
@ 2014-05-01  8:54   ` Steve Capper
  2014-05-01 13:36     ` Arnd Bergmann
  0 siblings, 1 reply; 9+ messages in thread
From: Steve Capper @ 2014-05-01  8:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 30, 2014 at 08:11:26PM +0200, Arnd Bergmann wrote:
> On Wednesday 30 April 2014 12:36:22 Steve Capper wrote:
> > We have the capability to map 1GB level 1 blocks when using a 4K
> > granule.
> > 
> > This patch adjusts the create_mapping logic s.t. when mapping physical
> > memory on boot, we attempt to use a 1GB block if both the VA and PA
> > start and end are 1GB aligned. This both reduces the levels of lookup
> > required to resolve a kernel logical address, as well as reduces TLB
> > pressure on cores that support 1GB TLB entries.
> > 
> > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> > ---
> > Hello,
> > This patch has been tested on the FastModel for 4K and 64K pages.
> > Also, this has been tested with Jungseok's 4 level patch.
> > 
> > I put in the explicit check for PAGE_SHIFT, as I am anticipating a
> > three level 64KB configuration at some point.
> > 
> > With two level 64K, a PUD is equivalent to a PMD which is equivalent to
> > a PGD, and these are all level 2 descriptors.
> > 
> > Under three level 64K, a PUD would be equivalent to a PGD which would
> > be a level 1 descriptor thus may not be a block.
> > 
> > Comments/critique/testers welcome.
> 
> It seems like a great idea. I have to admit that I don't understand
> the existing code, but what are the page sizes used here?

Actually, I think it was your idea ;-). I remember you talking about
increasing the mapping size when 4-level page tables were being
discussed. (I think I should have added a Reported-by, would be happy
to if you want?).

With a 64KB granule, we'll map 512MB blocks if possible, otherwise 64K.
And with a 4KB granule, the original code will map 2MB blocks if
possible, and 4KB otherwise.

The patch will make the 4KB granule case also map 1GB blocks if
possible.
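
The sizes above follow from the table geometry. As a rough sketch, assuming 8-byte descriptors so that each table resolves (page_shift - 3) bits of address (entry_size() is a hypothetical helper, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes mapped by one entry at a translation level
 * (3 = page level, 2 and 1 = block levels) for a granule of 2^page_shift
 * bytes. Each table holds 2^(page_shift - 3) eight-byte descriptors.
 */
static uint64_t entry_size(unsigned int page_shift, unsigned int level)
{
	unsigned int bits_per_level = page_shift - 3;

	assert(level >= 1 && level <= 3);
	return UINT64_C(1) << (page_shift + (3 - level) * bits_per_level);
}
```

For page_shift 12 this gives 4KB, 2MB and 1GB for levels 3, 2 and 1; for page_shift 16 it gives 64KB and 512MB for levels 3 and 2. The same arithmetic yields 2^42 bytes for a 64K level 1 entry, which is the case the PAGE_SHIFT check is there to exclude.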

> 
> Does the code always use the largest possible page size, or does
> it just use either small pages or 1G pages?

The code will put down the largest mappings it can. As the physical
memory sizes/addresses are very likely to be aligned to whatever block
size we use, we are likely to achieve the maximum size for our
mappings.

> 
> In combination with the contiguous page hint, we should be able
> to theoretically support 4KB/64KB/2M/32M/1G/16G TLBs in any
> combination for boot-time mappings on a 4K page size kernel,
> or 64KB/1M/512M/8G on a 64KB page size kernel.
> 

A contiguous hint could be applied to these mappings. The logic would
be a bit more complicated though when we consider different granules.
For 4KB we chain together 16 entries, for 64KB we use 32. If/when we
adopt a 16KB granule, we use 32 entries for a level 2 lookup and
128 entries for a level 3 lookup...
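
To make the arithmetic concrete, here is a small sketch that multiplies an entry size by the number of chained entries, using only the counts quoted above (contiguous_span() is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes covered by a run of `entries` descriptors that
 * each map `entry_bytes` and all carry the contiguous hint.
 */
static uint64_t contiguous_span(uint64_t entry_bytes, unsigned int entries)
{
	assert(entries > 0);
	return entry_bytes * entries;
}
```

With 16 entries on a 4KB granule, 4KB pages span 64KB, 2MB blocks span 32MB and 1GB blocks span 16GB. With 32 entries on a 64KB granule, 64KB pages span 2MB and 512MB blocks span 16GB.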

The largest TLB entry sizes that I am aware of in play are the block
sizes (i.e. 2MB, 512MB, 1GB). So I don't think we'll get any benefit at
the moment for adding the contiguous logic.

Cheers,
-- 
Steve

> 	Arnd

* [PATCH] arm64: mm: Create gigabyte kernel logical mappings where possible
  2014-05-01  8:54   ` Steve Capper
@ 2014-05-01 13:36     ` Arnd Bergmann
  2014-05-01 16:20       ` Steve Capper
  0 siblings, 1 reply; 9+ messages in thread
From: Arnd Bergmann @ 2014-05-01 13:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 01 May 2014 09:54:12 Steve Capper wrote:
> On Wed, Apr 30, 2014 at 08:11:26PM +0200, Arnd Bergmann wrote:
> > On Wednesday 30 April 2014 12:36:22 Steve Capper wrote:
> > > We have the capability to map 1GB level 1 blocks when using a 4K
> > > granule.
> > > 
> > > This patch adjusts the create_mapping logic s.t. when mapping physical
> > > memory on boot, we attempt to use a 1GB block if both the VA and PA
> > > start and end are 1GB aligned. This both reduces the levels of lookup
> > > required to resolve a kernel logical address, as well as reduces TLB
> > > pressure on cores that support 1GB TLB entries.
> > > 
> > > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> > > ---
> > > Hello,
> > > This patch has been tested on the FastModel for 4K and 64K pages.
> > > Also, this has been tested with Jungseok's 4 level patch.
> > > 
> > > I put in the explicit check for PAGE_SHIFT, as I am anticipating a
> > > three level 64KB configuration at some point.
> > > 
> > > With two level 64K, a PUD is equivalent to a PMD which is equivalent to
> > > a PGD, and these are all level 2 descriptors.
> > > 
> > > Under three level 64K, a PUD would be equivalent to a PGD which would
> > > be a level 1 descriptor thus may not be a block.
> > > 
> > > Comments/critique/testers welcome.
> > 
> > It seems like a great idea. I have to admit that I don't understand
> > the existing code, but what are the page sizes used here?
> 
> Actually, I think it was your idea ;-). I remember you talking about
> increasing the mapping size when 4-level page tables were being
> discussed. (I think I should have added a Reported-by, would be happy
> to if you want?).

I completely forgot we had talked about this.

> With a 64KB granule, we'll map 512MB blocks if possible, otherwise 64K.
> And with a 4KB granule, the original code will map 2MB blocks if
> possible, and 4KB otherwise.
> 
> The patch will make the 4KB granule case also map 1GB blocks if
> possible.

Ok.

> > In combination with the contiguous page hint, we should be able
> > to theoretically support 4KB/64KB/2M/32M/1G/16G TLBs in any
> > combination for boot-time mappings on a 4K page size kernel,
> > or 64KB/1M/512M/8G on a 64KB page size kernel.
> 
> A contiguous hint could be applied to these mappings. The logic would
> be a bit more complicated though when we consider different granules.
> For 4KB we chain together 16 entries, for 64KB we use 32. If/when we
> adopt a 16KB granule, we use 32 entries for a level 2 lookup and
> 128 entries for a level 3 lookup...
> 
> The largest TLB entry sizes that I am aware of in play are the block
> sizes (i.e. 2MB, 512MB, 1GB). So I don't think we'll get any benefit at
> the moment for adding the contiguous logic.

Is that an architecture limit, or specific to the Cortex-A53/A57
implementations?

	Arnd

* [PATCH] arm64: mm: Create gigabyte kernel logical mappings where possible
  2014-05-01 13:36     ` Arnd Bergmann
@ 2014-05-01 16:20       ` Steve Capper
  0 siblings, 0 replies; 9+ messages in thread
From: Steve Capper @ 2014-05-01 16:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 01, 2014 at 03:36:05PM +0200, Arnd Bergmann wrote:
> On Thursday 01 May 2014 09:54:12 Steve Capper wrote:
> > On Wed, Apr 30, 2014 at 08:11:26PM +0200, Arnd Bergmann wrote:
> > > On Wednesday 30 April 2014 12:36:22 Steve Capper wrote:
> > > > We have the capability to map 1GB level 1 blocks when using a 4K
> > > > granule.
> > > > 
> > > > This patch adjusts the create_mapping logic s.t. when mapping physical
> > > > memory on boot, we attempt to use a 1GB block if both the VA and PA
> > > > start and end are 1GB aligned. This both reduces the levels of lookup
> > > > required to resolve a kernel logical address, as well as reduces TLB
> > > > pressure on cores that support 1GB TLB entries.
> > > > 
> > > > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> > > > ---
> > > > Hello,
> > > > This patch has been tested on the FastModel for 4K and 64K pages.
> > > > Also, this has been tested with Jungseok's 4 level patch.
> > > > 
> > > > I put in the explicit check for PAGE_SHIFT, as I am anticipating a
> > > > three level 64KB configuration at some point.
> > > > 
> > > > With two level 64K, a PUD is equivalent to a PMD which is equivalent to
> > > > a PGD, and these are all level 2 descriptors.
> > > > 
> > > > Under three level 64K, a PUD would be equivalent to a PGD which would
> > > > be a level 1 descriptor thus may not be a block.
> > > > 
> > > > Comments/critique/testers welcome.
> > > 
> > > It seems like a great idea. I have to admit that I don't understand
> > > the existing code, but what are the page sizes used here?
> > 
> > Actually, I think it was your idea ;-). I remember you talking about
> > increasing the mapping size when 4-level page tables were being
> > discussed. (I think I should have added a Reported-by, would be happy
> > to if you want?).
> 
> I completely forgot we had talked about this.
> 
> > With a 64KB granule, we'll map 512MB blocks if possible, otherwise 64K.
> > And with a 4KB granule, the original code will map 2MB blocks if
> > possible, and 4KB otherwise.
> > 
> > The patch will make the 4KB granule case also map 1GB blocks if
> > possible.
> 
> Ok.
> 
> > > In combination with the contiguous page hint, we should be able
> > > to theoretically support 4KB/64KB/2M/32M/1G/16G TLBs in any
> > > combination for boot-time mappings on a 4K page size kernel,
> > > or 64KB/1M/512M/8G on a 64KB page size kernel.
> > 
> > A contiguous hint could be applied to these mappings. The logic would
> > be a bit more complicated though when we consider different granules.
> > For 4KB we chain together 16 entries, for 64KB we use 32. If/when we
> > adopt a 16KB granule, we use 32 entries for a level 2 lookup and
> > 128 entries for a level 3 lookup...
> > 
> > The largest TLB entry sizes that I am aware of in play are the block
> > sizes (i.e. 2MB, 512MB, 1GB). So I don't think we'll get any benefit at
> > the moment for adding the contiguous logic.
> 
> Is that an architecture limit, or specific to the Cortex-A53/A57
> implementations?

Those are the TLBs that are documented for the Cortex-A53 and
Cortex-A57. I have an idea of what the architectural limit is, but I
will need to seek confirmation on it.

Cheers,
-- 
Steve 

> 
> 	Arnd

* [PATCH] arm64: mm: Create gigabyte kernel logical mappings where possible
  2014-04-30 11:36 [PATCH] arm64: mm: Create gigabyte kernel logical mappings where possible Steve Capper
  2014-04-30 18:11 ` Arnd Bergmann
@ 2014-05-02  1:03 ` Jungseok Lee
  2014-05-02  9:11   ` Steve Capper
  2014-05-02  8:51 ` Catalin Marinas
  2 siblings, 1 reply; 9+ messages in thread
From: Jungseok Lee @ 2014-05-02  1:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Wednesday, April 30, 2014 8:36 PM, Steve Capper wrote:
> We have the capability to map 1GB level 1 blocks when using a 4K granule.
> 
> This patch adjusts the create_mapping logic s.t. when mapping physical memory on boot, we attempt to
> use a 1GB block if both the VA and PA start and end are 1GB aligned. This both reduces the levels of
> lookup required to resolve a kernel logical address, as well as reduces TLB pressure on cores that
> support 1GB TLB entries.
> 
> Signed-off-by: Steve Capper <steve.capper@linaro.org>
> ---
> Hello,
> This patch has been tested on the FastModel for 4K and 64K pages.
> Also, this has been tested with Jungseok's 4 level patch.
> 
> I put in the explicit check for PAGE_SHIFT, as I am anticipating a three level 64KB configuration at
> some point.
> 
> With two level 64K, a PUD is equivalent to a PMD which is equivalent to a PGD, and these are all level
> 2 descriptors.
> 
> Under three level 64K, a PUD would be equivalent to a PGD which would be a level 1 descriptor thus may
> not be a block.
> 
> Comments/critique/testers welcome.

Hi, Steve

I've tested on my platform, and it works well.

If an SoC design follows "Principles of ARM Memory Maps",
the PA should be 1GB aligned. Thus, I think this patch is
effective for such designs.

Best Regards
Jungseok Lee

* [PATCH] arm64: mm: Create gigabyte kernel logical mappings where possible
  2014-04-30 11:36 [PATCH] arm64: mm: Create gigabyte kernel logical mappings where possible Steve Capper
  2014-04-30 18:11 ` Arnd Bergmann
  2014-05-02  1:03 ` Jungseok Lee
@ 2014-05-02  8:51 ` Catalin Marinas
  2014-05-02  9:21   ` Steve Capper
  2 siblings, 1 reply; 9+ messages in thread
From: Catalin Marinas @ 2014-05-02  8:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 30, 2014 at 12:36:22PM +0100, Steve Capper wrote:
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 4d29332..867e979 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -234,7 +234,20 @@ static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
>  	pud = pud_offset(pgd, addr);
>  	do {
>  		next = pud_addr_end(addr, end);
> -		alloc_init_pmd(pud, addr, next, phys);
> +
> +		/*
> +		 * For 4K granule only, attempt to put down a 1GB block
> +		 */
> +		if ((PAGE_SHIFT == 12) &&
> +			((addr | next | phys) & ~PUD_MASK) == 0) {
> +			pud_t old_pud = *pud;
> +			set_pud(pud, __pud(phys | prot_sect_kernel));
> +
> +			if (!pud_none(old_pud))
> +				flush_tlb_all();

We could even free the original pmd here. I think a
memblock_free(pud_pfn(old_pud) << PAGE_SHIFT, PAGE_SIZE) should do
(untested, and you need to define pud_pfn).

-- 
Catalin

* [PATCH] arm64: mm: Create gigabyte kernel logical mappings where possible
  2014-05-02  1:03 ` Jungseok Lee
@ 2014-05-02  9:11   ` Steve Capper
  0 siblings, 0 replies; 9+ messages in thread
From: Steve Capper @ 2014-05-02  9:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, May 02, 2014 at 10:03:02AM +0900, Jungseok Lee wrote:
> On Wednesday, April 30, 2014 8:36 PM, Steve Capper wrote:
> > We have the capability to map 1GB level 1 blocks when using a 4K granule.
> > 
> > This patch adjusts the create_mapping logic s.t. when mapping physical memory on boot, we attempt to
> > use a 1GB block if both the VA and PA start and end are 1GB aligned. This both reduces the levels of
> > lookup required to resolve a kernel logical address, as well as reduces TLB pressure on cores that
> > support 1GB TLB entries.
> > 
> > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> > ---
> > Hello,
> > This patch has been tested on the FastModel for 4K and 64K pages.
> > Also, this has been tested with Jungseok's 4 level patch.
> > 
> > I put in the explicit check for PAGE_SHIFT, as I am anticipating a three level 64KB configuration at
> > some point.
> > 
> > With two level 64K, a PUD is equivalent to a PMD which is equivalent to a PGD, and these are all level
> > 2 descriptors.
> > 
> > Under three level 64K, a PUD would be equivalent to a PGD which would be a level 1 descriptor thus may
> > not be a block.
> > 
> > Comments/critique/testers welcome.
> 
> Hi, Steve
> 
> I've tested on my platform, and it works well.
> 

Thanks for giving this a go!

> If an SoC design follows "Principles of ARM Memory Maps",
> the PA should be 1GB aligned. Thus, I think this patch is
> effective for such designs.
> 
> Best Regards
> Jungseok Lee
> 

* [PATCH] arm64: mm: Create gigabyte kernel logical mappings where possible
  2014-05-02  8:51 ` Catalin Marinas
@ 2014-05-02  9:21   ` Steve Capper
  0 siblings, 0 replies; 9+ messages in thread
From: Steve Capper @ 2014-05-02  9:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, May 02, 2014 at 09:51:21AM +0100, Catalin Marinas wrote:
> On Wed, Apr 30, 2014 at 12:36:22PM +0100, Steve Capper wrote:
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > index 4d29332..867e979 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -234,7 +234,20 @@ static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
> >  	pud = pud_offset(pgd, addr);
> >  	do {
> >  		next = pud_addr_end(addr, end);
> > -		alloc_init_pmd(pud, addr, next, phys);
> > +
> > +		/*
> > +		 * For 4K granule only, attempt to put down a 1GB block
> > +		 */
> > +		if ((PAGE_SHIFT == 12) &&
> > +			((addr | next | phys) & ~PUD_MASK) == 0) {
> > +			pud_t old_pud = *pud;
> > +			set_pud(pud, __pud(phys | prot_sect_kernel));
> > +
> > +			if (!pud_none(old_pud))
> > +				flush_tlb_all();
> 
> We could even free the original pmd here. I think a
> memblock_free(pud_pfn(old_pud) << PAGE_SHIFT, PAGE_SIZE) should do
> (untested, and you need to define pud_pfn).

I see what you mean, we will potentially have an unused page in our
swapper_pg_dir array.

I'll have a think, and add some logic to remove the redundant page.

Cheers,
-- 
Steve

> 
> -- 
> Catalin
