From: Gregory CLEMENT <>
To: Russell King - ARM Linux admin <>
Cc: Thomas Petazzoni <>,
	Arnd Bergmann <>
Subject: Re: [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K)
Date: Fri, 12 Jun 2020 11:15:09 +0200
Message-ID: <87bllo8x2a.fsf@FE-laptop>

Hello Russell,

> Hi Gregory,
> You're on your own with this one; I've no motivation to re-understand
> the ARM page table code now that 32-bit ARM is basically unsupported.


> I'll point out some of the things you got wrong below though.

However, thanks for your pointers.

> On Thu, Jun 11, 2020 at 03:49:08PM +0200, Gregory CLEMENT wrote:
>> Hello,
>> On ARM-based NAS devices it is possible to have storage volumes larger
>> than 16 TB, especially with the use of LVM. However, on 32-bit
>> architectures the page cache index is stored on 32 bits, which means
>> that given a page size of 4 KB, we can only address volumes of up to
>> 16 TB.
>> Therefore, one option to use such large volumes and filesystems on
>> 32-bit architectures is to increase the page size.
>> This series adds support for 8K, 16K, 32K and 64K kernel pages. On
>> ARM the size of the page can be either 4K or 64K, so for the other
>> sizes a "software emulation" is used: Linux thinks it is using
>> pages of 8 KB, 16 KB or 32 KB, while underneath the MMU still uses
>> 4 KB pages.
>> For ARM there is already a difference between the kernel page and the
>> hardware page in the way they are managed. In the same 4K space the
>> Linux kernel deals with 2 PTE tables at the beginning, while the
>> hardware deals with 2 other hardware PTE tables.
> This is incorrect.  The kernel page size and the hardware page size
> match today - both are 4k.  What you're talking about here is the
> PTE table size.
> The kernel requires that each PTE table is contained within one
> struct page.  Since one hardware PTE table is 256 entries, it
> occupies 1024 bytes, so a quarter of a page.  So, to have a single
> 4k page per PTE table would waste quite a bit of space.
> Now, the hardware PTE tables do not lend themselves to the kernel's
> usage: the kernel wants additional bits to track the state of each
> page in the page tables.  Hence, we need to shadow every PTE entry.
> This also provides us independence of the underlying hardware PTE
> entry format, which varies between ARM architecture versions.
> So, we end up with a single 4k page containing two consecutive
> hardware PTE tables, followed by two Linux PTE tables for the kernel's
> benefit.

That is what I understood, but it seems I didn't formulate it accurately.
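
To make the sizes in the explanation above concrete, here is a minimal
Python sketch of the arithmetic. The 4-byte entry size is derived from the
quoted figures (256 entries in 1024 bytes); the constant and function names
are purely illustrative and are not kernel identifiers.

```python
# Size arithmetic for one 4 KB PTE page on 32-bit ARM, as described above.
# All names here are illustrative assumptions, not kernel identifiers.
PAGE_SIZE = 4096           # kernel and hardware page size today
HW_PTE_ENTRIES = 256       # entries in one hardware PTE table
PTE_ENTRY_BYTES = 4        # derived: 1024 bytes / 256 entries
HW_TABLES_PER_PAGE = 2     # hardware PTE tables per PTE page
LINUX_TABLES_PER_PAGE = 2  # shadow (Linux) PTE tables per PTE page


def table_bytes():
    """Size in bytes of a single PTE table (hardware or Linux)."""
    return HW_PTE_ENTRIES * PTE_ENTRY_BYTES


def pte_page_bytes_used():
    """Bytes occupied by the hardware plus Linux tables in one PTE page."""
    return (HW_TABLES_PER_PAGE + LINUX_TABLES_PER_PAGE) * table_bytes()


def bytes_mapped_per_pte_page():
    """Virtual address range covered by the hardware tables of one PTE page."""
    return HW_TABLES_PER_PAGE * HW_PTE_ENTRIES * PAGE_SIZE


if __name__ == "__main__":
    print("one table:", table_bytes(), "bytes")
    print("used per PTE page:", pte_page_bytes_used(), "bytes")
    print("mapped per PTE page:", bytes_mapped_per_pte_page() // 1024, "KB")
```

With a 4k page the four tables fill the page exactly, which is why any
larger page size leaves the remainder unused unless more tables are packed
into it.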

> If you increase the page size, then you need to increase the number
> of tables in a page, or suffer a huge amount of wasted memory taken
> for the page tables - going to an 8k page size means that the upper
> 4k of each page will not be used.  Going to 16k means the upper 12k
> won't be used.  And so on - as your software page size increases,
> the amount of memory wasted for each PTE table will increase
> unless you also increase the number of hardware 1st level entries
> pointing to each PTE page.  With 64k pages, 60k of each PTE page
> will remain unused.

Unfortunately, I was aware of it. But I thought that it was an acceptable
drawback in order to be able to address large volumes on a 32-bit ARM
system. Actually, it is already the case on some products.
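
As a reminder of what is gained in exchange, here is a small Python sketch
of the addressing limit from the cover letter: a 32-bit page cache index
multiplied by the page size. The function name is hypothetical and the
sketch assumes that the index width is the only limiting factor.

```python
# Largest volume addressable through a 32-bit page cache index.
# Assumption: the index width is the only limit taken into account.
INDEX_BITS = 32
TIB = 1 << 40


def max_addressable_bytes(page_size, index_bits=INDEX_BITS):
    """Number of indexable pages times the size of one page."""
    return (1 << index_bits) * page_size


if __name__ == "__main__":
    for kb in (4, 8, 16, 32, 64):
        limit_tib = max_addressable_bytes(kb * 1024) // TIB
        print(f"{kb:2d}K pages -> {limit_tib} TB")
```

Each doubling of the page size doubles the limit, from 16 TB with 4K pages
up to 256 TB with 64K pages.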

> That isn't very efficient use of memory.

Indeed. However, on a 3GB system, in the worst case we need 786432 pages
of 4K to map the memory. These pages can be mapped by 1536 blocks of 512
entries. So when the 64K pages are emulated we lose about 90MB
(1536 x 60KB, around 3% of the memory). So it is not negligible, but given
the use case it seems acceptable.

Of course, that does not prevent us from trying to do better.
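
For reference, the worst-case estimate can be reproduced with a short
Python sketch. It assumes, as in the discussion above, that each PTE page
still holds only 4K of tables (512 entries) whatever the software page
size is, and that every 4K hardware page of RAM is mapped exactly once.
The function names are hypothetical.

```python
# Worst-case memory wasted in PTE pages when the software page size grows
# while each PTE page still carries only 4 KB of tables.
# Assumptions: 512 entries per PTE page, all RAM mapped with 4 KB pages.
HW_PAGE_SIZE = 4096
TABLE_BYTES_PER_PTE_PAGE = 4096
ENTRIES_PER_PTE_PAGE = 512


def wasted_per_pte_page(page_size):
    """Unused bytes in one PTE page for a given software page size."""
    return page_size - TABLE_BYTES_PER_PTE_PAGE


def worst_case_waste(ram_bytes, page_size):
    """Total unused bytes across the PTE pages needed to map all of RAM."""
    hw_pages = ram_bytes // HW_PAGE_SIZE
    pte_pages = -(-hw_pages // ENTRIES_PER_PTE_PAGE)  # ceiling division
    return pte_pages * wasted_per_pte_page(page_size)


if __name__ == "__main__":
    ram = 3 * 1024 ** 3
    for kb in (8, 16, 32, 64):
        waste = worst_case_waste(ram, kb * 1024)
        print(f"{kb:2d}K pages: {waste // 1024} KB wasted "
              f"({100 * waste / ram:.2f}% of RAM)")
```

For 64K pages on 3GB this gives 1536 x 60KB = 92160 KB, just under 3% of
the memory.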

> -- 
> RMK's Patch system:
> FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 503kbps up

Gregory Clement, Bootlin
Embedded Linux and Kernel engineering

Thread overview: 22+ messages
2020-06-11 13:49 [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K) Gregory CLEMENT
2020-06-11 13:49 ` [PATCH v2 1/6] ARM: Use PAGE_SIZE for ELF_EXEC_PAGESIZE Gregory CLEMENT
2020-06-12  8:22   ` Arnd Bergmann
2020-06-12  8:35     ` Russell King - ARM Linux admin
2020-06-12  8:46       ` Arnd Bergmann
2020-06-12  8:50         ` Russell King - ARM Linux admin
2020-06-12 11:50         ` Catalin Marinas
2020-06-12 12:06         ` Gregory CLEMENT
2020-06-12  8:52     ` Gregory CLEMENT
2020-06-11 13:49 ` [PATCH v2 2/6] ARM: pagetable: prepare hardware page table to use large page Gregory CLEMENT
2020-06-11 13:49 ` [PATCH v2 3/6] ARM: Make the number of fix bitmap depend on the page size Gregory CLEMENT
2020-06-11 13:49 ` [PATCH v2 4/6] ARM: mm: Aligned pte allocation to one page Gregory CLEMENT
2020-06-12  8:37   ` Arnd Bergmann
2020-06-12 10:25     ` Catalin Marinas
2020-06-12 11:56     ` Gregory CLEMENT
2020-06-11 13:49 ` [PATCH v2 5/6] ARM: Add large kernel page support Gregory CLEMENT
2020-06-11 13:49 ` [PATCH v2 6/6] ARM: Add 64K page support at MMU level Gregory CLEMENT
2020-06-11 16:21 ` [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K) Russell King - ARM Linux admin
2020-06-12  9:15   ` Gregory CLEMENT [this message]
2020-06-12  9:23   ` Arnd Bergmann
2020-06-12 12:21     ` Catalin Marinas
2020-06-12 12:49       ` Arnd Bergmann
