From: Johannes Weiner <hannes@cmpxchg.org>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matthew Wilcox <willy@infradead.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-cachefs@redhat.com,
	linux-afs@lists.infradead.org
Subject: Re: [PATCH v5 00/27] Memory Folios
Date: Thu, 1 Apr 2021 12:00:08 -0400
Message-ID: <YGXuCMfBWL51TVu3@cmpxchg.org>
In-Reply-To: <YGVUobKUMUtEy1PS@zeniv-ca.linux.org.uk>

On Thu, Apr 01, 2021 at 05:05:37AM +0000, Al Viro wrote:
> On Tue, Mar 30, 2021 at 10:09:29PM +0100, Matthew Wilcox wrote:
> 
> > That's a very Intel-centric way of looking at it.  Other architectures
> > support a multitude of page sizes, from the insane ia64 (4k, 8k, 16k, then
> > every power of four up to 4GB) to more reasonable options like (4k, 32k,
> > 256k, 2M, 16M, 128M).  But we (in software) shouldn't constrain ourselves
> > to thinking in terms of what the hardware currently supports.  Google
> > have data showing that for their workloads, 32kB is the goldilocks size.
> > I'm sure for some workloads, it's much higher and for others it's lower.
> > But for almost no workload is 4kB the right choice any more, and probably
> > hasn't been since the late 90s.
> 
> Out of curiosity I looked at the distribution of file sizes in the
> kernel tree:
> 71455 files total
> 0--4Kb		36702
> 4--8Kb		11820
> 8--16Kb		10066
> 16--32Kb	6984
> 32--64Kb	3804
> 64--128Kb	1498
> 128--256Kb	393
> 256--512Kb	108
> 512Kb--1Mb	35
> 1--2Mb		25
> 2--4Mb		5
> 4--6Mb		7
> 6--8Mb		4
> 12Mb		2 
> 14Mb		1
> 16Mb		1
> 
> ... incidentally, everything bigger than 1.2Mb lives^Wshambles under
> drivers/gpu/drm/amd/include/asic_reg/
> 
> Page size	Footprint
> 4Kb		1128Mb
> 8Kb		1324Mb
> 16Kb		1764Mb
> 32Kb		2739Mb
> 64Kb		4832Mb
> 128Kb		9191Mb
> 256Kb		18062Mb
> 512Kb		35883Mb
> 1Mb		71570Mb
> 2Mb		142958Mb
> 
> So for kernel builds (as well as grep over the tree, etc.) uniform 2Mb pages
> would be... interesting.

Right, I don't see us getting rid of 4k cache entries anytime
soon. Even 32k pages would double the footprint here.
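
For reference, those footprint numbers follow from rounding every
file's size up to the candidate page size and summing. A quick sketch
that reproduces the table, assuming a kernel checkout at ./linux (the
path and the exact numbers are of course illustrative):

#!/usr/bin/env python3
# Sketch: page-cache footprint of a source tree at various page
# sizes, i.e. every file's size rounded up to whole pages.
# Assumes a checkout at ./linux; purely illustrative.
import os

sizes = []
for root, dirs, files in os.walk("./linux"):
    if ".git" in dirs:
        dirs.remove(".git")        # skip repository metadata
    for name in files:
        path = os.path.join(root, name)
        if os.path.isfile(path) and not os.path.islink(path):
            sizes.append(os.path.getsize(path))

print(f"{len(sizes)} files total")
for page in (4096, 8192, 16384, 32768, 65536, 131072,
             262144, 524288, 1 << 20, 2 << 20):
    footprint = sum(-(-s // page) * page for s in sizes)  # ceil to pages
    label = f"{page >> 10}Kb" if page < (1 << 20) else f"{page >> 20}Mb"
    print(f"{label}\t{footprint >> 20}Mb")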

The issue is just that at the other end of the spectrum we have IO
devices that do 10GB/s, which corresponds to 2.6 million 4k pages per
second. At such data rates we are currently CPU-limited by the
per-page transaction overhead in page reclaim. Workloads like this
tend to use much larger files and would benefit from a larger paging
unit.
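
Back-of-the-envelope, assuming the GiB reading of 10GB/s:

# Units per second needed to sustain 10GB/s at a few paging units;
# illustrative arithmetic only.
bandwidth = 10 * 1024**3                       # 10 GiB/s
for unit, label in ((4096, "4KB"), (32768, "32KB"), (2 << 20, "2MB")):
    print(f"{label:>4} unit: {bandwidth // unit:>9,} ops/sec")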

Likewise, most production workloads in cloud servers have enormous
anonymous regions and large executables that greatly benefit from
fewer page table levels and bigger TLB entries.

Today, fragmentation prevents the page allocator from producing 2MB
blocks at a satisfactory rate or allocation latency. It's not
feasible to allocate 2M inside page faults, for example; getting huge
page coverage for the page cache will be even more difficult.

I'm not saying we should get rid of 4k cache entries. Rather, I'm
wondering out loud whether, longer term, we'd want to change the
default page size to 2M and implement the 4k cache entries, which we
clearly continue to need, with a slab-style allocator on top. The
idea is that such an allocator would do a better job of grouping
cache entries with other cache entries of a similar lifetime than the
untyped page allocator does naturally, and so make fragmentation a
whole lot more manageable.
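
To make that concrete, here is a toy user-space model of the grouping
idea (emphatically not kernel code, just the shape of it): 4k entries
are only ever carved out of 2M blocks dedicated to a single lifetime
class, so reclaiming a class hands back whole 2M blocks.

# Toy model of a slab-style 4k allocator on top of 2M blocks, with
# each block dedicated to one lifetime class. Hypothetical user-space
# sketch (no per-entry free), not an existing kernel interface.
BLOCK = 2 * 1024 * 1024
ENTRY = 4 * 1024
ENTRIES_PER_BLOCK = BLOCK // ENTRY

class LifetimeGroupedAllocator:
    def __init__(self):
        self.blocks = {}         # lifetime class -> list of 2M blocks
        self.next_base = 0       # fake "physical" address space

    def alloc_entry(self, lifetime_class):
        blocks = self.blocks.setdefault(lifetime_class, [])
        if not blocks or blocks[-1]["used"] == ENTRIES_PER_BLOCK:
            # Take a fresh 2M block that will only ever hold entries
            # of this lifetime class.
            blocks.append({"base": self.next_base, "used": 0})
            self.next_base += BLOCK
        block = blocks[-1]
        addr = block["base"] + block["used"] * ENTRY
        block["used"] += 1
        return addr

    def reclaim_class(self, lifetime_class):
        # Entries of different classes never share a block, so dropping
        # a class frees whole, immediately reusable 2M blocks instead
        # of leaving 4k holes behind.
        return [b["base"] for b in self.blocks.pop(lifetime_class, [])]

The point is only that typed grouping keeps each 2M block either
wholly in use or wholly free, which the untyped page allocator can't
guarantee once 4k allocations of mixed lifetimes land in the same
block.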

(I'm using x86 page sizes as examples because they matter to me. But
there is an architecture-independent discrepancy between the smallest
cache entries we must continue to support and the larger blocks /
huge pages that we increasingly rely on as first-class pages.)

Thread overview: 58+ messages
2021-03-20  5:40 [PATCH v5 00/27] Memory Folios Matthew Wilcox (Oracle)
2021-03-20  5:40 ` [PATCH v5 01/27] fs/cachefiles: Remove wait_bit_key layout dependency Matthew Wilcox (Oracle)
2021-03-22  8:06   ` Christoph Hellwig
2021-03-20  5:40 ` [PATCH v5 02/27] mm/writeback: Add wait_on_page_writeback_killable Matthew Wilcox (Oracle)
2021-03-22  8:07   ` Christoph Hellwig
2021-03-20  5:40 ` [PATCH v5 03/27] afs: Use wait_on_page_writeback_killable Matthew Wilcox (Oracle)
2021-03-22  8:08   ` Christoph Hellwig
2021-03-20  5:40 ` [PATCH v5 04/27] mm: Introduce struct folio Matthew Wilcox (Oracle)
2021-03-20  5:40 ` [PATCH v5 05/27] mm: Add folio_pgdat and folio_zone Matthew Wilcox (Oracle)
2021-03-20  5:40 ` [PATCH v5 06/27] mm/vmstat: Add functions to account folio statistics Matthew Wilcox (Oracle)
2021-03-20  5:40 ` [PATCH v5 07/27] mm/debug: Add VM_BUG_ON_FOLIO and VM_WARN_ON_ONCE_FOLIO Matthew Wilcox (Oracle)
2021-03-20  5:40 ` [PATCH v5 08/27] mm: Add put_folio Matthew Wilcox (Oracle)
2021-03-20  5:40 ` [PATCH v5 09/27] mm: Add get_folio Matthew Wilcox (Oracle)
2021-03-20  5:40 ` [PATCH v5 10/27] mm: Create FolioFlags Matthew Wilcox (Oracle)
2021-03-20  5:40 ` [PATCH v5 11/27] mm: Handle per-folio private data Matthew Wilcox (Oracle)
2021-03-20  5:40 ` [PATCH v5 12/27] mm: Add folio_index, folio_file_page and folio_contains Matthew Wilcox (Oracle)
2021-03-20  5:40 ` [PATCH v5 13/27] mm/util: Add folio_mapping and folio_file_mapping Matthew Wilcox (Oracle)
2021-03-20  5:40 ` [PATCH v5 14/27] mm/memcg: Add folio wrappers for various functions Matthew Wilcox (Oracle)
2021-03-20  5:40 ` [PATCH v5 15/27] mm/filemap: Add unlock_folio Matthew Wilcox (Oracle)
2021-03-20  5:40 ` [PATCH v5 16/27] mm/filemap: Add lock_folio Matthew Wilcox (Oracle)
2021-03-20  5:40 ` [PATCH v5 17/27] mm/filemap: Add lock_folio_killable Matthew Wilcox (Oracle)
2021-03-20  5:40 ` [PATCH v5 18/27] mm/filemap: Add __lock_folio_async Matthew Wilcox (Oracle)
2021-03-20  5:40 ` [PATCH v5 19/27] mm/filemap: Add __lock_folio_or_retry Matthew Wilcox (Oracle)
2021-03-20  5:40 ` [PATCH v5 20/27] mm/filemap: Add wait_on_folio_locked Matthew Wilcox (Oracle)
2021-03-20  5:40 ` [PATCH v5 21/27] mm/filemap: Add end_folio_writeback Matthew Wilcox (Oracle)
2021-03-20  5:40 ` [PATCH v5 22/27] mm/writeback: Add wait_on_folio_writeback Matthew Wilcox (Oracle)
2021-03-20  5:41 ` [PATCH v5 23/27] mm/writeback: Add wait_for_stable_folio Matthew Wilcox (Oracle)
2021-03-20  5:41 ` [PATCH v5 24/27] mm/filemap: Convert wait_on_page_bit to wait_on_folio_bit Matthew Wilcox (Oracle)
2021-03-21  7:10   ` kernel test robot
2021-03-21  7:10     ` kernel test robot
2021-03-20  5:41 ` [PATCH v5 25/27] mm/filemap: Convert wake_up_page_bit to wake_up_folio_bit Matthew Wilcox (Oracle)
2021-03-20  5:41 ` [PATCH v5 26/27] mm/filemap: Convert page wait queues to be folios Matthew Wilcox (Oracle)
2021-03-20  7:54   ` kernel test robot
2021-03-20  7:54     ` kernel test robot
2021-03-20  5:41 ` [PATCH v5 27/27] mm/doc: Build kerneldoc for various mm files Matthew Wilcox (Oracle)
2021-03-22  3:25 ` [PATCH v5 00/27] Memory Folios Matthew Wilcox
2021-03-22  9:25 ` [PATCH v5 01/27] fs/cachefiles: Remove wait_bit_key layout dependency David Howells
2021-03-22  9:26 ` [PATCH v5 02/27] mm/writeback: Add wait_on_page_writeback_killable David Howells
2021-03-22  9:27 ` [PATCH v5 03/27] afs: Use wait_on_page_writeback_killable David Howells
2021-03-22 19:41   ` Matthew Wilcox
2021-03-22 17:59 ` [PATCH v5 00/27] Memory Folios Johannes Weiner
2021-03-22 18:47   ` Matthew Wilcox
2021-03-24  0:29     ` Johannes Weiner
2021-03-24  6:24       ` Matthew Wilcox
2021-03-26 17:48         ` Johannes Weiner
2021-03-29 16:58           ` Matthew Wilcox
2021-03-29 17:56             ` Matthew Wilcox
2021-03-30 19:30             ` Johannes Weiner
2021-03-30 21:09               ` Matthew Wilcox
2021-03-31 18:14                 ` Johannes Weiner
2021-03-31 18:28                   ` Matthew Wilcox
2021-04-01  5:05                 ` Al Viro
2021-04-01 12:07                   ` Matthew Wilcox
2021-04-01 16:00                   ` Johannes Weiner [this message]
2021-03-31 14:54               ` Christoph Hellwig
2021-03-23 15:50   ` Christoph Hellwig
2021-03-23 11:29 ` [PATCH v5 03/27] afs: Use wait_on_page_writeback_killable David Howells
2021-03-23 17:50 ` [PATCH v5 00/27] Memory Folios David Howells
