Linux-Fsdevel Archive on lore.kernel.org
* [LSF/MM TOPIC] Eliminating tail pages
From: Matthew Wilcox @ 2019-02-11 19:09 UTC
  To: linux-fsdevel, linux-mm


I can't follow simple instructions.

----- Forwarded message from Matthew Wilcox <willy@infradead.org> -----

Date: Mon, 11 Feb 2019 11:07:28 -0800
From: Matthew Wilcox <willy@infradead.org>
To: lsf-pc@lists.linux-foundation.org
Subject: [LSF/MM TOPIC] Eliminating tail pages
User-Agent: Mutt/1.9.2 (2017-12-15)


Tail pages are a pain.  All over the kernel, we call compound_head()
(or occasionally forget to ...).  So what would it take to eliminate them?
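
For readers unfamiliar with the head/tail distinction, here is a minimal user-space sketch of the lookup that compound_head() performs. It is loosely modelled on the kernel's scheme of tagging a tail page with a pointer to its head page with bit 0 set; `struct toy_page`, `toy_prep_compound()` and `toy_compound_head()` are hypothetical names invented for this illustration and do not match the real `struct page` layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct page (not the kernel layout). */
struct toy_page {
	/* Tail pages: address of the head page with bit 0 set; otherwise 0. */
	uintptr_t compound_head;
	/* Only meaningful on the head page. */
	unsigned int order;
};

/* Make pages[0] the head of a 2^order compound page and link its tails. */
static void toy_prep_compound(struct toy_page *pages, unsigned int order)
{
	size_t i, nr = (size_t)1 << order;

	/* Bit 0 must be free so it can serve as the "I am a tail" tag. */
	assert(((uintptr_t)pages & 1) == 0);

	pages[0].compound_head = 0;
	pages[0].order = order;
	for (i = 1; i < nr; i++) {
		pages[i].compound_head = (uintptr_t)&pages[0] | 1;
		pages[i].order = 0;
	}
}

/* Head pages and small pages map to themselves; tails map to their head. */
static struct toy_page *toy_compound_head(struct toy_page *page)
{
	uintptr_t head = page->compound_head;

	if (head & 1)
		return (struct toy_page *)(head - 1);
	return page;
}
```

Any code that is handed a page without knowing whether it is a tail has to go through a lookup of this kind before touching per-compound-page state, which is why a forgotten call is an easy bug to write.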

I'm doing my best to eliminate them from being stored in the page cache.
That's a nice first step, but the very first thing that functions like
find_get_entry(), find_get_entries(), et al do is convert any large
page they find to a tail page.  So we'll probably need to introduce new
functions which will return head pages and convert users over to them.
I know Kirill has a lot more experience with this.

Another place where we return tail pages is get_user_pages().  Callers of
get_user_pages() expect tail or small pages; they do things like calculate
the offset of the byte within the page by ANDing the address with ~PAGE_MASK.
There'll be a lot of work to check all the users and convert them to
something like

unsigned int page_offset(struct page *page, unsigned long addr);

Another thing to consider is that some architectures have a third-level
page size of 16GB (looking at you, POWER).  So an unsigned int isn't
going to cut it.  Do we want to support pages that large, or do we declare
that there will never be any point in supporting pages larger than 4GB?
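
To make the width question concrete, here is a small user-space sketch of the offset arithmetic. It assumes 4kB base pages and a naturally aligned mapping, and it takes the compound order directly instead of a struct page pointer; `TOY_PAGE_SHIFT` and the `toy_*` helpers are hypothetical names for illustration only, not a proposed kernel interface.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4kB base pages */

/* Size in bytes of a compound page of the given order. */
static uint64_t toy_compound_size(unsigned int order)
{
	assert(order < 64 - TOY_PAGE_SHIFT);
	return (uint64_t)1 << (TOY_PAGE_SHIFT + order);
}

/* Byte offset of addr within a naturally aligned page of this order. */
static uint64_t toy_offset_in_compound(unsigned int order, uint64_t addr)
{
	return addr & (toy_compound_size(order) - 1);
}

/* Nonzero if every offset within such a page fits in 32 bits. */
static int toy_offset_fits_u32(unsigned int order)
{
	return toy_compound_size(order) - 1 <= UINT32_MAX;
}
```

Under these assumptions a 2MB page is order 9 and a 16GB page is order 22. Offsets within a page of up to 4GB (order 20) still fit in 32 bits, but a 16GB page needs 34 bits of offset, so the helper would have to return a 64-bit type.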

There are probably other pitfalls I'm forgetting or have never known.
Something like this will be essential for the glorious future that
Christoph Lameter keeps talking about where we divide the memory up into
parts which are only accessible as 2MB pages and parts which support
legacy 4kB usages.

Useful participants:
Kirill Shutemov
Christoph Lameter
Hugh Dickins

probably also relevant to the DAX crew.

----- End forwarded message -----


* Re: [LSF/MM TOPIC] Eliminating tail pages
From: Kirill A. Shutemov @ 2019-02-12  8:55 UTC
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm

On Mon, Feb 11, 2019 at 11:09:08AM -0800, Matthew Wilcox wrote:
> There are probably other pitfalls I'm forgetting or have never known.

Another place where we see tail pages is a plain page table walk: compound
pages do get mapped with PTEs, for example a THP after split_huge_pmd() or
similar. Some drivers also allocate compound pages that can be mmapped into
userspace with PTEs. I have seen the sound subsystem do this.

-- 
 Kirill A. Shutemov
