Linux-mm Archive on lore.kernel.org
From: Song Liu <songliubraving@fb.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"matthew.wilcox@oracle.com" <matthew.wilcox@oracle.com>,
	"kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>,
	Kernel Team <Kernel-team@fb.com>,
	"william.kucharski@oracle.com" <william.kucharski@oracle.com>,
	"chad.mynhier@oracle.com" <chad.mynhier@oracle.com>,
	"mike.kravetz@oracle.com" <mike.kravetz@oracle.com>
Subject: Re: [PATCH v2 0/3] Enable THP for text section of non-shmem files
Date: Tue, 18 Jun 2019 21:48:16 +0000
Message-ID: <BA4D64DA-4F48-4683-8512-0402B9533EE7@fb.com>
In-Reply-To: <20190618141223.4479989e18b1e1ea942b0e42@linux-foundation.org>



> On Jun 18, 2019, at 2:12 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> On Fri, 14 Jun 2019 11:22:01 -0700 Song Liu <songliubraving@fb.com> wrote:
> 
>> This set follows up on the discussion at LSF/MM 2019. The motivation is to
>> put the text section of an application in THP, and thus reduce the iTLB
>> miss rate and improve performance. Both Facebook and Oracle showed strong
>> interest in this feature.
>> 
>> To make reviews easier, this set aims for a minimal viable product. The
>> current version of the work does not have any changes to file system
>> specific code. This comes with some limitations (discussed later).
>> 
>> This set enables an application to "hugify" its text section by simply
>> running something like:
>> 
>>          madvise(0x600000, 0x80000, MADV_HUGEPAGE);
>> 
>> Before this call, the /proc/<pid>/maps looks like:
>> 
>>    00400000-074d0000 r-xp 00000000 00:27 2006927     app
>> 
>> After this call, part of the text section is split out and mapped to THP:
>> 
>>    00400000-00425000 r-xp 00000000 00:27 2006927     app
>>    00600000-00e00000 r-xp 00200000 00:27 2006927     app   <<< on THP
>>    00e00000-074d0000 r-xp 00a00000 00:27 2006927     app
>> 
>> Limitations:
>> 
>> 1. This only works for the text section (vma with VM_DENYWRITE).
>> 2. Once the application puts its own pages in THP, the file is read only.
>>   open(file, O_WRITE) will fail with -ETXTBSY. To modify/update the file,
>>   it must be removed first.
> 
> Removed?  Even if the original mmap/madvise has gone away?  hm.

Yeah, it is not ideal. The THP holds a negative count on i_mmap_writable,
so the file cannot be opened for write.

> 
> I'm wondering if this limitation can be abused in some fashion: mmap a
> file to which you have read permissions, run madvise(MADV_HUGEPAGE) and
> thus prevent the file's owner from being able to modify the file?  Or
> something like that.  What are the issues and protections here?

In this case, the owner needs to make a copy of the file, remove the
original, and then install the updated copy in its place.
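
As a rough illustration of that copy-then-replace step (a sketch only;
replace_file() and the ".new" suffix are made-up names, not part of this
set): write the new contents to a temporary file, then rename() it over
the original. rename() detaches the old inode from the path, so the path
ends up pointing at a fresh inode that has no THP in its page cache.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: install new contents at 'path' through a new inode.
 * Returns 0 on success or a negative errno value on failure.
 */
static int replace_file(const char *path, const void *buf, size_t len)
{
	char tmp[4096];
	ssize_t written;
	int fd, n, err;

	/* Build the temporary name next to the original (same file system). */
	n = snprintf(tmp, sizeof(tmp), "%s.new", path);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return -ENAMETOOLONG;

	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0755);
	if (fd < 0)
		return -errno;

	/* Treat a short write as an I/O error to keep the sketch small. */
	written = write(fd, buf, len);
	if (written < 0 || (size_t)written != len) {
		err = written < 0 ? errno : EIO;
		close(fd);
		unlink(tmp);
		return -err;
	}

	if (close(fd) < 0) {
		err = errno;
		unlink(tmp);
		return -err;
	}

	/* The old inode is dropped from 'path'; the new one takes its place. */
	if (rename(tmp, path) < 0) {
		err = errno;
		unlink(tmp);
		return -err;
	}
	return 0;
}
```

Processes that still map the old inode keep running on it; only new
opens of the path see the updated contents.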

In this version, we want to either split the huge page on write, or fail
the write when we cannot split it. However, the huge page information is
only available at the page level, and on the write path, page-level
information is not available until write_begin(). So it is hard to stop
writes at an earlier stage. Therefore, in this version, we leverage
i_mmap_writable, which lives at the address_space level, where it is
easier to stop writes to the file.
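
To illustrate the idea, here is a userspace toy model of a signed
per-file counter (this is not the kernel code; the toy_* names and the
choice of error codes are assumptions made for the example). A positive
count means writers are present, a negative count means writes are
denied, and each side can only take its reference while the other side
holds none.

```c
#include <errno.h>

/* Toy stand-in for the per-file (address_space level) state. */
struct toy_mapping {
	int writable;	/* > 0: writers present, < 0: writes denied */
};

/* A writer arrives: allowed only while nobody is denying writes. */
static int toy_get_write(struct toy_mapping *m)
{
	if (m->writable < 0)
		return -ETXTBSY;
	m->writable++;
	return 0;
}

/* A writer goes away. */
static void toy_put_write(struct toy_mapping *m)
{
	m->writable--;
}

/* A THP enters the page cache: deny writers unless one already exists. */
static int toy_deny_write(struct toy_mapping *m)
{
	if (m->writable > 0)
		return -EBUSY;
	m->writable--;
	return 0;
}

/* The THP is dropped: release the deny reference. */
static void toy_allow_write(struct toy_mapping *m)
{
	m->writable++;
}
```

The check happens once per file at open time, before any page is
looked up, which is why it is easier than checking at the page level.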

This is temporary behavior, and it is gated by the config, so I guess
it is OK. It works well for our use cases. Once we have better write
support, we can remove the limitation.

If this is too weird, I am also open to suggestions. 

Thanks,
Song


