From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Andrea Arcangeli <aarcange@redhat.com>,
    Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>, Hugh Dickins <hughd@google.com>,
    Wu Fengguang <fengguang.wu@intel.com>, Jan Kara <jack@suse.cz>,
    Mel Gorman <mgorman@suse.de>, linux-mm@kvack.org,
    Andi Kleen <ak@linux.intel.com>, Matthew Wilcox <willy@linux.intel.com>,
    "Kirill A. Shutemov" <kirill@shutemov.name>,
    Hillf Danton <dhillf@gmail.com>, Dave Hansen <dave@sr71.net>,
    Ning Qu <quning@google.com>, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org,
    "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: [PATCHv5 00/23] Transparent huge page cache: phase 1, everything but mmap()
Date: Sun, 4 Aug 2013 05:17:02 +0300
Message-ID: <1375582645-29274-1-git-send-email-kirill.shutemov@linux.intel.com>

This is the second part of my transparent huge page cache work.
It brings thp support for ramfs, but without mmap() -- that part will be
posted separately.

Intro
-----

The goal of the project is to prepare kernel infrastructure to handle huge
pages in the page cache.

To prove that the proposed changes are functional, we enable the feature
for the simplest file system -- ramfs. ramfs is not that useful by
itself, but it's a good pilot project.

Design overview
---------------

Every huge page is represented in the page cache radix-tree by HPAGE_PMD_NR
(512 on x86-64) entries: one entry for the head page and HPAGE_PMD_NR-1
entries for tail pages.

Radix tree manipulations are implemented in a batched way: we add and remove
a whole huge page at once, under one tree_lock. To make this possible, we
extended the radix-tree interface to be able to pre-allocate enough memory
to insert a number of *contiguous* elements (kudos to Matthew Wilcox).

Huge pages can be added to the page cache in three ways:
 - write(2) to file or page;
 - read(2) from sparse file;
 - fault sparse file.
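
As an illustration of the radix-tree layout described in the design overview, here is a minimal user-space sketch of the index arithmetic. It assumes that a huge page is naturally aligned in the file's page-index space, which the cover letter does not state explicitly. The helper names are hypothetical and are not taken from the patchset.

```c
#include <assert.h>
#include <stdbool.h>

/* 512 on x86-64 according to the cover letter; other configurations differ. */
#define HPAGE_PMD_NR 512UL

/*
 * Hypothetical helper: radix-tree index of the head-page entry for the
 * huge page covering 'index'. Assumes huge pages are aligned to
 * HPAGE_PMD_NR in the page-index space.
 */
static inline unsigned long thp_head_index(unsigned long index)
{
	return index & ~(HPAGE_PMD_NR - 1);
}

/* Hypothetical helper: position of 'index' inside its huge page (0 = head). */
static inline unsigned long thp_subpage_offset(unsigned long index)
{
	return index & (HPAGE_PMD_NR - 1);
}

/* Hypothetical helper: true if 'index' would hold one of the tail entries. */
static inline bool thp_is_tail_slot(unsigned long index)
{
	return thp_subpage_offset(index) != 0;
}
```

Under that assumption, a huge page whose head sits at index 512 occupies entries 512..1023: entry 512 is the head and the remaining HPAGE_PMD_NR-1 entries are tails.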
Potentially, one more way is collapsing small pages, but that's outside
the initial implementation.

For now we still write/read at most PAGE_CACHE_SIZE bytes at a time. There's
some room for speed up later.

Since mmap() isn't targeted for this patchset, we just split the huge page
on page fault.

To minimize memory overhead for small files we set up a fops->release
helper -- simple_thp_release() -- which splits the last page in the file
when the last writer goes away.

truncate_inode_pages_range() drops a whole huge page at once if it's fully
inside the range. If a huge page is only partly in the range we zero out
that part, exactly like we do for partial small pages.

split_huge_page() for file pages works similarly to anon pages, but we
walk by mapping->i_mmap rather than anon_vma->rb_root. At the end we call
truncate_inode_pages() to drop small pages beyond i_size, if any.

The locking model around split_huge_page() is rather complicated and I
still don't feel confident enough about it. It looks like we need to
serialize over i_mutex in split_huge_page(), but that breaks the locking
order for i_mutex->mmap_sem. I don't see how it can be fixed easily. Any
ideas are welcome.

Performance indicators will be posted separately.

Please review.

Kirill A. Shutemov (23):
  radix-tree: implement preload for multiple contiguous elements
  memcg, thp: charge huge cache pages
  thp: compile-time and sysfs knob for thp pagecache
  thp, mm: introduce mapping_can_have_hugepages() predicate
  thp: represent file thp pages in meminfo and friends
  thp, mm: rewrite add_to_page_cache_locked() to support huge pages
  mm: trace filemap: dump page order
  block: implement add_bdi_stat()
  thp, mm: rewrite delete_from_page_cache() to support huge pages
  thp, mm: warn if we try to use replace_page_cache_page() with THP
  thp, mm: handle tail pages in page_cache_get_speculative()
  thp, mm: add event counters for huge page alloc on file write or read
  thp, mm: allocate huge pages in grab_cache_page_write_begin()
  thp, mm: naive support of thp in generic_perform_write
  mm, fs: avoid page allocation beyond i_size on read
  thp, mm: handle transhuge pages in do_generic_file_read()
  thp, libfs: initial thp support
  thp: libfs: introduce simple_thp_release()
  truncate: support huge pages
  thp: handle file pages in split_huge_page()
  thp: wait_split_huge_page(): serialize over i_mmap_mutex too
  thp, mm: split huge page on mmap file page
  ramfs: enable transparent huge page cache

 Documentation/vm/transhuge.txt |  16 ++++
 drivers/base/node.c            |   4 +
 fs/libfs.c                     |  80 ++++++++++++++++++-
 fs/proc/meminfo.c              |   3 +
 fs/ramfs/file-mmu.c            |   3 +-
 fs/ramfs/inode.c               |   6 +-
 include/linux/backing-dev.h    |  10 +++
 include/linux/fs.h             |  10 +++
 include/linux/huge_mm.h        |  53 ++++++++++++-
 include/linux/mmzone.h         |   1 +
 include/linux/page-flags.h     |  33 ++++++++
 include/linux/pagemap.h        |  48 +++++++++++-
 include/linux/radix-tree.h     |  11 +++
 include/linux/vm_event_item.h  |   4 +
 include/trace/events/filemap.h |   7 +-
 lib/radix-tree.c               |  41 +++++++---
 mm/Kconfig                     |  12 +++
 mm/filemap.c                   | 171 +++++++++++++++++++++++++++++++++++------
 mm/huge_memory.c               | 116 ++++++++++++++++++++++++----
 mm/memcontrol.c                |   2 -
 mm/memory.c                    |   4 +-
 mm/truncate.c                  | 108 ++++++++++++++++++++------
 mm/vmstat.c                    |   5 ++
 23 files changed, 658 insertions(+), 90 deletions(-)

-- 
1.8.3.2
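
The truncate rule from the design overview (drop a huge page that is fully inside the range, zero the covered part otherwise) can be sketched as a small user-space decision function. This is only an illustration under simplifying assumptions: the range is expressed in small-page indices, inclusive on both ends, whereas the kernel works with byte offsets, and the head index is assumed to be aligned to HPAGE_PMD_NR. The enum and function names are hypothetical and do not come from mm/truncate.c.

```c
#include <assert.h>

/* 512 on x86-64 according to the cover letter. */
#define HPAGE_PMD_NR 512UL

/* Hypothetical outcome of truncating a range against one huge page. */
enum thp_truncate_action {
	THP_KEEP,	/* range does not touch the huge page */
	THP_DROP_WHOLE,	/* huge page is fully inside the range */
	THP_ZERO_PART,	/* huge page is only partly inside the range */
};

/*
 * Hypothetical helper: classify the huge page whose head entry is at
 * 'head' against the inclusive page-index range [first, last].
 */
static inline enum thp_truncate_action
thp_classify_truncate(unsigned long head, unsigned long first,
		      unsigned long last)
{
	unsigned long page_last = head + HPAGE_PMD_NR - 1;

	if (last < head || first > page_last)
		return THP_KEEP;
	if (first <= head && last >= page_last)
		return THP_DROP_WHOLE;
	return THP_ZERO_PART;
}
```

For a huge page at index 512, the range [0, 2047] drops it whole, while [600, 2047] only zeroes the covered part and leaves the page in the cache.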