From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: "Theodore Ts'o" <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Jan Kara <jack@suse.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Hugh Dickins <hughd@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Matthew Wilcox <willy@infradead.org>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-block@vger.kernel.org,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: [PATCHv1, RFC 00/33] ext4: support of huge pages
Date: Tue, 26 Jul 2016 03:35:02 +0300
Message-ID: <1469493335-3622-1-git-send-email-kirill.shutemov@linux.intel.com>

Here's the first version of my patchset, which is intended to bring huge
pages to ext4. It's not yet ready for applying or serious use, but it's
good enough to show the approach.

The basics are the same as with tmpfs[1], which is in the -mm tree, and
the ext4 support is built on top of it. The main difference is that we
need to handle read-out from and write-back to backing storage.

The head page links buffers for the whole huge page. Dirty/writeback
tracking happens at the per-hugepage level.

We read out the whole huge page at once. That required bumping
BIO_MAX_PAGES to 512 to get it to work on x86-64, which is a hack. I'm not
sure how to handle it properly.
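
For illustration only, the arithmetic behind the 512 figure can be
sketched as standalone userspace C. This is not code from the patchset:
the 4k base page and 2M huge page sizes are the usual x86-64 values, the
old limit of 256 pages per bio is an assumption, and every sketch_* name
is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed x86-64 geometry: 4 KiB base pages, 2 MiB (PMD-sized) huge pages. */
#define SKETCH_PAGE_SHIFT   12
#define SKETCH_HPAGE_SHIFT  21

/* Assumption: the per-bio page limit before the bump (not stated above). */
#define SKETCH_OLD_BIO_MAX_PAGES 256

/* Number of base pages that make up one huge page. */
size_t sketch_pages_per_huge_page(void)
{
	return (size_t)1 << (SKETCH_HPAGE_SHIFT - SKETCH_PAGE_SHIFT);
}

/*
 * Number of bios needed to carry nr_pages base pages when a single bio
 * holds at most max_per_bio of them (rounded up).
 */
size_t sketch_bios_needed(size_t nr_pages, size_t max_per_bio)
{
	assert(max_per_bio > 0);
	return (nr_pages + max_per_bio - 1) / max_per_bio;
}
```

Under these assumptions one huge page spans 512 base pages, so a limit
below 512 cannot describe it in a single bio; with the limit raised to
512 one bio is enough.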

Readahead doesn't play well with huge pages either: 128k max readahead
window, assumptions about page size, PageReadahead() to track hit/miss.
I've got it to allocate huge pages, but it doesn't provide any readahead as
such. I don't know how to do this right.
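
To put the window mismatch in numbers, here is a small userspace sketch
(not patchset code). It assumes 4k base pages and 2M huge pages; the
SKETCH_* constants and function names are hypothetical.

```c
#include <assert.h>

/* Assumed sizes: 4 KiB base page, 2 MiB huge page, 128 KiB max window. */
#define SKETCH_PAGE_SIZE      4096UL
#define SKETCH_HPAGE_SIZE     (2UL * 1024 * 1024)
#define SKETCH_MAX_READAHEAD  (128UL * 1024)

/* How many base pages fit in a readahead window of window_bytes. */
unsigned long sketch_window_in_pages(unsigned long window_bytes)
{
	return window_bytes / SKETCH_PAGE_SIZE;
}

/* How many whole huge pages fit in the window (0: not even one). */
unsigned long sketch_huge_pages_in_window(unsigned long window_bytes)
{
	return window_bytes / SKETCH_HPAGE_SIZE;
}

/* How many windows of window_bytes are needed to span one huge page. */
unsigned long sketch_windows_per_huge_page(unsigned long window_bytes)
{
	assert(window_bytes > 0);
	return (SKETCH_HPAGE_SIZE + window_bytes - 1) / window_bytes;
}
```

With these numbers the maximum window is 32 base pages, which is one
sixteenth of a single huge page, so a window sized in 4k pages never
reaches past the huge page that is already being read.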

Unlike tmpfs, ext4 makes use of tags in the radix-tree. The approach I used
for tmpfs -- 512 entries in the radix-tree per huge page -- doesn't work
well if we want to have a coherent view of the tags. So the first 8 patches
of the patchset convert tmpfs to use multi-order entries in the radix-tree.
The same infrastructure is used for ext4.
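
The tag-coherence problem can be shown with a toy model. This is a
simplified userspace sketch, not the kernel radix-tree API: plain structs
stand in for tree slots, a single bool stands in for a tag such as
"dirty", and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_HPAGE_NR 512	/* assumed: 4k indices covered by one huge page */

/* Layout A: one slot, and therefore one tag bit, per 4k index. */
struct sketch_per_slot_range {
	bool dirty[SKETCH_HPAGE_NR];
};

/* Layout B: a single multi-order entry covering all 512 indices. */
struct sketch_multi_order_entry {
	bool dirty;
};

void sketch_per_slot_init(struct sketch_per_slot_range *r)
{
	memset(r->dirty, 0, sizeof(r->dirty));
}

/* Tagging through one index only touches that index's tag bit. */
void sketch_per_slot_tag_dirty(struct sketch_per_slot_range *r, unsigned int idx)
{
	assert(idx < SKETCH_HPAGE_NR);
	r->dirty[idx] = true;
}

bool sketch_per_slot_is_dirty(const struct sketch_per_slot_range *r,
			      unsigned int idx)
{
	assert(idx < SKETCH_HPAGE_NR);
	return r->dirty[idx];
}

/* With one entry, any index in the range reads and writes the same tag. */
void sketch_multi_tag_dirty(struct sketch_multi_order_entry *e, unsigned int idx)
{
	assert(idx < SKETCH_HPAGE_NR);
	e->dirty = true;
}

bool sketch_multi_is_dirty(const struct sketch_multi_order_entry *e,
			   unsigned int idx)
{
	assert(idx < SKETCH_HPAGE_NR);
	return e->dirty;
}
```

In layout A, tagging index 5 leaves index 7 of the same huge page
untagged unless every update walks all 512 slots. In layout B the tag is
shared by construction, which is the property a per-hugepage
dirty/writeback state needs.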

Writeback works for simple cases, but xfstests manages to trigger a
BUG_ON() eventually. That's what I'm working on currently. My understanding
of the writeback process is still rather limited and any help would be
appreciated.

For now I'm trying to make xfstests run smoothly on a filesystem with
huge=always and 4k block size. Once that is done, I'll widen testing to 1k
blocks, encryption and bigalloc.

Any comments?

[1] http://lkml.kernel.org/r/1465222029-45942-1-git-send-email-kirill.shutemov@linux.intel.com

TODO:
  - stabilize writeback;
  - make ext4_move_extents() work with huge pages (split them?);
  - check if memory reclaim process is adequate for huge pages with
    backing storage (unnecessary split_huge_page() ?);
  - handle shadow entries properly;
  - encryption, 1k blocks, bigalloc, ...

Kirill A. Shutemov (27):
  mm, shmem: switch huge tmpfs to multi-order radix-tree entries
  Revert "radix-tree: implement radix_tree_maybe_preload_order()"
  page-flags: relax page flag policy for PG_error and PG_writeback
  mm, rmap: account file thp pages
  thp: allow splitting non-shmem file-backed THPs
  truncate: make sure invalidate_mapping_pages() can discard huge pages
  filemap: allocate huge page in page_cache_read(), if allowed
  filemap: handle huge pages in do_generic_file_read()
  filemap: allocate huge page in pagecache_get_page(), if allowed
  filemap: handle huge pages in filemap_fdatawait_range()
  HACK: readahead: alloc huge pages, if allowed
  HACK: block: bump BIO_MAX_PAGES
  mm: make write_cache_pages() work on huge pages
  thp: introduce hpage_size() and hpage_mask()
  fs: make block_read_full_page() be able to read huge page
  fs: make block_write_{begin,end}() be able to handle huge pages
  fs: make block_page_mkwrite() aware about huge pages
  truncate: make truncate_inode_pages_range() aware about huge pages
  ext4: make ext4_mpage_readpages() hugepage-aware
  ext4: make ext4_writepage() work on huge pages
  ext4: handle huge pages in ext4_page_mkwrite()
  ext4: handle huge pages in __ext4_block_zero_page_range()
  ext4: handle huge pages in ext4_da_write_end()
  ext4: relax assert in ext4_da_page_release_reservation()
  WIP: ext4: handle writeback with huge pages
  mm, fs, ext4: expand use of page_mapping() and page_to_pgoff()
  ext4, vfs: add huge= mount option

Matthew Wilcox (6):
  tools: Add WARN_ON_ONCE
  radix tree test suite: Allow GFP_ATOMIC allocations to fail
  radix-tree: Add radix_tree_join
  radix-tree: Add radix_tree_split
  radix-tree: Add radix_tree_split_preload()
  radix-tree: Handle multiorder entries being deleted by
    replace_clear_tags

 drivers/base/node.c                   |   6 +
 fs/buffer.c                           |  89 ++++---
 fs/ext4/ext4.h                        |   5 +
 fs/ext4/inode.c                       | 106 +++++---
 fs/ext4/page-io.c                     |  11 +-
 fs/ext4/readpage.c                    |  38 ++-
 fs/ext4/super.c                       |  19 ++
 fs/proc/meminfo.c                     |   4 +
 fs/proc/task_mmu.c                    |   5 +-
 include/linux/bio.h                   |   2 +-
 include/linux/buffer_head.h           |   9 +-
 include/linux/fs.h                    |   5 +
 include/linux/huge_mm.h               |  16 ++
 include/linux/mm.h                    |   1 +
 include/linux/mmzone.h                |   2 +
 include/linux/page-flags.h            |   8 +-
 include/linux/pagemap.h               |  22 +-
 include/linux/radix-tree.h            |  10 +-
 lib/radix-tree.c                      | 357 ++++++++++++++++++--------
 mm/filemap.c                          | 458 +++++++++++++++++++++++-----------
 mm/huge_memory.c                      |  51 +++-
 mm/khugepaged.c                       |  26 +-
 mm/memory.c                           |   4 +-
 mm/page-writeback.c                   |  19 +-
 mm/page_alloc.c                       |   5 +
 mm/readahead.c                        |  16 +-
 mm/rmap.c                             |  12 +-
 mm/shmem.c                            |  36 +--
 mm/truncate.c                         | 106 +++++++-
 mm/vmstat.c                           |   2 +
 tools/include/asm/bug.h               |  11 +
 tools/testing/radix-tree/Makefile     |   2 +-
 tools/testing/radix-tree/linux.c      |   7 +-
 tools/testing/radix-tree/linux/bug.h  |   2 +-
 tools/testing/radix-tree/linux/gfp.h  |  24 +-
 tools/testing/radix-tree/linux/slab.h |   5 -
 tools/testing/radix-tree/multiorder.c |  82 ++++++
 tools/testing/radix-tree/test.h       |   9 +
 38 files changed, 1162 insertions(+), 430 deletions(-)

-- 
2.8.1


Thread overview: 40+ messages
2016-07-26  0:35 Kirill A. Shutemov [this message]
2016-07-26  0:35 ` [PATCHv1, RFC 01/33] tools: Add WARN_ON_ONCE Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 02/33] radix tree test suite: Allow GFP_ATOMIC allocations to fail Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 03/33] radix-tree: Add radix_tree_join Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 04/33] radix-tree: Add radix_tree_split Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 05/33] radix-tree: Add radix_tree_split_preload() Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 06/33] radix-tree: Handle multiorder entries being deleted by replace_clear_tags Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 07/33] mm, shmem: switch huge tmpfs to multi-order radix-tree entries Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 08/33] Revert "radix-tree: implement radix_tree_maybe_preload_order()" Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 09/33] page-flags: relax page flag policy for PG_error and PG_writeback Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 10/33] mm, rmap: account file thp pages Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 11/33] thp: allow splitting non-shmem file-backed THPs Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 12/33] truncate: make sure invalidate_mapping_pages() can discard huge pages Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 13/33] filemap: allocate huge page in page_cache_read(), if allowed Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 14/33] filemap: handle huge pages in do_generic_file_read() Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 15/33] filemap: allocate huge page in pagecache_get_page(), if allowed Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 16/33] filemap: handle huge pages in filemap_fdatawait_range() Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 17/33] HACK: readahead: alloc huge pages, if allowed Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 18/33] HACK: block: bump BIO_MAX_PAGES Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 19/33] mm: make write_cache_pages() work on huge pages Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 20/33] thp: introduce hpage_size() and hpage_mask() Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 21/33] fs: make block_read_full_page() be able to read huge page Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 22/33] fs: make block_write_{begin,end}() be able to handle huge pages Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 23/33] fs: make block_page_mkwrite() aware about " Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 24/33] truncate: make truncate_inode_pages_range() " Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 25/33] ext4: make ext4_mpage_readpages() hugepage-aware Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 26/33] ext4: make ext4_writepage() work on huge pages Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 27/33] ext4: handle huge pages in ext4_page_mkwrite() Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 28/33] ext4: handle huge pages in __ext4_block_zero_page_range() Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 29/33] ext4: handle huge pages in ext4_da_write_end() Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 30/33] ext4: relax assert in ext4_da_page_release_reservation() Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 31/33] WIP: ext4: handle writeback with huge pages Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 32/33] mm, fs, ext4: expand use of page_mapping() and page_to_pgoff() Kirill A. Shutemov
2016-07-26  0:35 ` [PATCHv1, RFC 33/33] ext4, vfs: add huge= mount option Kirill A. Shutemov
2016-07-26 17:29 ` [PATCHv1, RFC 00/33] ext4: support of huge pages Theodore Ts'o
2016-07-26 19:12   ` Kirill A. Shutemov
2016-07-27  9:17     ` Jan Kara
2016-07-27 10:33       ` Kirill A. Shutemov
2016-07-27 14:09         ` Andrea Arcangeli
2016-08-10  0:54 ` [PATCH] mm, hugetlb: switch hugetlbfs to multi-order radix-tree entries Naoya Horiguchi
