* [RFC v6 00/51] Large pages in the page cache
@ 2020-06-10 20:12 Matthew Wilcox
  2020-06-10 20:12 ` [PATCH v6 01/51] mm: Print head flags in dump_page Matthew Wilcox
                   ` (52 more replies)
  0 siblings, 53 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:12 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Another fortnight, another dump of my current large pages work.
I've squished a lot of bugs this time.  xfstests is much happier now,
running for 1631 seconds and getting as far as generic/086.  This patchset
is getting a little big, so I'm going to try to get some bits of it
upstream soon (the bits that make sense regardless of whether the rest
of this is merged).

It's now based on Linus' master (6f630784cc0d), and you can get it from
http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/heads/xarray-pagecache
if you'd rather see it there (this branch is force-pushed frequently).

The primary idea here is that a large part of the overhead in dealing
with individual pages is that there's just so darned many of them.
We would be better off dealing with fewer, larger pages, even if they
don't get to be the size necessary for the CPU to use a larger TLB entry.

The approach taken is to make THPs support arbitrary power-of-two sizes
(instead of just PMDs).  There's probably some tuning to be done to decide
what sizes are worth using, but we're a fair way from doing performance
work with this patchset yet.
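
To make that concrete, here is a minimal sketch (an illustration, not code
from this series) of what arbitrary-order support buys.  thp_write_bytes()
is a hypothetical helper; thp_size() and offset_in_thp() are added later in
this series.  The same code handles an order-0 page, a 64KiB order-4 page
(with 4KiB base pages) or a PMD-sized THP:

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/huge_mm.h>

/*
 * Hypothetical helper (not part of this series): how many bytes of a
 * write at @pos fit in @page, whatever the page's order happens to be.
 */
static size_t thp_write_bytes(struct page *page, loff_t pos, size_t count)
{
        return min_t(size_t, count,
                     thp_size(page) - offset_in_thp(page, pos));
}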

TODO:
 - Fix arc/arm/arm64/mips/powerpc/sparc flush_dcache_page() to
   support THPs natively
 - Actually create large pages for sufficiently large writes
 - Copy in larger chunks for write() in iomap
 - More bug fixing

v6:
 - Improved debug output for large pages (will send to Andrew soon)
 - Make compound_nr() more efficient (will send to Andrew soon)
 - Renamed hpage_nr_pages() to thp_nr_pages()
 - Added thp_head()
 - Set the THP_SUPPORT flag in shmfs
 - Change zero_user_segments() to call flush_dcache_page() once for the
   head page instead of once for each subpage.  The architectures listed
   above need to be fixed.
 - Fix shmem & truncate to call zero_user_segment() with the head page
 - Fix page_is_mergeable() for THPs
 - Fix a bug in iomap_iop_set_range_uptodate() where I was assuming that
   the offset was block-aligned
 - Fix a few more places that assume unsigned int is large enough to hold
   offset/length within a page
 - Fix doing writeback of a page after discarding its iop due to a partial
   truncate
 - Convert the iomap write paths more comprehensively.  That's now four
   separate patches

v5:
 - Add a mapping AS_LARGE_PAGES flag to reduce the levels of indirection
   (Dave Chinner)
 - Change iomap_invalidate_page() to handle subpages of a THP being punched
 - Ensure we don't call page_cache_async_readahead() with a tail page
 - Revert to Bill's original patch for thp_get_unmapped_area() to allow
   for hardware page sizes other than PMD to be supported more easily
 - Remove a few more HPAGE_PMD_NR
 - Move shmem_punch_compound() to truncate.c and rename it to punch_thp()
 - Add support for page_private to punch_thp()

v4:
 - Fix thp_size typo
 - Fix the iomap page_mkwrite() path to operate on the head page, even
   though the vm_fault has a pointer to the tail page
 - Fix iomap_finish_ioend() to use bio_for_each_thp_segment_all()
 - Rework PageDoubleMap (see first two patches for details)
 - Fix page_cache_delete() to handle shadow entries being stored to a THP
 - Fix the assertion in pagecache_get_page() to handle tail pages
 - Change PageReadahead from NO_COMPOUND to ONLY_HEAD
 - Handle PageReadahead being set on head pages
 - Handle total_mapcount correctly (Kirill)
 - Pull the FS_LARGE_PAGES check out into mapping_large_pages()
 - Fix page size assumption in truncate_cleanup_page()
 - Avoid splitting large pages unnecessarily on truncate
 - Disable the page cache truncation introduced as part of the read-only
   THP patch set
 - Call compound_head() in iomap buffered write paths -- we retrieve a
   (potentially) tail page from the page cache and need to use that for
   flush_dcache_page(), but we expect to operate on a head page in most
   of the iomap code



Kirill A. Shutemov (1):
  mm: Fix total_mapcount assumption of page size

Matthew Wilcox (Oracle) (49):
  mm: Print head flags in dump_page
  mm: Print the inode number in dump_page
  mm: Print hashed address of struct page
  mm: Move PageDoubleMap bit
  mm: Simplify PageDoubleMap with PF_SECOND policy
  mm: Store compound_nr as well as compound_order
  mm: Move page-flags include to top of file
  mm: Add thp_order
  mm: Add thp_size
  mm: Replace hpage_nr_pages with thp_nr_pages
  mm: Add thp_head
  mm: Introduce offset_in_thp
  mm: Support arbitrary THP sizes
  fs: Add a filesystem flag for THPs
  fs: Do not update nr_thps for mappings which support THPs
  fs: Introduce i_blocks_per_page
  fs: Make page_mkwrite_check_truncate thp-aware
  mm: Support THPs in zero_user_segments
  mm: Zero the head page, not the tail page
  block: Add bio_for_each_thp_segment_all
  block: Support THPs in page_is_mergeable
  iomap: Support arbitrarily many blocks per page
  iomap: Support THPs in iomap_adjust_read_range
  iomap: Support THPs in invalidatepage
  iomap: Support THPs in read paths
  iomap: Convert iomap_write_end types
  iomap: Change calling convention for zeroing
  iomap: Change iomap_write_begin calling convention
  iomap: Support THPs in write paths
  iomap: Inline data shouldn't see THPs
  iomap: Handle tail pages in iomap_page_mkwrite
  xfs: Support THPs
  mm: Make prep_transhuge_page return its argument
  mm: Add __page_cache_alloc_order
  mm: Allow THPs to be added to the page cache
  mm: Allow THPs to be removed from the page cache
  mm: Remove page fault assumption of compound page size
  mm: Remove assumptions of THP size
  mm: Avoid splitting THPs
  mm: Fix truncation for pages of arbitrary size
  mm: Handle truncates that split THPs
  mm: Support storing shadow entries for THPs
  mm: Support retrieving tail pages from the page cache
  mm: Support tail pages in wait_for_stable_page
  mm: Add DEFINE_READAHEAD
  mm: Make page_cache_readahead_unbounded take a readahead_control
  mm: Make __do_page_cache_readahead take a readahead_control
  mm: Allow PageReadahead to be set on head pages
  mm: Add THP readahead

William Kucharski (1):
  mm: Align THP mappings for non-DAX

 block/bio.c                |   2 +-
 drivers/nvdimm/btt.c       |   4 +-
 drivers/nvdimm/pmem.c      |   6 +-
 fs/dax.c                   |  13 +-
 fs/ext4/verity.c           |   4 +-
 fs/f2fs/verity.c           |   4 +-
 fs/inode.c                 |   2 +
 fs/iomap/buffered-io.c     | 250 +++++++++++++++++++------------------
 fs/jfs/jfs_metapage.c      |   2 +-
 fs/xfs/xfs_aops.c          |   4 +-
 fs/xfs/xfs_super.c         |   2 +-
 include/linux/bio.h        |  13 ++
 include/linux/bvec.h       |  23 ++++
 include/linux/dax.h        |   3 +-
 include/linux/fs.h         |  28 +----
 include/linux/highmem.h    |  11 +-
 include/linux/huge_mm.h    |  65 ++++++++--
 include/linux/mm.h         |  46 +++----
 include/linux/mm_inline.h  |   6 +-
 include/linux/mm_types.h   |   1 +
 include/linux/page-flags.h |  46 ++-----
 include/linux/pagemap.h    | 102 ++++++++++++---
 mm/compaction.c            |   2 +-
 mm/debug.c                 |  23 ++--
 mm/filemap.c               | 101 +++++++++------
 mm/gup.c                   |   2 +-
 mm/highmem.c               |  62 ++++++++-
 mm/huge_memory.c           |  38 +++---
 mm/hugetlb.c               |   2 +-
 mm/internal.h              |  17 +--
 mm/memcontrol.c            |  10 +-
 mm/memory.c                |   7 +-
 mm/memory_hotplug.c        |   7 +-
 mm/mempolicy.c             |   2 +-
 mm/migrate.c               |  16 +--
 mm/mlock.c                 |   9 +-
 mm/page-writeback.c        |   1 +
 mm/page_alloc.c            |   5 +-
 mm/page_io.c               |   4 +-
 mm/page_vma_mapped.c       |   6 +-
 mm/readahead.c             | 145 ++++++++++++++++-----
 mm/rmap.c                  |  18 +--
 mm/shmem.c                 |  39 ++----
 mm/swap.c                  |  16 +--
 mm/swap_state.c            |   6 +-
 mm/swapfile.c              |   2 +-
 mm/truncate.c              |  70 ++++++++++-
 mm/vmscan.c                |  12 +-
 48 files changed, 795 insertions(+), 464 deletions(-)

-- 
2.26.2


* [PATCH v6 01/51] mm: Print head flags in dump_page
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
@ 2020-06-10 20:12 ` Matthew Wilcox
  2020-06-10 20:12 ` [PATCH v6 02/51] mm: Print the inode number " Matthew Wilcox
                   ` (51 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:12 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Tail page flags contain very little useful information.  Print the head
page's flags instead (even though seeing the Head flag for what is
actually a tail page is a little misleading).

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/debug.c b/mm/debug.c
index b5b1de8c71ac..384eef80649d 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -163,7 +163,7 @@ void __dump_page(struct page *page, const char *reason)
 out_mapping:
 	BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
 
-	pr_warn("%sflags: %#lx(%pGp)%s\n", type, page->flags, &page->flags,
+	pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags,
 		page_cma ? " CMA" : "");
 
 hex_only:
-- 
2.26.2



* [PATCH v6 02/51] mm: Print the inode number in dump_page
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
  2020-06-10 20:12 ` [PATCH v6 01/51] mm: Print head flags in dump_page Matthew Wilcox
@ 2020-06-10 20:12 ` Matthew Wilcox
  2020-06-10 20:12 ` [PATCH v6 03/51] mm: Print hashed address of struct page Matthew Wilcox
                   ` (50 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:12 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

The inode number helps correlate this page with debug messages elsewhere
in the kernel.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/debug.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/mm/debug.c b/mm/debug.c
index 384eef80649d..e30e35b41d0e 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -133,15 +133,16 @@ void __dump_page(struct page *page, const char *reason)
 			goto out_mapping;
 		}
 
-		if (probe_kernel_read(&dentry_first,
-			&host->i_dentry.first, sizeof(struct hlist_node *))) {
+		if (probe_kernel_read(&dentry_first, &host->i_dentry.first,
+					sizeof(struct hlist_node *))) {
 			pr_warn("mapping->a_ops:%ps with invalid mapping->host inode address %px\n",
 				a_ops, host);
 			goto out_mapping;
 		}
 
 		if (!dentry_first) {
-			pr_warn("mapping->a_ops:%ps\n", a_ops);
+			pr_warn("mapping->a_ops:%ps ino %lx\n", a_ops,
+					host->i_ino);
 			goto out_mapping;
 		}
 
@@ -156,8 +157,8 @@ void __dump_page(struct page *page, const char *reason)
 			 * crash, but it's unlikely that we reach here with a
 			 * corrupted struct page
 			 */
-			pr_warn("mapping->aops:%ps dentry name:\"%pd\"\n",
-								a_ops, &dentry);
+			pr_warn("mapping->aops:%ps ino %lx dentry name:\"%pd\"\n",
+					a_ops, host->i_ino, &dentry);
 		}
 	}
 out_mapping:
-- 
2.26.2



* [PATCH v6 03/51] mm: Print hashed address of struct page
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
  2020-06-10 20:12 ` [PATCH v6 01/51] mm: Print head flags in dump_page Matthew Wilcox
  2020-06-10 20:12 ` [PATCH v6 02/51] mm: Print the inode number " Matthew Wilcox
@ 2020-06-10 20:12 ` Matthew Wilcox
  2020-06-10 20:12 ` [PATCH v6 04/51] mm: Move PageDoubleMap bit Matthew Wilcox
                   ` (49 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:12 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

The actual address of the struct page isn't particularly helpful,
while the hashed address helps match with other messages elsewhere.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/debug.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/debug.c b/mm/debug.c
index e30e35b41d0e..b17909c16a77 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -86,23 +86,23 @@ void __dump_page(struct page *page, const char *reason)
 
 	if (compound)
 		if (hpage_pincount_available(page)) {
-			pr_warn("page:%px refcount:%d mapcount:%d mapping:%p "
-				"index:%#lx head:%px order:%u "
+			pr_warn("page:%p refcount:%d mapcount:%d mapping:%p "
+				"index:%#lx head:%p order:%u "
 				"compound_mapcount:%d compound_pincount:%d\n",
 				page, page_ref_count(head), mapcount,
 				mapping, page_to_pgoff(page), head,
 				compound_order(head), compound_mapcount(page),
 				compound_pincount(page));
 		} else {
-			pr_warn("page:%px refcount:%d mapcount:%d mapping:%p "
-				"index:%#lx head:%px order:%u "
+			pr_warn("page:%p refcount:%d mapcount:%d mapping:%p "
+				"index:%#lx head:%p order:%u "
 				"compound_mapcount:%d\n",
 				page, page_ref_count(head), mapcount,
 				mapping, page_to_pgoff(page), head,
 				compound_order(head), compound_mapcount(page));
 		}
 	else
-		pr_warn("page:%px refcount:%d mapcount:%d mapping:%p index:%#lx\n",
+		pr_warn("page:%p refcount:%d mapcount:%d mapping:%p index:%#lx\n",
 			page, page_ref_count(page), mapcount,
 			mapping, page_to_pgoff(page));
 	if (PageKsm(page))
-- 
2.26.2



* [PATCH v6 04/51] mm: Move PageDoubleMap bit
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (2 preceding siblings ...)
  2020-06-10 20:12 ` [PATCH v6 03/51] mm: Print hashed address of struct page Matthew Wilcox
@ 2020-06-10 20:12 ` Matthew Wilcox
  2020-06-11 15:03   ` Zi Yan
  2020-06-10 20:12 ` [PATCH v6 05/51] mm: Simplify PageDoubleMap with PF_SECOND policy Matthew Wilcox
                   ` (48 subsequent siblings)
  52 siblings, 1 reply; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:12 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

PG_private_2 is defined as being PF_ANY (applicable to tail pages
as well as regular & head pages).  That means that the first tail
page of a double-map page will appear to have Private2 set.  Use the
Workingset bit instead, which is defined as PF_HEAD, so any attempt to
access the Workingset bit on a tail page will redirect to the head page's
Workingset bit.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/page-flags.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 222f6f7b2bb3..de6e0696f55c 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -164,7 +164,7 @@ enum pageflags {
 	PG_slob_free = PG_private,
 
 	/* Compound pages. Stored in first tail page's flags */
-	PG_double_map = PG_private_2,
+	PG_double_map = PG_workingset,
 
 	/* non-lru isolated movable page */
 	PG_isolated = PG_reclaim,
-- 
2.26.2



* [PATCH v6 05/51] mm: Simplify PageDoubleMap with PF_SECOND policy
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (3 preceding siblings ...)
  2020-06-10 20:12 ` [PATCH v6 04/51] mm: Move PageDoubleMap bit Matthew Wilcox
@ 2020-06-10 20:12 ` Matthew Wilcox
  2020-06-11 15:14   ` Zi Yan
  2020-06-10 20:13 ` [PATCH v6 06/51] mm: Store compound_nr as well as compound_order Matthew Wilcox
                   ` (47 subsequent siblings)
  52 siblings, 1 reply; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:12 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Introduce the new page policy of PF_SECOND which lets us use the
normal pageflags generation machinery to create the various DoubleMap
manipulation functions.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/page-flags.h | 40 ++++++++++----------------------------
 1 file changed, 10 insertions(+), 30 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index de6e0696f55c..979460df4768 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -232,6 +232,9 @@ static inline void page_init_poison(struct page *page, size_t size)
  *
  * PF_NO_COMPOUND:
  *     the page flag is not relevant for compound pages.
+ *
+ * PF_SECOND:
+ *     the page flag is stored in the first tail page.
  */
 #define PF_POISONED_CHECK(page) ({					\
 		VM_BUG_ON_PGFLAGS(PagePoisoned(page), page);		\
@@ -247,6 +250,9 @@ static inline void page_init_poison(struct page *page, size_t size)
 #define PF_NO_COMPOUND(page, enforce) ({				\
 		VM_BUG_ON_PGFLAGS(enforce && PageCompound(page), page);	\
 		PF_POISONED_CHECK(page); })
+#define PF_SECOND(page, enforce) ({					\
+		VM_BUG_ON_PGFLAGS(!PageHead(page), page);		\
+		PF_POISONED_CHECK(&page[1]); })
 
 /*
  * Macros to create function definitions for page flags
@@ -685,42 +691,15 @@ static inline int PageTransTail(struct page *page)
  *
  * See also __split_huge_pmd_locked() and page_remove_anon_compound_rmap().
  */
-static inline int PageDoubleMap(struct page *page)
-{
-	return PageHead(page) && test_bit(PG_double_map, &page[1].flags);
-}
-
-static inline void SetPageDoubleMap(struct page *page)
-{
-	VM_BUG_ON_PAGE(!PageHead(page), page);
-	set_bit(PG_double_map, &page[1].flags);
-}
-
-static inline void ClearPageDoubleMap(struct page *page)
-{
-	VM_BUG_ON_PAGE(!PageHead(page), page);
-	clear_bit(PG_double_map, &page[1].flags);
-}
-static inline int TestSetPageDoubleMap(struct page *page)
-{
-	VM_BUG_ON_PAGE(!PageHead(page), page);
-	return test_and_set_bit(PG_double_map, &page[1].flags);
-}
-
-static inline int TestClearPageDoubleMap(struct page *page)
-{
-	VM_BUG_ON_PAGE(!PageHead(page), page);
-	return test_and_clear_bit(PG_double_map, &page[1].flags);
-}
-
+PAGEFLAG(DoubleMap, double_map, PF_SECOND)
+	TESTSCFLAG(DoubleMap, double_map, PF_SECOND)
 #else
 TESTPAGEFLAG_FALSE(TransHuge)
 TESTPAGEFLAG_FALSE(TransCompound)
 TESTPAGEFLAG_FALSE(TransCompoundMap)
 TESTPAGEFLAG_FALSE(TransTail)
 PAGEFLAG_FALSE(DoubleMap)
-	TESTSETFLAG_FALSE(DoubleMap)
-	TESTCLEARFLAG_FALSE(DoubleMap)
+	TESTSCFLAG_FALSE(DoubleMap)
 #endif
 
 /*
@@ -875,6 +854,7 @@ static inline int page_has_private(struct page *page)
 #undef PF_ONLY_HEAD
 #undef PF_NO_TAIL
 #undef PF_NO_COMPOUND
+#undef PF_SECOND
 #endif /* !__GENERATING_BOUNDS_H */
 
 #endif	/* PAGE_FLAGS_H */
-- 
2.26.2
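
For reference, the generated accessors are roughly equivalent to the
open-coded versions removed above; the expansion below is an editor's
sketch based on the existing TESTPAGEFLAG/SETPAGEFLAG macros, with the
old PageHead() guard becoming a debug-only assertion:

/* Approximate expansion of PAGEFLAG(DoubleMap, double_map, PF_SECOND)
 * for two of the generated accessors.  PF_SECOND checks PageHead() and
 * then resolves to &page[1], so the flag is still kept in the first
 * tail page's flags. */
static __always_inline int PageDoubleMap(struct page *page)
{
        VM_BUG_ON_PGFLAGS(!PageHead(page), page);
        return test_bit(PG_double_map, &page[1].flags);
}

static __always_inline void SetPageDoubleMap(struct page *page)
{
        VM_BUG_ON_PGFLAGS(!PageHead(page), page);
        set_bit(PG_double_map, &page[1].flags);
}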



* [PATCH v6 06/51] mm: Store compound_nr as well as compound_order
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (4 preceding siblings ...)
  2020-06-10 20:12 ` [PATCH v6 05/51] mm: Simplify PageDoubleMap with PF_SECOND policy Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 07/51] mm: Move page-flags include to top of file Matthew Wilcox
                   ` (46 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

This removes a few instructions from functions which need to know how many
pages are in a compound page.  The storage used is either page->mapping
on 64-bit or page->index on 32-bit.  Both of these are fine to overlay
on tail pages.  Because the new field overlays page->mapping on 64-bit,
prep_compound_page() now fills in the compound metadata only after the
loop that writes TAIL_MAPPING to each tail page's ->mapping, so the
value stored in page[1] is not overwritten.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h       | 5 ++++-
 include/linux/mm_types.h | 1 +
 mm/page_alloc.c          | 5 +++--
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index dc7b87310c10..af0305ad090f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -911,12 +911,15 @@ static inline int compound_pincount(struct page *page)
 static inline void set_compound_order(struct page *page, unsigned int order)
 {
 	page[1].compound_order = order;
+	page[1].compound_nr = 1U << order;
 }
 
 /* Returns the number of pages in this potentially compound page. */
 static inline unsigned long compound_nr(struct page *page)
 {
-	return 1UL << compound_order(page);
+	if (!PageHead(page))
+		return 1;
+	return page[1].compound_nr;
 }
 
 /* Returns the number of bytes in this potentially compound page. */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 64ede5f150dc..561ed987ab44 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -134,6 +134,7 @@ struct page {
 			unsigned char compound_dtor;
 			unsigned char compound_order;
 			atomic_t compound_mapcount;
+			unsigned int compound_nr; /* 1 << compound_order */
 		};
 		struct {	/* Second tail page of compound page */
 			unsigned long _compound_pad_1;	/* compound_head */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 727751219003..3fb61ef4c3a4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -673,8 +673,6 @@ void prep_compound_page(struct page *page, unsigned int order)
 	int i;
 	int nr_pages = 1 << order;
 
-	set_compound_page_dtor(page, COMPOUND_PAGE_DTOR);
-	set_compound_order(page, order);
 	__SetPageHead(page);
 	for (i = 1; i < nr_pages; i++) {
 		struct page *p = page + i;
@@ -682,6 +680,9 @@ void prep_compound_page(struct page *page, unsigned int order)
 		p->mapping = TAIL_MAPPING;
 		set_compound_head(p, page);
 	}
+
+	set_compound_page_dtor(page, COMPOUND_PAGE_DTOR);
+	set_compound_order(page, order);
 	atomic_set(compound_mapcount_ptr(page), -1);
 	if (hpage_pincount_available(page))
 		atomic_set(compound_pincount_ptr(page), 0);
-- 
2.26.2
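
A small usage sketch (illustration only): callers that iterate over the
subpages of a potentially-compound page now read the cached
page[1].compound_nr through compound_nr() instead of shifting
compound_order() each time.  zero_compound_page() below is hypothetical:

#include <linux/mm.h>
#include <linux/highmem.h>

/* Hypothetical caller: zero every base page backing @page, whether it
 * is a single page or the head of a compound page. */
static void zero_compound_page(struct page *page)
{
        unsigned long i, nr = compound_nr(page);

        for (i = 0; i < nr; i++)
                clear_highpage(page + i);
}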



* [PATCH v6 07/51] mm: Move page-flags include to top of file
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (5 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 06/51] mm: Store compound_nr as well as compound_order Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 08/51] mm: Add thp_order Matthew Wilcox
                   ` (45 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Give up on the notion that we can remove page-flags.h from mm.h.
There are currently 14 inline functions which use a PageFoo function.
Also, two of the files directly included by mm.h include page-flags.h
themselves, and there are probably more indirect inclusions.  So just
include it at the top like any other header file.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index af0305ad090f..6c29b663135f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -24,6 +24,7 @@
 #include <linux/resource.h>
 #include <linux/page_ext.h>
 #include <linux/err.h>
+#include <linux/page-flags.h>
 #include <linux/page_ref.h>
 #include <linux/memremap.h>
 #include <linux/overflow.h>
@@ -667,11 +668,6 @@ int vma_is_stack_for_current(struct vm_area_struct *vma);
 struct mmu_gather;
 struct inode;
 
-/*
- * FIXME: take this include out, include page-flags.h in
- * files which need it (119 of them)
- */
-#include <linux/page-flags.h>
 #include <linux/huge_mm.h>
 
 /*
-- 
2.26.2



* [PATCH v6 08/51] mm: Add thp_order
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (6 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 07/51] mm: Move page-flags include to top of file Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 09/51] mm: Add thp_size Matthew Wilcox
                   ` (44 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

This function returns the order of a transparent huge page.  It
compiles to 0 if CONFIG_TRANSPARENT_HUGEPAGE is disabled.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/huge_mm.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 71f20776b06c..dd19720a8bc2 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -265,6 +265,19 @@ static inline spinlock_t *pud_trans_huge_lock(pud_t *pud,
 	else
 		return NULL;
 }
+
+/**
+ * thp_order - Order of a transparent huge page.
+ * @page: Head page of a transparent huge page.
+ */
+static inline unsigned int thp_order(struct page *page)
+{
+	VM_BUG_ON_PGFLAGS(PageTail(page), page);
+	if (PageHead(page))
+		return HPAGE_PMD_ORDER;
+	return 0;
+}
+
 static inline int hpage_nr_pages(struct page *page)
 {
 	if (unlikely(PageTransHuge(page)))
@@ -324,6 +337,12 @@ static inline struct list_head *page_deferred_list(struct page *page)
 #define HPAGE_PUD_MASK ({ BUILD_BUG(); 0; })
 #define HPAGE_PUD_SIZE ({ BUILD_BUG(); 0; })
 
+static inline unsigned int thp_order(struct page *page)
+{
+	VM_BUG_ON_PGFLAGS(PageTail(page), page);
+	return 0;
+}
+
 static inline int hpage_nr_pages(struct page *page)
 {
 	VM_BUG_ON_PAGE(PageTail(page), page);
-- 
2.26.2
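
A usage sketch (illustration only, in the same spirit as the
i_blocks_per_page() helper introduced later in this series): computing how
many filesystem blocks a page covers without assuming PMD size.
blocks_per_page() is hypothetical and assumes the block size does not
exceed the page size:

#include <linux/huge_mm.h>

/* Hypothetical helper: number of blocks of size (1 << blkbits) covered
 * by @page, for any page order. */
static unsigned int blocks_per_page(struct page *page, unsigned int blkbits)
{
        return 1U << (PAGE_SHIFT + thp_order(page) - blkbits);
}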



* [PATCH v6 09/51] mm: Add thp_size
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (7 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 08/51] mm: Add thp_order Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 10/51] mm: Replace hpage_nr_pages with thp_nr_pages Matthew Wilcox
                   ` (43 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

This function returns the number of bytes in a THP.  It is like
page_size(), but compiles to just PAGE_SIZE if CONFIG_TRANSPARENT_HUGEPAGE
is disabled.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 drivers/nvdimm/btt.c    |  4 +---
 drivers/nvdimm/pmem.c   |  6 ++----
 include/linux/huge_mm.h | 11 +++++++++++
 mm/internal.h           |  2 +-
 mm/page_io.c            |  2 +-
 mm/page_vma_mapped.c    |  4 ++--
 6 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 90c0c4bbe77b..7255cfe6ebe2 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1490,10 +1490,8 @@ static int btt_rw_page(struct block_device *bdev, sector_t sector,
 {
 	struct btt *btt = bdev->bd_disk->private_data;
 	int rc;
-	unsigned int len;
 
-	len = hpage_nr_pages(page) * PAGE_SIZE;
-	rc = btt_do_bvec(btt, NULL, page, len, 0, op, sector);
+	rc = btt_do_bvec(btt, NULL, page, thp_size(page), 0, op, sector);
 	if (rc == 0)
 		page_endio(page, op_is_write(op), 0);
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index d1ecd6da11a2..d1999c266b20 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -238,11 +238,9 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
 	blk_status_t rc;
 
 	if (op_is_write(op))
-		rc = pmem_do_write(pmem, page, 0, sector,
-				   hpage_nr_pages(page) * PAGE_SIZE);
+		rc = pmem_do_write(pmem, page, 0, sector, thp_size(page));
 	else
-		rc = pmem_do_read(pmem, page, 0, sector,
-				   hpage_nr_pages(page) * PAGE_SIZE);
+		rc = pmem_do_read(pmem, page, 0, sector, thp_size(page));
 	/*
 	 * The ->rw_page interface is subtle and tricky.  The core
 	 * retries on any error, so we can only invoke page_endio() in
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index dd19720a8bc2..0ec3b5a73d38 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -469,4 +469,15 @@ static inline bool thp_migration_supported(void)
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+/**
+ * thp_size - Size of a transparent huge page.
+ * @page: Head page of a transparent huge page.
+ *
+ * Return: Number of bytes in this page.
+ */
+static inline unsigned long thp_size(struct page *page)
+{
+	return PAGE_SIZE << thp_order(page);
+}
+
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/mm/internal.h b/mm/internal.h
index 9886db20d94f..de9f1d0ba5fc 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -395,7 +395,7 @@ vma_address(struct page *page, struct vm_area_struct *vma)
 	unsigned long start, end;
 
 	start = __vma_address(page, vma);
-	end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
+	end = start + thp_size(page) - PAGE_SIZE;
 
 	/* page should be within @vma mapping range */
 	VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma);
diff --git a/mm/page_io.c b/mm/page_io.c
index e8726f3e3820..888000d1a8cc 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -40,7 +40,7 @@ static struct bio *get_swap_bio(gfp_t gfp_flags,
 		bio->bi_iter.bi_sector <<= PAGE_SHIFT - 9;
 		bio->bi_end_io = end_io;
 
-		bio_add_page(bio, page, PAGE_SIZE * hpage_nr_pages(page), 0);
+		bio_add_page(bio, page, thp_size(page), 0);
 	}
 	return bio;
 }
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 719c35246cfa..e65629c056e8 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -227,7 +227,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 			if (pvmw->address >= pvmw->vma->vm_end ||
 			    pvmw->address >=
 					__vma_address(pvmw->page, pvmw->vma) +
-					hpage_nr_pages(pvmw->page) * PAGE_SIZE)
+					thp_size(pvmw->page))
 				return not_found(pvmw);
 			/* Did we cross page table boundary? */
 			if (pvmw->address % PMD_SIZE == 0) {
@@ -268,7 +268,7 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
 	unsigned long start, end;
 
 	start = __vma_address(page, vma);
-	end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
+	end = start + thp_size(page) - PAGE_SIZE;
 
 	if (unlikely(end < vma->vm_start || start >= vma->vm_end))
 		return 0;
-- 
2.26.2
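
A usage sketch (illustration only): the conversions above pass thp_size()
as an I/O length; the same helper also makes file-offset arithmetic
order-independent.  page_end_pos() is hypothetical and expects a head
(or non-compound) page-cache page:

#include <linux/mm.h>
#include <linux/huge_mm.h>

/* Hypothetical helper: the file position just past the data covered by
 * @page, whether it is a single page or a THP of any order. */
static loff_t page_end_pos(struct page *page)
{
        return ((loff_t)page->index << PAGE_SHIFT) + thp_size(page);
}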



* [PATCH v6 10/51] mm: Replace hpage_nr_pages with thp_nr_pages
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (8 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 09/51] mm: Add thp_size Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 11/51] mm: Add thp_head Matthew Wilcox
                   ` (42 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

The thp prefix is more frequently used than hpage and we should
be consistent between the various functions.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/huge_mm.h   | 13 +++++++++----
 include/linux/mm_inline.h |  6 +++---
 include/linux/pagemap.h   |  6 +++---
 mm/compaction.c           |  2 +-
 mm/filemap.c              |  2 +-
 mm/gup.c                  |  2 +-
 mm/hugetlb.c              |  2 +-
 mm/internal.h             |  2 +-
 mm/memcontrol.c           | 10 +++++-----
 mm/memory_hotplug.c       |  7 +++----
 mm/mempolicy.c            |  2 +-
 mm/migrate.c              | 16 ++++++++--------
 mm/mlock.c                |  9 ++++-----
 mm/page_io.c              |  2 +-
 mm/page_vma_mapped.c      |  2 +-
 mm/rmap.c                 |  8 ++++----
 mm/swap.c                 | 16 ++++++++--------
 mm/swap_state.c           |  6 +++---
 mm/swapfile.c             |  2 +-
 mm/vmscan.c               |  6 +++---
 20 files changed, 62 insertions(+), 59 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 0ec3b5a73d38..dcdfd21763a3 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -278,9 +278,14 @@ static inline unsigned int thp_order(struct page *page)
 	return 0;
 }
 
-static inline int hpage_nr_pages(struct page *page)
+/**
+ * thp_nr_pages - The number of regular pages in this huge page.
+ * @page: The head page of a huge page.
+ */
+static inline int thp_nr_pages(struct page *page)
 {
-	if (unlikely(PageTransHuge(page)))
+	VM_BUG_ON_PGFLAGS(PageTail(page), page);
+	if (PageHead(page))
 		return HPAGE_PMD_NR;
 	return 1;
 }
@@ -343,9 +348,9 @@ static inline unsigned int thp_order(struct page *page)
 	return 0;
 }
 
-static inline int hpage_nr_pages(struct page *page)
+static inline int thp_nr_pages(struct page *page)
 {
-	VM_BUG_ON_PAGE(PageTail(page), page);
+	VM_BUG_ON_PGFLAGS(PageTail(page), page);
 	return 1;
 }
 
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 219bef41d87c..8fc71e9d7bb0 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -48,14 +48,14 @@ static __always_inline void update_lru_size(struct lruvec *lruvec,
 static __always_inline void add_page_to_lru_list(struct page *page,
 				struct lruvec *lruvec, enum lru_list lru)
 {
-	update_lru_size(lruvec, lru, page_zonenum(page), hpage_nr_pages(page));
+	update_lru_size(lruvec, lru, page_zonenum(page), thp_nr_pages(page));
 	list_add(&page->lru, &lruvec->lists[lru]);
 }
 
 static __always_inline void add_page_to_lru_list_tail(struct page *page,
 				struct lruvec *lruvec, enum lru_list lru)
 {
-	update_lru_size(lruvec, lru, page_zonenum(page), hpage_nr_pages(page));
+	update_lru_size(lruvec, lru, page_zonenum(page), thp_nr_pages(page));
 	list_add_tail(&page->lru, &lruvec->lists[lru]);
 }
 
@@ -63,7 +63,7 @@ static __always_inline void del_page_from_lru_list(struct page *page,
 				struct lruvec *lruvec, enum lru_list lru)
 {
 	list_del(&page->lru);
-	update_lru_size(lruvec, lru, page_zonenum(page), -hpage_nr_pages(page));
+	update_lru_size(lruvec, lru, page_zonenum(page), -thp_nr_pages(page));
 }
 
 /**
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index cf2468da68e9..484a36185bb5 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -381,7 +381,7 @@ static inline struct page *find_subpage(struct page *head, pgoff_t index)
 	if (PageHuge(head))
 		return head;
 
-	return head + (index & (hpage_nr_pages(head) - 1));
+	return head + (index & (thp_nr_pages(head) - 1));
 }
 
 struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
@@ -730,7 +730,7 @@ static inline struct page *readahead_page(struct readahead_control *rac)
 
 	page = xa_load(&rac->mapping->i_pages, rac->_index);
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
-	rac->_batch_count = hpage_nr_pages(page);
+	rac->_batch_count = thp_nr_pages(page);
 
 	return page;
 }
@@ -753,7 +753,7 @@ static inline unsigned int __readahead_batch(struct readahead_control *rac,
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
 		VM_BUG_ON_PAGE(PageTail(page), page);
 		array[i++] = page;
-		rac->_batch_count += hpage_nr_pages(page);
+		rac->_batch_count += thp_nr_pages(page);
 
 		/*
 		 * The page cache isn't using multi-index entries yet,
diff --git a/mm/compaction.c b/mm/compaction.c
index fd988b7e5f2b..81eaffcfbe4e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -991,7 +991,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		del_page_from_lru_list(page, lruvec, page_lru(page));
 		mod_node_page_state(page_pgdat(page),
 				NR_ISOLATED_ANON + page_is_file_lru(page),
-				hpage_nr_pages(page));
+				thp_nr_pages(page));
 
 isolate_success:
 		list_add(&page->lru, &cc->migratepages);
diff --git a/mm/filemap.c b/mm/filemap.c
index f0ae9a6308cb..80ce3658b147 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -197,7 +197,7 @@ static void unaccount_page_cache_page(struct address_space *mapping,
 	if (PageHuge(page))
 		return;
 
-	nr = hpage_nr_pages(page);
+	nr = thp_nr_pages(page);
 
 	__mod_lruvec_page_state(page, NR_FILE_PAGES, -nr);
 	if (PageSwapBacked(page)) {
diff --git a/mm/gup.c b/mm/gup.c
index de9e36262ccb..12d066baee11 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1703,7 +1703,7 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk,
 					mod_node_page_state(page_pgdat(head),
 							    NR_ISOLATED_ANON +
 							    page_is_file_lru(head),
-							    hpage_nr_pages(head));
+							    thp_nr_pages(head));
 				}
 			}
 		}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 57ece74e3aae..6bb07bc655f7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1593,7 +1593,7 @@ static struct address_space *_get_hugetlb_page_mapping(struct page *hpage)
 
 	/* Use first found vma */
 	pgoff_start = page_to_pgoff(hpage);
-	pgoff_end = pgoff_start + hpage_nr_pages(hpage) - 1;
+	pgoff_end = pgoff_start + thp_nr_pages(hpage) - 1;
 	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
 					pgoff_start, pgoff_end) {
 		struct vm_area_struct *vma = avc->vma;
diff --git a/mm/internal.h b/mm/internal.h
index de9f1d0ba5fc..ac3c79408045 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -368,7 +368,7 @@ extern void clear_page_mlock(struct page *page);
 static inline void mlock_migrate_page(struct page *newpage, struct page *page)
 {
 	if (TestClearPageMlocked(page)) {
-		int nr_pages = hpage_nr_pages(page);
+		int nr_pages = thp_nr_pages(page);
 
 		/* Holding pmd lock, no change in irq context: __mod is safe */
 		__mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0b38b6ad547d..af02455db4b3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5363,7 +5363,7 @@ static int mem_cgroup_move_account(struct page *page,
 {
 	struct lruvec *from_vec, *to_vec;
 	struct pglist_data *pgdat;
-	unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
+	unsigned int nr_pages = compound ? thp_nr_pages(page) : 1;
 	int ret;
 
 	VM_BUG_ON(from == to);
@@ -6453,7 +6453,7 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
  */
 int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
 {
-	unsigned int nr_pages = hpage_nr_pages(page);
+	unsigned int nr_pages = thp_nr_pages(page);
 	struct mem_cgroup *memcg = NULL;
 	int ret = 0;
 
@@ -6684,7 +6684,7 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage)
 		return;
 
 	/* Force-charge the new page. The old one will be freed soon */
-	nr_pages = hpage_nr_pages(newpage);
+	nr_pages = thp_nr_pages(newpage);
 
 	page_counter_charge(&memcg->memory, nr_pages);
 	if (do_memsw_account())
@@ -6897,7 +6897,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 	 * ancestor for the swap instead and transfer the memory+swap charge.
 	 */
 	swap_memcg = mem_cgroup_id_get_online(memcg);
-	nr_entries = hpage_nr_pages(page);
+	nr_entries = thp_nr_pages(page);
 	/* Get references for the tail pages, too */
 	if (nr_entries > 1)
 		mem_cgroup_id_get_many(swap_memcg, nr_entries - 1);
@@ -6942,7 +6942,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
  */
 int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
 {
-	unsigned int nr_pages = hpage_nr_pages(page);
+	unsigned int nr_pages = thp_nr_pages(page);
 	struct page_counter *counter;
 	struct mem_cgroup *memcg;
 	unsigned short oldid;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c4d5c45820d0..dfbff3565dad 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1253,7 +1253,7 @@ static int
 do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 {
 	unsigned long pfn;
-	struct page *page;
+	struct page *page, *head;
 	int ret = 0;
 	LIST_HEAD(source);
 
@@ -1261,15 +1261,14 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		if (!pfn_valid(pfn))
 			continue;
 		page = pfn_to_page(pfn);
+		head = compound_head(page);
 
 		if (PageHuge(page)) {
-			struct page *head = compound_head(page);
 			pfn = page_to_pfn(head) + compound_nr(head) - 1;
 			isolate_huge_page(head, &source);
 			continue;
 		} else if (PageTransHuge(page))
-			pfn = page_to_pfn(compound_head(page))
-				+ hpage_nr_pages(page) - 1;
+			pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
 
 		/*
 		 * HWPoison pages have elevated reference counts so the migration would
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 381320671677..d2b11c291e78 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1049,7 +1049,7 @@ static int migrate_page_add(struct page *page, struct list_head *pagelist,
 			list_add_tail(&head->lru, pagelist);
 			mod_node_page_state(page_pgdat(head),
 				NR_ISOLATED_ANON + page_is_file_lru(head),
-				hpage_nr_pages(head));
+				thp_nr_pages(head));
 		} else if (flags & MPOL_MF_STRICT) {
 			/*
 			 * Non-movable page may reach here.  And, there may be
diff --git a/mm/migrate.c b/mm/migrate.c
index f37729673558..9d0c6a853c1c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -193,7 +193,7 @@ void putback_movable_pages(struct list_head *l)
 			put_page(page);
 		} else {
 			mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON +
-					page_is_file_lru(page), -hpage_nr_pages(page));
+					page_is_file_lru(page), -thp_nr_pages(page));
 			putback_lru_page(page);
 		}
 	}
@@ -386,7 +386,7 @@ static int expected_page_refs(struct address_space *mapping, struct page *page)
 	 */
 	expected_count += is_device_private_page(page);
 	if (mapping)
-		expected_count += hpage_nr_pages(page) + page_has_private(page);
+		expected_count += thp_nr_pages(page) + page_has_private(page);
 
 	return expected_count;
 }
@@ -441,7 +441,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
 	 */
 	newpage->index = page->index;
 	newpage->mapping = page->mapping;
-	page_ref_add(newpage, hpage_nr_pages(page)); /* add cache reference */
+	page_ref_add(newpage, thp_nr_pages(page)); /* add cache reference */
 	if (PageSwapBacked(page)) {
 		__SetPageSwapBacked(newpage);
 		if (PageSwapCache(page)) {
@@ -474,7 +474,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
 	 * to one less reference.
 	 * We know this isn't the last reference.
 	 */
-	page_ref_unfreeze(page, expected_count - hpage_nr_pages(page));
+	page_ref_unfreeze(page, expected_count - thp_nr_pages(page));
 
 	xas_unlock(&xas);
 	/* Leave irq disabled to prevent preemption while updating stats */
@@ -591,7 +591,7 @@ static void copy_huge_page(struct page *dst, struct page *src)
 	} else {
 		/* thp page */
 		BUG_ON(!PageTransHuge(src));
-		nr_pages = hpage_nr_pages(src);
+		nr_pages = thp_nr_pages(src);
 	}
 
 	for (i = 0; i < nr_pages; i++) {
@@ -1224,7 +1224,7 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 		 */
 		if (likely(!__PageMovable(page)))
 			mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON +
-					page_is_file_lru(page), -hpage_nr_pages(page));
+					page_is_file_lru(page), -thp_nr_pages(page));
 	}
 
 	/*
@@ -1598,7 +1598,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 		list_add_tail(&head->lru, pagelist);
 		mod_node_page_state(page_pgdat(head),
 			NR_ISOLATED_ANON + page_is_file_lru(head),
-			hpage_nr_pages(head));
+			thp_nr_pages(head));
 	}
 out_putpage:
 	/*
@@ -1962,7 +1962,7 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
 
 	page_lru = page_is_file_lru(page);
 	mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_lru,
-				hpage_nr_pages(page));
+				thp_nr_pages(page));
 
 	/*
 	 * Isolating the page has taken another reference, so the
diff --git a/mm/mlock.c b/mm/mlock.c
index f8736136fad7..93ca2bf30b4f 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -61,8 +61,7 @@ void clear_page_mlock(struct page *page)
 	if (!TestClearPageMlocked(page))
 		return;
 
-	mod_zone_page_state(page_zone(page), NR_MLOCK,
-			    -hpage_nr_pages(page));
+	mod_zone_page_state(page_zone(page), NR_MLOCK, -thp_nr_pages(page));
 	count_vm_event(UNEVICTABLE_PGCLEARED);
 	/*
 	 * The previous TestClearPageMlocked() corresponds to the smp_mb()
@@ -95,7 +94,7 @@ void mlock_vma_page(struct page *page)
 
 	if (!TestSetPageMlocked(page)) {
 		mod_zone_page_state(page_zone(page), NR_MLOCK,
-				    hpage_nr_pages(page));
+				    thp_nr_pages(page));
 		count_vm_event(UNEVICTABLE_PGMLOCKED);
 		if (!isolate_lru_page(page))
 			putback_lru_page(page);
@@ -192,7 +191,7 @@ unsigned int munlock_vma_page(struct page *page)
 	/*
 	 * Serialize with any parallel __split_huge_page_refcount() which
 	 * might otherwise copy PageMlocked to part of the tail pages before
-	 * we clear it in the head page. It also stabilizes hpage_nr_pages().
+	 * we clear it in the head page. It also stabilizes thp_nr_pages().
 	 */
 	spin_lock_irq(&pgdat->lru_lock);
 
@@ -202,7 +201,7 @@ unsigned int munlock_vma_page(struct page *page)
 		goto unlock_out;
 	}
 
-	nr_pages = hpage_nr_pages(page);
+	nr_pages = thp_nr_pages(page);
 	__mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
 
 	if (__munlock_isolate_lru_page(page, true)) {
diff --git a/mm/page_io.c b/mm/page_io.c
index 888000d1a8cc..77170b7e6f04 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -274,7 +274,7 @@ static inline void count_swpout_vm_event(struct page *page)
 	if (unlikely(PageTransHuge(page)))
 		count_vm_event(THP_SWPOUT);
 #endif
-	count_vm_events(PSWPOUT, hpage_nr_pages(page));
+	count_vm_events(PSWPOUT, thp_nr_pages(page));
 }
 
 int __swap_writepage(struct page *page, struct writeback_control *wbc,
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index e65629c056e8..5e77b269c330 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -61,7 +61,7 @@ static inline bool pfn_is_match(struct page *page, unsigned long pfn)
 		return page_pfn == pfn;
 
 	/* THP can be referenced by any subpage */
-	return pfn >= page_pfn && pfn - page_pfn < hpage_nr_pages(page);
+	return pfn >= page_pfn && pfn - page_pfn < thp_nr_pages(page);
 }
 
 /**
diff --git a/mm/rmap.c b/mm/rmap.c
index 5fe2dedce1fc..c56fab5826c1 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1130,7 +1130,7 @@ void do_page_add_anon_rmap(struct page *page,
 	}
 
 	if (first) {
-		int nr = compound ? hpage_nr_pages(page) : 1;
+		int nr = compound ? thp_nr_pages(page) : 1;
 		/*
 		 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
 		 * these counters are not modified in interrupt context, and
@@ -1169,7 +1169,7 @@ void do_page_add_anon_rmap(struct page *page,
 void page_add_new_anon_rmap(struct page *page,
 	struct vm_area_struct *vma, unsigned long address, bool compound)
 {
-	int nr = compound ? hpage_nr_pages(page) : 1;
+	int nr = compound ? thp_nr_pages(page) : 1;
 
 	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
 	__SetPageSwapBacked(page);
@@ -1860,7 +1860,7 @@ static void rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc,
 		return;
 
 	pgoff_start = page_to_pgoff(page);
-	pgoff_end = pgoff_start + hpage_nr_pages(page) - 1;
+	pgoff_end = pgoff_start + thp_nr_pages(page) - 1;
 	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
 			pgoff_start, pgoff_end) {
 		struct vm_area_struct *vma = avc->vma;
@@ -1913,7 +1913,7 @@ static void rmap_walk_file(struct page *page, struct rmap_walk_control *rwc,
 		return;
 
 	pgoff_start = page_to_pgoff(page);
-	pgoff_end = pgoff_start + hpage_nr_pages(page) - 1;
+	pgoff_end = pgoff_start + thp_nr_pages(page) - 1;
 	if (!locked)
 		i_mmap_lock_read(mapping);
 	vma_interval_tree_foreach(vma, &mapping->i_mmap,
diff --git a/mm/swap.c b/mm/swap.c
index dbcab84c6fce..a3b8cc8bdc17 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -241,7 +241,7 @@ static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec,
 		del_page_from_lru_list(page, lruvec, page_lru(page));
 		ClearPageActive(page);
 		add_page_to_lru_list_tail(page, lruvec, page_lru(page));
-		(*pgmoved) += hpage_nr_pages(page);
+		(*pgmoved) += thp_nr_pages(page);
 	}
 }
 
@@ -312,7 +312,7 @@ void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages)
 void lru_note_cost_page(struct page *page)
 {
 	lru_note_cost(mem_cgroup_page_lruvec(page, page_pgdat(page)),
-		      page_is_file_lru(page), hpage_nr_pages(page));
+		      page_is_file_lru(page), thp_nr_pages(page));
 }
 
 static void __activate_page(struct page *page, struct lruvec *lruvec,
@@ -320,7 +320,7 @@ static void __activate_page(struct page *page, struct lruvec *lruvec,
 {
 	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
 		int lru = page_lru_base_type(page);
-		int nr_pages = hpage_nr_pages(page);
+		int nr_pages = thp_nr_pages(page);
 
 		del_page_from_lru_list(page, lruvec, lru);
 		SetPageActive(page);
@@ -500,7 +500,7 @@ void lru_cache_add_active_or_unevictable(struct page *page,
 		 * lock is held(spinlock), which implies preemption disabled.
 		 */
 		__mod_zone_page_state(page_zone(page), NR_MLOCK,
-				    hpage_nr_pages(page));
+				    thp_nr_pages(page));
 		count_vm_event(UNEVICTABLE_PGMLOCKED);
 	}
 	lru_cache_add(page);
@@ -532,7 +532,7 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
 {
 	int lru;
 	bool active;
-	int nr_pages = hpage_nr_pages(page);
+	int nr_pages = thp_nr_pages(page);
 
 	if (!PageLRU(page))
 		return;
@@ -580,7 +580,7 @@ static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
 {
 	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
 		int lru = page_lru_base_type(page);
-		int nr_pages = hpage_nr_pages(page);
+		int nr_pages = thp_nr_pages(page);
 
 		del_page_from_lru_list(page, lruvec, lru + LRU_ACTIVE);
 		ClearPageActive(page);
@@ -599,7 +599,7 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
 	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
 	    !PageSwapCache(page) && !PageUnevictable(page)) {
 		bool active = PageActive(page);
-		int nr_pages = hpage_nr_pages(page);
+		int nr_pages = thp_nr_pages(page);
 
 		del_page_from_lru_list(page, lruvec,
 				       LRU_INACTIVE_ANON + active);
@@ -972,7 +972,7 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
 {
 	enum lru_list lru;
 	int was_unevictable = TestClearPageUnevictable(page);
-	int nr_pages = hpage_nr_pages(page);
+	int nr_pages = thp_nr_pages(page);
 
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 
diff --git a/mm/swap_state.c b/mm/swap_state.c
index e98ff460e9e9..108d2574371f 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -115,7 +115,7 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp)
 	struct address_space *address_space = swap_address_space(entry);
 	pgoff_t idx = swp_offset(entry);
 	XA_STATE_ORDER(xas, &address_space->i_pages, idx, compound_order(page));
-	unsigned long i, nr = hpage_nr_pages(page);
+	unsigned long i, nr = thp_nr_pages(page);
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(PageSwapCache(page), page);
@@ -157,7 +157,7 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp)
 void __delete_from_swap_cache(struct page *page, swp_entry_t entry)
 {
 	struct address_space *address_space = swap_address_space(entry);
-	int i, nr = hpage_nr_pages(page);
+	int i, nr = thp_nr_pages(page);
 	pgoff_t idx = swp_offset(entry);
 	XA_STATE(xas, &address_space->i_pages, idx);
 
@@ -250,7 +250,7 @@ void delete_from_swap_cache(struct page *page)
 	xa_unlock_irq(&address_space->i_pages);
 
 	put_swap_page(page, entry);
-	page_ref_sub(page, hpage_nr_pages(page));
+	page_ref_sub(page, thp_nr_pages(page));
 }
 
 /* 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 987276c557d1..142095774e55 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1368,7 +1368,7 @@ void put_swap_page(struct page *page, swp_entry_t entry)
 	unsigned char *map;
 	unsigned int i, free_entries = 0;
 	unsigned char val;
-	int size = swap_entry_size(hpage_nr_pages(page));
+	int size = swap_entry_size(thp_nr_pages(page));
 
 	si = _swap_info_get(entry);
 	if (!si)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b6d84326bdf2..17934e03b3aa 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1359,7 +1359,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 			case PAGE_ACTIVATE:
 				goto activate_locked;
 			case PAGE_SUCCESS:
-				stat->nr_pageout += hpage_nr_pages(page);
+				stat->nr_pageout += thp_nr_pages(page);
 
 				if (PageWriteback(page))
 					goto keep;
@@ -1867,7 +1867,7 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
 		SetPageLRU(page);
 		lru = page_lru(page);
 
-		nr_pages = hpage_nr_pages(page);
+		nr_pages = thp_nr_pages(page);
 		update_lru_size(lruvec, lru, page_zonenum(page), nr_pages);
 		list_move(&page->lru, &lruvec->lists[lru]);
 
@@ -2067,7 +2067,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 			 * so we ignore them here.
 			 */
 			if ((vm_flags & VM_EXEC) && page_is_file_lru(page)) {
-				nr_rotated += hpage_nr_pages(page);
+				nr_rotated += thp_nr_pages(page);
 				list_add(&page->lru, &l_active);
 				continue;
 			}
-- 
2.26.2



* [PATCH v6 11/51] mm: Add thp_head
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (9 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 10/51] mm: Replace hpage_nr_pages with thp_nr_pages Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 12/51] mm: Introduce offset_in_thp Matthew Wilcox
                   ` (41 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

This is like compound_head() but compiles away when
CONFIG_TRANSPARENT_HUGEPAGE is not enabled.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/huge_mm.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index dcdfd21763a3..bd13e9ac3437 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -266,6 +266,15 @@ static inline spinlock_t *pud_trans_huge_lock(pud_t *pud,
 		return NULL;
 }
 
+/**
+ * thp_head - Head page of a transparent huge page.
+ * @page: Any page (tail, head or regular) found in the page cache.
+ */
+static inline struct page *thp_head(struct page *page)
+{
+	return compound_head(page);
+}
+
 /**
  * thp_order - Order of a transparent huge page.
  * @page: Head page of a transparent huge page.
@@ -342,6 +351,12 @@ static inline struct list_head *page_deferred_list(struct page *page)
 #define HPAGE_PUD_MASK ({ BUILD_BUG(); 0; })
 #define HPAGE_PUD_SIZE ({ BUILD_BUG(); 0; })
 
+static inline struct page *thp_head(struct page *page)
+{
+	VM_BUG_ON_PGFLAGS(PageTail(page), page);
+	return page;
+}
+
 static inline unsigned int thp_order(struct page *page)
 {
 	VM_BUG_ON_PGFLAGS(PageTail(page), page);
-- 
2.26.2
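
A usage sketch (illustration only): page cache lookups may return a tail
page of a THP, and most page flags and the reference count live in the
head page, so resolve it first.  thp_wait_writeback() is hypothetical:

#include <linux/huge_mm.h>
#include <linux/pagemap.h>

/* Hypothetical caller: @page may be any subpage; writeback state is
 * kept on the head page, so look that up before testing or waiting. */
static void thp_wait_writeback(struct page *page)
{
        struct page *head = thp_head(page);

        if (PageWriteback(head))
                wait_on_page_writeback(head);
}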



* [PATCH v6 12/51] mm: Introduce offset_in_thp
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (10 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 11/51] mm: Add thp_head Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 13/51] mm: Support arbitrary THP sizes Matthew Wilcox
                   ` (40 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel, David Hildenbrand

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Mirroring offset_in_page(), this gives you the offset within this
particular page, no matter what size page it is.  It optimises down
to offset_in_page() if CONFIG_TRANSPARENT_HUGEPAGE is not set.
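
A hedged usage sketch (example_offset() is hypothetical):

static size_t example_offset(struct page *page, loff_t pos)
{
	/* Same as offset_in_page(pos) for an order-0 page; for a 2MB
	 * THP it returns pos modulo 2MB. */
	return offset_in_thp(page, pos);
}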

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 include/linux/mm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6c29b663135f..3fc7e8121216 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1583,6 +1583,7 @@ static inline void clear_page_pfmemalloc(struct page *page)
 extern void pagefault_out_of_memory(void);
 
 #define offset_in_page(p)	((unsigned long)(p) & ~PAGE_MASK)
+#define offset_in_thp(page, p)	((unsigned long)(p) & (thp_size(page) - 1))
 
 /*
  * Flags passed to show_mem() and show_free_areas() to suppress output in
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 13/51] mm: Support arbitrary THP sizes
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (11 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 12/51] mm: Introduce offset_in_thp Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 14/51] fs: Add a filesystem flag for THPs Matthew Wilcox
                   ` (39 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Use the compound size of the page instead of assuming PTE or PMD size.
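
A hedged sketch of the resulting behaviour (example_print_thp_geometry()
is made up; 4KiB base pages assumed):

static void example_print_thp_geometry(struct page *head)
{
	/* An order-4 compound page now reports order 4, 16 subpages and
	 * 64KiB, rather than only 0 or HPAGE_PMD_ORDER. */
	pr_debug("order=%u pages=%d bytes=%lu\n", thp_order(head),
		 thp_nr_pages(head), thp_size(head));
}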

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/huge_mm.h |  8 ++------
 include/linux/mm.h      | 42 ++++++++++++++++++++---------------------
 2 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index bd13e9ac3437..d125912a3e0d 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -282,9 +282,7 @@ static inline struct page *thp_head(struct page *page)
 static inline unsigned int thp_order(struct page *page)
 {
 	VM_BUG_ON_PGFLAGS(PageTail(page), page);
-	if (PageHead(page))
-		return HPAGE_PMD_ORDER;
-	return 0;
+	return compound_order(page);
 }
 
 /**
@@ -294,9 +292,7 @@ static inline unsigned int thp_order(struct page *page)
 static inline int thp_nr_pages(struct page *page)
 {
 	VM_BUG_ON_PGFLAGS(PageTail(page), page);
-	if (PageHead(page))
-		return HPAGE_PMD_NR;
-	return 1;
+	return compound_nr(page);
 }
 
 struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3fc7e8121216..67b36b141ec7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -668,6 +668,27 @@ int vma_is_stack_for_current(struct vm_area_struct *vma);
 struct mmu_gather;
 struct inode;
 
+static inline unsigned int compound_order(struct page *page)
+{
+	if (!PageHead(page))
+		return 0;
+	return page[1].compound_order;
+}
+
+/* Returns the number of pages in this potentially compound page. */
+static inline unsigned long compound_nr(struct page *page)
+{
+	if (!PageHead(page))
+		return 1;
+	return page[1].compound_nr;
+}
+
+static inline void set_compound_order(struct page *page, unsigned int order)
+{
+	page[1].compound_order = order;
+	page[1].compound_nr = 1U << order;
+}
+
 #include <linux/huge_mm.h>
 
 /*
@@ -879,13 +900,6 @@ static inline void destroy_compound_page(struct page *page)
 	compound_page_dtors[page[1].compound_dtor](page);
 }
 
-static inline unsigned int compound_order(struct page *page)
-{
-	if (!PageHead(page))
-		return 0;
-	return page[1].compound_order;
-}
-
 static inline bool hpage_pincount_available(struct page *page)
 {
 	/*
@@ -904,20 +918,6 @@ static inline int compound_pincount(struct page *page)
 	return atomic_read(compound_pincount_ptr(page));
 }
 
-static inline void set_compound_order(struct page *page, unsigned int order)
-{
-	page[1].compound_order = order;
-	page[1].compound_nr = 1U << order;
-}
-
-/* Returns the number of pages in this potentially compound page. */
-static inline unsigned long compound_nr(struct page *page)
-{
-	if (!PageHead(page))
-		return 1;
-	return page[1].compound_nr;
-}
-
 /* Returns the number of bytes in this potentially compound page. */
 static inline unsigned long page_size(struct page *page)
 {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 14/51] fs: Add a filesystem flag for THPs
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (12 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 13/51] mm: Support arbitrary THP sizes Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 15/51] fs: Do not update nr_thps for mappings which support THPs Matthew Wilcox
                   ` (38 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

The page cache needs to know whether the filesystem supports THPs.
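
A minimal sketch of how a filesystem opts in and how the page cache can
test it ("examplefs" and both identifiers are hypothetical):

static struct file_system_type example_fs_type = {
	.name		= "examplefs",		/* hypothetical filesystem */
	.fs_flags	= FS_USERNS_MOUNT | FS_THP_SUPPORT,
};

static bool example_can_cache_thp(struct address_space *mapping)
{
	/* inode_init_always() copied the fs flag into the mapping. */
	return mapping_thp_support(mapping);
}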

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/inode.c              | 2 ++
 include/linux/fs.h      | 1 +
 include/linux/pagemap.h | 6 ++++++
 mm/shmem.c              | 2 +-
 4 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/inode.c b/fs/inode.c
index 72c4c347afb7..9d78c37b00b8 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -181,6 +181,8 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
 	mapping->a_ops = &empty_aops;
 	mapping->host = inode;
 	mapping->flags = 0;
+	if (sb->s_type->fs_flags & FS_THP_SUPPORT)
+		__set_bit(AS_THP_SUPPORT, &mapping->flags);
 	mapping->wb_err = 0;
 	atomic_set(&mapping->i_mmap_writable, 0);
 #ifdef CONFIG_READ_ONLY_THP_FOR_FS
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 19ef6c88c152..ac411a08f83c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2241,6 +2241,7 @@ struct file_system_type {
 #define FS_HAS_SUBTYPE		4
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
 #define FS_DISALLOW_NOTIFY_PERM	16	/* Disable fanotify permission events */
+#define FS_THP_SUPPORT		8192	/* Remove once all fs converted */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	int (*init_fs_context)(struct fs_context *);
 	const struct fs_parameter_spec *parameters;
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 484a36185bb5..8a45b572bcf1 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -29,6 +29,7 @@ enum mapping_flags {
 	AS_EXITING	= 4, 	/* final truncate in progress */
 	/* writeback related tags are not used */
 	AS_NO_WRITEBACK_TAGS = 5,
+	AS_THP_SUPPORT = 6,	/* THPs supported */
 };
 
 /**
@@ -119,6 +120,11 @@ static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
 	m->gfp_mask = mask;
 }
 
+static inline bool mapping_thp_support(struct address_space *mapping)
+{
+	return test_bit(AS_THP_SUPPORT, &mapping->flags);
+}
+
 void release_pages(struct page **pages, int nr);
 
 /*
diff --git a/mm/shmem.c b/mm/shmem.c
index a0dbe62f8042..a05d129a45e9 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3854,7 +3854,7 @@ static struct file_system_type shmem_fs_type = {
 	.parameters	= shmem_fs_parameters,
 #endif
 	.kill_sb	= kill_litter_super,
-	.fs_flags	= FS_USERNS_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT | FS_THP_SUPPORT,
 };
 
 int __init shmem_init(void)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 15/51] fs: Do not update nr_thps for mappings which support THPs
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (13 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 14/51] fs: Add a filesystem flag for THPs Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 16/51] fs: Introduce i_blocks_per_page Matthew Wilcox
                   ` (37 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

The nr_thps counter is to support THPs in the page cache when the
filesystem doesn't understand THPs.  Eventually it will be removed, but
we should still support filesystems which do not understand THPs yet.
Move the nr_thps manipulation functions to pagemap.h since they're
page-cache specific.
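
As a sketch of the intent (example_may_open_for_write() is hypothetical,
loosely modelled on the read-only-THP-for-FS case):

static bool example_may_open_for_write(struct address_space *mapping)
{
	/* THP-aware filesystems no longer maintain the counter, so only
	 * consult it for mappings that do not support THPs. */
	if (mapping_thp_support(mapping))
		return true;
	return filemap_nr_thps(mapping) == 0;
}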

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/fs.h      | 27 ---------------------------
 include/linux/pagemap.h | 29 +++++++++++++++++++++++++++++
 2 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index ac411a08f83c..e11929152103 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2839,33 +2839,6 @@ static inline errseq_t file_sample_sb_err(struct file *file)
 	return errseq_sample(&file->f_path.dentry->d_sb->s_wb_err);
 }
 
-static inline int filemap_nr_thps(struct address_space *mapping)
-{
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
-	return atomic_read(&mapping->nr_thps);
-#else
-	return 0;
-#endif
-}
-
-static inline void filemap_nr_thps_inc(struct address_space *mapping)
-{
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
-	atomic_inc(&mapping->nr_thps);
-#else
-	WARN_ON_ONCE(1);
-#endif
-}
-
-static inline void filemap_nr_thps_dec(struct address_space *mapping)
-{
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
-	atomic_dec(&mapping->nr_thps);
-#else
-	WARN_ON_ONCE(1);
-#endif
-}
-
 extern int vfs_fsync_range(struct file *file, loff_t start, loff_t end,
 			   int datasync);
 extern int vfs_fsync(struct file *file, int datasync);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 8a45b572bcf1..644caff3e78b 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -125,6 +125,35 @@ static inline bool mapping_thp_support(struct address_space *mapping)
 	return test_bit(AS_THP_SUPPORT, &mapping->flags);
 }
 
+static inline int filemap_nr_thps(struct address_space *mapping)
+{
+#ifdef CONFIG_READ_ONLY_THP_FOR_FS
+	return atomic_read(&mapping->nr_thps);
+#else
+	return 0;
+#endif
+}
+
+static inline void filemap_nr_thps_inc(struct address_space *mapping)
+{
+#ifdef CONFIG_READ_ONLY_THP_FOR_FS
+	if (!mapping_thp_support(mapping))
+		atomic_inc(&mapping->nr_thps);
+#else
+	WARN_ON_ONCE(1);
+#endif
+}
+
+static inline void filemap_nr_thps_dec(struct address_space *mapping)
+{
+#ifdef CONFIG_READ_ONLY_THP_FOR_FS
+	if (!mapping_thp_support(mapping))
+		atomic_dec(&mapping->nr_thps);
+#else
+	WARN_ON_ONCE(1);
+#endif
+}
+
 void release_pages(struct page **pages, int nr);
 
 /*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 16/51] fs: Introduce i_blocks_per_page
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (14 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 15/51] fs: Do not update nr_thps for mappings which support THPs Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 17/51] fs: Make page_mkwrite_check_truncate thp-aware Matthew Wilcox
                   ` (36 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel, Christoph Hellwig

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

This helper is useful for both THPs and for supporting block size larger
than page size.  Convert some example users (we have a few different
ways of writing this idiom).
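
A hedged sketch contrasting the replaced idioms with the new helper
(example_blocks_per_page() is made up):

static unsigned int example_blocks_per_page(struct inode *inode,
		struct page *page)
{
	/* Old idioms, order-0 only:  PAGE_SIZE >> inode->i_blkbits
	 *                            PAGE_SIZE / i_blocksize(inode)
	 * New, THP-aware: scales with the compound page size. */
	return i_blocks_per_page(inode, page);
}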

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c  |  8 ++++----
 fs/jfs/jfs_metapage.c   |  2 +-
 fs/xfs/xfs_aops.c       |  2 +-
 include/linux/pagemap.h | 15 +++++++++++++++
 4 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a1ed7620fbac..4d3d0abc1421 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -46,7 +46,7 @@ iomap_page_create(struct inode *inode, struct page *page)
 {
 	struct iomap_page *iop = to_iomap_page(page);
 
-	if (iop || i_blocksize(inode) == PAGE_SIZE)
+	if (iop || i_blocks_per_page(inode, page) <= 1)
 		return iop;
 
 	iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
@@ -147,7 +147,7 @@ iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
 	unsigned int i;
 
 	spin_lock_irqsave(&iop->uptodate_lock, flags);
-	for (i = 0; i < PAGE_SIZE / i_blocksize(inode); i++) {
+	for (i = 0; i < i_blocks_per_page(inode, page); i++) {
 		if (i >= first && i <= last)
 			set_bit(i, iop->uptodate);
 		else if (!test_bit(i, iop->uptodate))
@@ -1079,7 +1079,7 @@ iomap_finish_page_writeback(struct inode *inode, struct page *page,
 		mapping_set_error(inode->i_mapping, -EIO);
 	}
 
-	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE && !iop);
+	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) <= 0);
 
 	if (!iop || atomic_dec_and_test(&iop->write_count))
@@ -1375,7 +1375,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	int error = 0, count = 0, i;
 	LIST_HEAD(submit_list);
 
-	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE && !iop);
+	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) != 0);
 
 	/*
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index a2f5338a5ea1..176580f54af9 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -473,7 +473,7 @@ static int metapage_readpage(struct file *fp, struct page *page)
 	struct inode *inode = page->mapping->host;
 	struct bio *bio = NULL;
 	int block_offset;
-	int blocks_per_page = PAGE_SIZE >> inode->i_blkbits;
+	int blocks_per_page = i_blocks_per_page(inode, page);
 	sector_t page_start;	/* address of page in fs blocks */
 	sector_t pblock;
 	int xlen;
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index b35611882ff9..55d126d4e096 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -544,7 +544,7 @@ xfs_discard_page(
 			page, ip->i_ino, offset);
 
 	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
-			PAGE_SIZE / i_blocksize(inode));
+			i_blocks_per_page(inode, page));
 	if (error && !XFS_FORCED_SHUTDOWN(mp))
 		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
 out_invalidate:
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 644caff3e78b..3ad0c92be649 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -891,4 +891,19 @@ static inline int page_mkwrite_check_truncate(struct page *page,
 	return offset;
 }
 
+/**
+ * i_blocks_per_page - How many blocks fit in this page.
+ * @inode: The inode which contains the blocks.
+ * @page: The page (head page if the page is a THP).
+ *
+ * If the block size is larger than the size of this page, return zero.
+ *
+ * Context: Any context.
+ * Return: The number of filesystem blocks covered by this page.
+ */
+static inline
+unsigned int i_blocks_per_page(struct inode *inode, struct page *page)
+{
+	return thp_size(page) >> inode->i_blkbits;
+}
 #endif /* _LINUX_PAGEMAP_H */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 17/51] fs: Make page_mkwrite_check_truncate thp-aware
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (15 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 16/51] fs: Introduce i_blocks_per_page Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 18/51] mm: Support THPs in zero_user_segments Matthew Wilcox
                   ` (35 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

If the page is compound, check the last index in the page and return
the appropriate size.  Change the return type to ssize_t in case we ever
support pages larger than 2GB.
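
A hedged sketch of a caller (examplefs_page_mkwrite() is hypothetical;
error handling trimmed):

static vm_fault_t examplefs_page_mkwrite(struct vm_fault *vmf)
{
	struct page *page = thp_head(vmf->page);
	struct inode *inode = file_inode(vmf->vma->vm_file);
	ssize_t len;

	lock_page(page);
	len = page_mkwrite_check_truncate(page, inode);
	if (len < 0) {
		unlock_page(page);
		return VM_FAULT_NOPAGE;
	}
	/* ... mark the first 'len' bytes of the page dirty ... */
	return VM_FAULT_LOCKED;
}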

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 3ad0c92be649..f47ba9f18f3e 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -868,22 +868,22 @@ static inline unsigned long dir_pages(struct inode *inode)
  * @page: the page to check
  * @inode: the inode to check the page against
  *
- * Returns the number of bytes in the page up to EOF,
+ * Return: The number of bytes in the page up to EOF,
  * or -EFAULT if the page was truncated.
  */
-static inline int page_mkwrite_check_truncate(struct page *page,
+static inline ssize_t page_mkwrite_check_truncate(struct page *page,
 					      struct inode *inode)
 {
 	loff_t size = i_size_read(inode);
 	pgoff_t index = size >> PAGE_SHIFT;
-	int offset = offset_in_page(size);
+	unsigned long offset = offset_in_thp(page, size);
 
 	if (page->mapping != inode->i_mapping)
 		return -EFAULT;
 
 	/* page is wholly inside EOF */
-	if (page->index < index)
-		return PAGE_SIZE;
+	if (page->index + thp_nr_pages(page) - 1 < index)
+		return thp_size(page);
 	/* page is wholly past EOF */
 	if (page->index > index || !offset)
 		return -EFAULT;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 18/51] mm: Support THPs in zero_user_segments
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (16 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 17/51] fs: Make page_mkwrite_check_truncate thp-aware Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 19/51] mm: Zero the head page, not the tail page Matthew Wilcox
                   ` (34 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

We can only kmap() one subpage of a THP at a time, so loop over all
relevant subpages, skipping ones which don't need to be zeroed.  This is
too large to inline when THPs are enabled and we actually need highmem,
so put it in highmem.c.
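
A minimal usage sketch (example_zero_around() is made up): zero the
parts of a possibly-huge page that a write did not touch:

static void example_zero_around(struct page *page, unsigned from, unsigned to)
{
	/* Zero [0, from) and [to, thp_size(page)) in one call. */
	zero_user_segments(page, 0, from, to, thp_size(page));
}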

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/highmem.h | 11 ++++++--
 mm/highmem.c            | 62 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index d6e82e3de027..f05589513103 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -284,13 +284,17 @@ static inline void clear_highpage(struct page *page)
 	kunmap_atomic(kaddr);
 }
 
+#if defined(CONFIG_HIGHMEM) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
+void zero_user_segments(struct page *page, unsigned start1, unsigned end1,
+		unsigned start2, unsigned end2);
+#else /* !HIGHMEM || !TRANSPARENT_HUGEPAGE */
 static inline void zero_user_segments(struct page *page,
-	unsigned start1, unsigned end1,
-	unsigned start2, unsigned end2)
+		unsigned start1, unsigned end1,
+		unsigned start2, unsigned end2)
 {
 	void *kaddr = kmap_atomic(page);
 
-	BUG_ON(end1 > PAGE_SIZE || end2 > PAGE_SIZE);
+	BUG_ON(end1 > thp_size(page) || end2 > thp_size(page));
 
 	if (end1 > start1)
 		memset(kaddr + start1, 0, end1 - start1);
@@ -301,6 +305,7 @@ static inline void zero_user_segments(struct page *page,
 	kunmap_atomic(kaddr);
 	flush_dcache_page(page);
 }
+#endif /* !HIGHMEM || !TRANSPARENT_HUGEPAGE */
 
 static inline void zero_user_segment(struct page *page,
 	unsigned start, unsigned end)
diff --git a/mm/highmem.c b/mm/highmem.c
index 64d8dea47dd1..686cae2f1ba5 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -367,9 +367,67 @@ void kunmap_high(struct page *page)
 	if (need_wakeup)
 		wake_up(pkmap_map_wait);
 }
-
 EXPORT_SYMBOL(kunmap_high);
-#endif
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+void zero_user_segments(struct page *page, unsigned start1, unsigned end1,
+		unsigned start2, unsigned end2)
+{
+	unsigned int i;
+
+	BUG_ON(end1 > thp_size(page) || end2 > thp_size(page));
+
+	for (i = 0; i < thp_nr_pages(page); i++) {
+		void *kaddr;
+		unsigned this_end;
+
+		if (end1 == 0 && start2 >= PAGE_SIZE) {
+			start2 -= PAGE_SIZE;
+			end2 -= PAGE_SIZE;
+			continue;
+		}
+
+		if (start1 >= PAGE_SIZE) {
+			start1 -= PAGE_SIZE;
+			end1 -= PAGE_SIZE;
+			if (start2) {
+				start2 -= PAGE_SIZE;
+				end2 -= PAGE_SIZE;
+			}
+			continue;
+		}
+
+		kaddr = kmap_atomic(page + i);
+
+		this_end = min_t(unsigned, end1, PAGE_SIZE);
+		if (end1 > start1)
+			memset(kaddr + start1, 0, this_end - start1);
+		end1 -= this_end;
+		start1 = 0;
+
+		if (start2 >= PAGE_SIZE) {
+			start2 -= PAGE_SIZE;
+			end2 -= PAGE_SIZE;
+		} else {
+			this_end = min_t(unsigned, end2, PAGE_SIZE);
+			if (end2 > start2)
+				memset(kaddr + start2, 0, this_end - start2);
+			end2 -= this_end;
+			start2 = 0;
+		}
+
+		kunmap_atomic(kaddr);
+
+		if (!end1 && !end2)
+			break;
+	}
+	flush_dcache_page(page);
+
+	BUG_ON((start1 | start2 | end1 | end2) != 0);
+}
+EXPORT_SYMBOL(zero_user_segments);
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+#endif /* CONFIG_HIGHMEM */
 
 #if defined(HASHED_PAGE_VIRTUAL)
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 19/51] mm: Zero the head page, not the tail page
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (17 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 18/51] mm: Support THPs in zero_user_segments Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 20/51] block: Add bio_for_each_thp_segment_all Matthew Wilcox
                   ` (33 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Pass the head page to zero_user_segment(), not the tail page, and adjust
the byte offsets appropriately.
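
The adjustment, written out as a hedged sketch (example_zero_via_head()
is hypothetical; the patch below derives the subpage index from the file
offset instead of from pointer arithmetic):

static void example_zero_via_head(struct page *page, unsigned start,
		unsigned end)
{
	struct page *head = thp_head(page);

	if (head != page) {
		/* Subpage struct pages are contiguous, so 'page - head'
		 * is the subpage index within the THP. */
		unsigned int diff = (page - head) << PAGE_SHIFT;

		start += diff;
		end += diff;
	}
	zero_user_segment(head, start, end);
}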

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/shmem.c    | 7 +++++++
 mm/truncate.c | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index a05d129a45e9..55405d811cfd 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -898,11 +898,18 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 		struct page *page = NULL;
 		shmem_getpage(inode, start - 1, &page, SGP_READ);
 		if (page) {
+			struct page *head = thp_head(page);
 			unsigned int top = PAGE_SIZE;
 			if (start > end) {
 				top = partial_end;
 				partial_end = 0;
 			}
+			if (head != page) {
+				unsigned int diff = start - 1 - head->index;
+				partial_start += diff << PAGE_SHIFT;
+				top += diff << PAGE_SHIFT;
+				page = head;
+			}
 			zero_user_segment(page, partial_start, top);
 			set_page_dirty(page);
 			unlock_page(page);
diff --git a/mm/truncate.c b/mm/truncate.c
index dd9ebc1da356..152974888124 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -374,12 +374,19 @@ void truncate_inode_pages_range(struct address_space *mapping,
 	if (partial_start) {
 		struct page *page = find_lock_page(mapping, start - 1);
 		if (page) {
+			struct page *head = thp_head(page);
 			unsigned int top = PAGE_SIZE;
 			if (start > end) {
 				/* Truncation within a single page */
 				top = partial_end;
 				partial_end = 0;
 			}
+			if (head != page) {
+				unsigned int diff = start - 1 - head->index;
+				partial_start += diff << PAGE_SHIFT;
+				top += diff << PAGE_SHIFT;
+				page = head;
+			}
 			wait_on_page_writeback(page);
 			zero_user_segment(page, partial_start, top);
 			cleancache_invalidate_page(mapping, page);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 20/51] block: Add bio_for_each_thp_segment_all
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (18 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 19/51] mm: Zero the head page, not the tail page Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-11 18:20   ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 21/51] block: Support THPs in page_is_mergeable Matthew Wilcox
                   ` (32 subsequent siblings)
  52 siblings, 1 reply; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Iterate once for each THP instead of once for each base page.
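
A hedged sketch of a completion handler using the new iterator
(example_read_end_io() is made up, loosely patterned on the iomap read
completion path converted later in this series):

static void example_read_end_io(struct bio *bio)
{
	struct bio_vec *bvec;
	struct bvec_iter_all iter_all;

	/* One iteration per THP; bv_offset/bv_len are THP-relative. */
	bio_for_each_thp_segment_all(bvec, bio, iter_all) {
		struct page *page = bvec->bv_page;

		if (!bio->bi_status)
			SetPageUptodate(page);
		unlock_page(page);
	}
	bio_put(bio);
}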

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/bio.h  | 13 +++++++++++++
 include/linux/bvec.h | 23 +++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 91676d4b2dfe..1489e196abf5 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -131,12 +131,25 @@ static inline bool bio_next_segment(const struct bio *bio,
 	return true;
 }
 
+static inline bool bio_next_thp_segment(const struct bio *bio,
+				    struct bvec_iter_all *iter)
+{
+	if (iter->idx >= bio->bi_vcnt)
+		return false;
+
+	bvec_thp_advance(&bio->bi_io_vec[iter->idx], iter);
+	return true;
+}
+
 /*
  * drivers should _never_ use the all version - the bio may have been split
  * before it got to the driver and the driver won't own all of it
  */
 #define bio_for_each_segment_all(bvl, bio, iter) \
 	for (bvl = bvec_init_iter_all(&iter); bio_next_segment((bio), &iter); )
+#define bio_for_each_thp_segment_all(bvl, bio, iter) \
+	for (bvl = bvec_init_iter_all(&iter); \
+	     bio_next_thp_segment((bio), &iter); )
 
 static inline void bio_advance_iter(const struct bio *bio,
 				    struct bvec_iter *iter, unsigned int bytes)
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index ac0c7299d5b8..71b435e573e1 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -162,4 +162,27 @@ static inline void bvec_advance(const struct bio_vec *bvec,
 	}
 }
 
+static inline void bvec_thp_advance(const struct bio_vec *bvec,
+				struct bvec_iter_all *iter_all)
+{
+	struct bio_vec *bv = &iter_all->bv;
+	unsigned int page_size = thp_size(bvec->bv_page);
+
+	if (iter_all->done) {
+		bv->bv_page += thp_nr_pages(bv->bv_page);
+		bv->bv_offset = 0;
+	} else {
+		BUG_ON(bvec->bv_offset >= page_size);
+		bv->bv_page = bvec->bv_page;
+		bv->bv_offset = bvec->bv_offset & (page_size - 1);
+	}
+	bv->bv_len = min(page_size - bv->bv_offset,
+			 bvec->bv_len - iter_all->done);
+	iter_all->done += bv->bv_len;
+
+	if (iter_all->done == bvec->bv_len) {
+		iter_all->idx++;
+		iter_all->done = 0;
+	}
+}
 #endif /* __LINUX_BVEC_ITER_H */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 21/51] block: Support THPs in page_is_mergeable
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (19 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 20/51] block: Add bio_for_each_thp_segment_all Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-12 16:17   ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 22/51] iomap: Support arbitrarily many blocks per page Matthew Wilcox
                   ` (31 subsequent siblings)
  52 siblings, 1 reply; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

page_is_mergeable() would incorrectly claim that two IOs were on different
pages because they landed on different base pages, even though both were
within the same THP.  This led to a reference counting bug in iomap.  Simplify the
'same_page' test by just comparing whether we have the same struct page
instead of doing arithmetic on the physical addresses.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 block/bio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index 5235da6434aa..cd677cde853d 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -747,7 +747,7 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
 	if (xen_domain() && !xen_biovec_phys_mergeable(bv, page))
 		return false;
 
-	*same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
+	*same_page = bv->bv_page == page;
 	if (!*same_page && pfn_to_page(PFN_DOWN(vec_end_addr)) + 1 != page)
 		return false;
 	return true;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 22/51] iomap: Support arbitrarily many blocks per page
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (20 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 21/51] block: Support THPs in page_is_mergeable Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 23/51] iomap: Support THPs in iomap_adjust_read_range Matthew Wilcox
                   ` (30 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Size the uptodate array dynamically.  Now that this array is protected
by a spinlock, we can use bitmap functions to set the bits in this array
instead of a loop around set_bit().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 25 ++++++++-----------------
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 4d3d0abc1421..1a398af42ed2 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -22,14 +22,14 @@
 #include "../internal.h"
 
 /*
- * Structure allocated for each page when block size < PAGE_SIZE to track
+ * Structure allocated for each page when block size < page size to track
  * sub-page uptodate status and I/O completions.
  */
 struct iomap_page {
 	atomic_t		read_count;
 	atomic_t		write_count;
 	spinlock_t		uptodate_lock;
-	DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);
+	unsigned long		uptodate[];
 };
 
 static inline struct iomap_page *to_iomap_page(struct page *page)
@@ -45,15 +45,14 @@ static struct iomap_page *
 iomap_page_create(struct inode *inode, struct page *page)
 {
 	struct iomap_page *iop = to_iomap_page(page);
+	unsigned int nr_blocks = i_blocks_per_page(inode, page);
 
-	if (iop || i_blocks_per_page(inode, page) <= 1)
+	if (iop || nr_blocks <= 1)
 		return iop;
 
-	iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
-	atomic_set(&iop->read_count, 0);
-	atomic_set(&iop->write_count, 0);
+	iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
+				GFP_NOFS | __GFP_NOFAIL);
 	spin_lock_init(&iop->uptodate_lock);
-	bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
 
 	/*
 	 * migrate_page_move_mapping() assumes that pages with private data have
@@ -142,19 +141,11 @@ iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
 	struct inode *inode = page->mapping->host;
 	unsigned first = off >> inode->i_blkbits;
 	unsigned last = (off + len - 1) >> inode->i_blkbits;
-	bool uptodate = true;
 	unsigned long flags;
-	unsigned int i;
 
 	spin_lock_irqsave(&iop->uptodate_lock, flags);
-	for (i = 0; i < i_blocks_per_page(inode, page); i++) {
-		if (i >= first && i <= last)
-			set_bit(i, iop->uptodate);
-		else if (!test_bit(i, iop->uptodate))
-			uptodate = false;
-	}
-
-	if (uptodate)
+	bitmap_set(iop->uptodate, first, last - first + 1);
+	if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
 		SetPageUptodate(page);
 	spin_unlock_irqrestore(&iop->uptodate_lock, flags);
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 23/51] iomap: Support THPs in iomap_adjust_read_range
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (21 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 22/51] iomap: Support arbitrarily many blocks per page Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 24/51] iomap: Support THPs in invalidatepage Matthew Wilcox
                   ` (29 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Pass the struct page instead of the iomap_page so we can determine the
size of the page.  Use offset_in_thp() instead of offset_in_page() and
use thp_size() instead of PAGE_SIZE.  Convert the arguments to be size_t
instead of unsigned int, in case pages ever get larger than 2^31 bytes.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 1a398af42ed2..1b450e966822 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -77,16 +77,16 @@ iomap_page_release(struct page *page)
 /*
  * Calculate the range inside the page that we actually need to read.
  */
-static void
-iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop,
-		loff_t *pos, loff_t length, unsigned *offp, unsigned *lenp)
+static void iomap_adjust_read_range(struct inode *inode, struct page *page,
+		loff_t *pos, loff_t length, size_t *offp, size_t *lenp)
 {
+	struct iomap_page *iop = to_iomap_page(page);
 	loff_t orig_pos = *pos;
 	loff_t isize = i_size_read(inode);
 	unsigned block_bits = inode->i_blkbits;
 	unsigned block_size = (1 << block_bits);
-	unsigned poff = offset_in_page(*pos);
-	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, length);
+	size_t poff = offset_in_thp(page, *pos);
+	size_t plen = min_t(loff_t, thp_size(page) - poff, length);
 	unsigned first = poff >> block_bits;
 	unsigned last = (poff + plen - 1) >> block_bits;
 
@@ -124,7 +124,7 @@ iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop,
 	 * page cache for blocks that are entirely outside of i_size.
 	 */
 	if (orig_pos <= isize && orig_pos + length > isize) {
-		unsigned end = offset_in_page(isize - 1) >> block_bits;
+		unsigned end = offset_in_thp(page, isize - 1) >> block_bits;
 
 		if (first <= end && last > end)
 			plen -= (last - end) * block_size;
@@ -241,7 +241,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	struct iomap_page *iop = iomap_page_create(inode, page);
 	bool same_page = false, is_contig = false;
 	loff_t orig_pos = pos;
-	unsigned poff, plen;
+	size_t poff, plen;
 	sector_t sector;
 
 	if (iomap->type == IOMAP_INLINE) {
@@ -251,7 +251,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	}
 
 	/* zero post-eof blocks as the page may be mapped */
-	iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen);
+	iomap_adjust_read_range(inode, page, &pos, length, &poff, &plen);
 	if (plen == 0)
 		goto done;
 
@@ -560,18 +560,19 @@ static int
 __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 		struct page *page, struct iomap *srcmap)
 {
-	struct iomap_page *iop = iomap_page_create(inode, page);
 	loff_t block_size = i_blocksize(inode);
 	loff_t block_start = pos & ~(block_size - 1);
 	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
-	unsigned from = offset_in_page(pos), to = from + len, poff, plen;
+	unsigned from = offset_in_page(pos), to = from + len;
+	size_t poff, plen;
 	int status;
 
 	if (PageUptodate(page))
 		return 0;
+	iomap_page_create(inode, page);
 
 	do {
-		iomap_adjust_read_range(inode, iop, &block_start,
+		iomap_adjust_read_range(inode, page, &block_start,
 				block_end - block_start, &poff, &plen);
 		if (plen == 0)
 			break;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 24/51] iomap: Support THPs in invalidatepage
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (22 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 23/51] iomap: Support THPs in iomap_adjust_read_range Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 25/51] iomap: Support THPs in read paths Matthew Wilcox
                   ` (28 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

If we're punching a hole in a THP, we need to remove the per-page iomap
data, but not clear the dirty bit from the page, so separate the two
conditions.  This means that writepage can now come across a page with
no iop allocated, so remove the assertions that there is already one,
and just create one if there isn't one.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 1b450e966822..b1a2ab2c7a8d 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -487,17 +487,21 @@ EXPORT_SYMBOL_GPL(iomap_releasepage);
 void
 iomap_invalidatepage(struct page *page, unsigned int offset, unsigned int len)
 {
+	bool full_page = (offset == 0) && (len == thp_size(page));
 	trace_iomap_invalidatepage(page->mapping->host, offset, len);
 
 	/*
 	 * If we are invalidating the entire page, clear the dirty state from it
 	 * and release it to avoid unnecessary buildup of the LRU.
 	 */
-	if (offset == 0 && len == PAGE_SIZE) {
+	if (full_page) {
 		WARN_ON_ONCE(PageWriteback(page));
 		cancel_dirty_page(page);
-		iomap_page_release(page);
 	}
+
+	/* Punching a hole in a THP requires releasing the iop */
+	if (full_page || thp_order(page) > 0)
+		iomap_page_release(page);
 }
 EXPORT_SYMBOL_GPL(iomap_invalidatepage);
 
@@ -1064,14 +1068,13 @@ static void
 iomap_finish_page_writeback(struct inode *inode, struct page *page,
 		int error)
 {
-	struct iomap_page *iop = to_iomap_page(page);
+	struct iomap_page *iop = iomap_page_create(inode, page);
 
 	if (error) {
 		SetPageError(page);
 		mapping_set_error(inode->i_mapping, -EIO);
 	}
 
-	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) <= 0);
 
 	if (!iop || atomic_dec_and_test(&iop->write_count))
@@ -1360,14 +1363,13 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 		struct writeback_control *wbc, struct inode *inode,
 		struct page *page, u64 end_offset)
 {
-	struct iomap_page *iop = to_iomap_page(page);
+	struct iomap_page *iop = iomap_page_create(inode, page);
 	struct iomap_ioend *ioend, *next;
 	unsigned len = i_blocksize(inode);
 	u64 file_offset; /* file offset of page */
 	int error = 0, count = 0, i;
 	LIST_HEAD(submit_list);
 
-	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) != 0);
 
 	/*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 25/51] iomap: Support THPs in read paths
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (23 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 24/51] iomap: Support THPs in invalidatepage Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 26/51] iomap: Convert iomap_write_end types Matthew Wilcox
                   ` (27 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Use thp_size() instead of PAGE_SIZE, offset_in_thp() instead of
offset_in_page() and bio_for_each_thp_segment_all().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index b1a2ab2c7a8d..555ec90e8b54 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -192,7 +192,7 @@ iomap_read_end_io(struct bio *bio)
 	struct bio_vec *bvec;
 	struct bvec_iter_all iter_all;
 
-	bio_for_each_segment_all(bvec, bio, iter_all)
+	bio_for_each_thp_segment_all(bvec, bio, iter_all)
 		iomap_read_page_end_io(bvec, error);
 	bio_put(bio);
 }
@@ -232,6 +232,16 @@ static inline bool iomap_block_needs_zeroing(struct inode *inode,
 		pos >= i_size_read(inode);
 }
 
+/*
+ * Estimate the number of vectors we need based on the current page size;
+ * if we're wrong we'll end up doing an overly large allocation or needing
+ * to do a second allocation, neither of which is a big deal.
+ */
+static unsigned int iomap_nr_vecs(struct page *page, loff_t length)
+{
+	return (length + thp_size(page) - 1) >> page_shift(page);
+}
+
 static loff_t
 iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct iomap *iomap, struct iomap *srcmap)
@@ -288,7 +298,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	if (!ctx->bio || !is_contig || bio_full(ctx->bio, plen)) {
 		gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
 		gfp_t orig_gfp = gfp;
-		int nr_vecs = (length + PAGE_SIZE - 1) >> PAGE_SHIFT;
+		int nr_vecs = iomap_nr_vecs(page, length);
 
 		if (ctx->bio)
 			submit_bio(ctx->bio);
@@ -332,9 +342,9 @@ iomap_readpage(struct page *page, const struct iomap_ops *ops)
 
 	trace_iomap_readpage(page->mapping->host, 1);
 
-	for (poff = 0; poff < PAGE_SIZE; poff += ret) {
+	for (poff = 0; poff < thp_size(page); poff += ret) {
 		ret = iomap_apply(inode, page_offset(page) + poff,
-				PAGE_SIZE - poff, 0, ops, &ctx,
+				thp_size(page) - poff, 0, ops, &ctx,
 				iomap_readpage_actor);
 		if (ret <= 0) {
 			WARN_ON_ONCE(ret == 0);
@@ -368,7 +378,8 @@ iomap_readahead_actor(struct inode *inode, loff_t pos, loff_t length,
 	loff_t done, ret;
 
 	for (done = 0; done < length; done += ret) {
-		if (ctx->cur_page && offset_in_page(pos + done) == 0) {
+		if (ctx->cur_page &&
+		    offset_in_thp(ctx->cur_page, pos + done) == 0) {
 			if (!ctx->cur_page_in_bio)
 				unlock_page(ctx->cur_page);
 			put_page(ctx->cur_page);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 26/51] iomap: Convert iomap_write_end types
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (24 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 25/51] iomap: Support THPs in read paths Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 27/51] iomap: Change calling convention for zeroing Matthew Wilcox
                   ` (26 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

iomap_write_end cannot return an error, so switch it to return
size_t instead of int and remove the error checking from the callers.
Also convert the arguments to size_t from unsigned int, in case anyone
ever wants to support a page size larger than 2GB.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 31 ++++++++++++-------------------
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 555ec90e8b54..fffdf5906e99 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -692,9 +692,8 @@ iomap_set_page_dirty(struct page *page)
 }
 EXPORT_SYMBOL_GPL(iomap_set_page_dirty);
 
-static int
-__iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
-		unsigned copied, struct page *page)
+static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
+		size_t copied, struct page *page)
 {
 	flush_dcache_page(page);
 
@@ -716,9 +715,8 @@ __iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 	return copied;
 }
 
-static int
-iomap_write_end_inline(struct inode *inode, struct page *page,
-		struct iomap *iomap, loff_t pos, unsigned copied)
+static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
+		struct iomap *iomap, loff_t pos, size_t copied)
 {
 	void *addr;
 
@@ -733,13 +731,14 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
 	return copied;
 }
 
-static int
-iomap_write_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied,
-		struct page *page, struct iomap *iomap, struct iomap *srcmap)
+/* Returns the number of bytes copied.  May be 0.  Cannot be an errno. */
+static size_t iomap_write_end(struct inode *inode, loff_t pos, size_t len,
+		size_t copied, struct page *page, struct iomap *iomap,
+		struct iomap *srcmap)
 {
 	const struct iomap_page_ops *page_ops = iomap->page_ops;
 	loff_t old_size = inode->i_size;
-	int ret;
+	size_t ret;
 
 	if (srcmap->type == IOMAP_INLINE) {
 		ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
@@ -820,11 +819,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		flush_dcache_page(page);
 
-		status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
+		copied = iomap_write_end(inode, pos, bytes, copied, page, iomap,
 				srcmap);
-		if (unlikely(status < 0))
-			break;
-		copied = status;
 
 		cond_resched();
 
@@ -898,11 +894,8 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
 				srcmap);
-		if (unlikely(status <= 0)) {
-			if (WARN_ON_ONCE(status == 0))
-				return -EIO;
-			return status;
-		}
+		if (WARN_ON_ONCE(status == 0))
+			return -EIO;
 
 		cond_resched();
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 27/51] iomap: Change calling convention for zeroing
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (25 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 26/51] iomap: Convert iomap_write_end types Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 28/51] iomap: Change iomap_write_begin calling convention Matthew Wilcox
                   ` (25 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Pass the full length to iomap_zero() and dax_iomap_zero(), then have
them return how many bytes they actually handled.  This is preparatory
work for handling THPs, although it looks like DAX could actually take
advantage of it if there's a larger contiguous area.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/dax.c               | 13 ++++++-------
 fs/iomap/buffered-io.c | 33 +++++++++++++++------------------
 include/linux/dax.h    |  3 +--
 3 files changed, 22 insertions(+), 27 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 11b16729b86f..18e9b5f0dedf 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1038,18 +1038,18 @@ static vm_fault_t dax_load_hole(struct xa_state *xas,
 	return ret;
 }
 
-int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
-		   struct iomap *iomap)
+ssize_t dax_iomap_zero(loff_t pos, loff_t length, struct iomap *iomap)
 {
 	sector_t sector = iomap_sector(iomap, pos & PAGE_MASK);
 	pgoff_t pgoff;
 	long rc, id;
 	void *kaddr;
 	bool page_aligned = false;
-
+	unsigned offset = offset_in_page(pos);
+	unsigned size = min_t(loff_t, PAGE_SIZE - offset, length);
 
 	if (IS_ALIGNED(sector << SECTOR_SHIFT, PAGE_SIZE) &&
-	    IS_ALIGNED(size, PAGE_SIZE))
+	    (size == PAGE_SIZE))
 		page_aligned = true;
 
 	rc = bdev_dax_pgoff(iomap->bdev, sector, PAGE_SIZE, &pgoff);
@@ -1059,8 +1059,7 @@ int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
 	id = dax_read_lock();
 
 	if (page_aligned)
-		rc = dax_zero_page_range(iomap->dax_dev, pgoff,
-					 size >> PAGE_SHIFT);
+		rc = dax_zero_page_range(iomap->dax_dev, pgoff, 1);
 	else
 		rc = dax_direct_access(iomap->dax_dev, pgoff, 1, &kaddr, NULL);
 	if (rc < 0) {
@@ -1073,7 +1072,7 @@ int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
 		dax_flush(iomap->dax_dev, kaddr + offset, size);
 	}
 	dax_read_unlock(id);
-	return 0;
+	return size;
 }
 
 static loff_t
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index fffdf5906e99..8d690ad68657 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -928,11 +928,13 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 }
 EXPORT_SYMBOL_GPL(iomap_file_unshare);
 
-static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
-		unsigned bytes, struct iomap *iomap, struct iomap *srcmap)
+static ssize_t iomap_zero(struct inode *inode, loff_t pos, loff_t length,
+		struct iomap *iomap, struct iomap *srcmap)
 {
 	struct page *page;
 	int status;
+	unsigned offset = offset_in_page(pos);
+	unsigned bytes = min_t(loff_t, PAGE_SIZE - offset, length);
 
 	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
 	if (status)
@@ -944,38 +946,33 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
 	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
 }
 
-static loff_t
-iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
-		void *data, struct iomap *iomap, struct iomap *srcmap)
+static loff_t iomap_zero_range_actor(struct inode *inode, loff_t pos,
+		loff_t length, void *data, struct iomap *iomap,
+		struct iomap *srcmap)
 {
 	bool *did_zero = data;
 	loff_t written = 0;
-	int status;
 
 	/* already zeroed?  we're done. */
 	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
-		return count;
+		return length;
 
 	do {
-		unsigned offset, bytes;
-
-		offset = offset_in_page(pos);
-		bytes = min_t(loff_t, PAGE_SIZE - offset, count);
+		ssize_t bytes;
 
 		if (IS_DAX(inode))
-			status = dax_iomap_zero(pos, offset, bytes, iomap);
+			bytes = dax_iomap_zero(pos, length, iomap);
 		else
-			status = iomap_zero(inode, pos, offset, bytes, iomap,
-					srcmap);
-		if (status < 0)
-			return status;
+			bytes = iomap_zero(inode, pos, length, iomap, srcmap);
+		if (bytes < 0)
+			return bytes;
 
 		pos += bytes;
-		count -= bytes;
+		length -= bytes;
 		written += bytes;
 		if (did_zero)
 			*did_zero = true;
-	} while (count > 0);
+	} while (length > 0);
 
 	return written;
 }
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 6904d4e0b2e0..0ba618aeec98 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -214,8 +214,7 @@ vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf,
 int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
 int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
 				      pgoff_t index);
-int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
-			struct iomap *iomap);
+ssize_t dax_iomap_zero(loff_t pos, loff_t length, struct iomap *iomap);
 static inline bool dax_mapping(struct address_space *mapping)
 {
 	return mapping->host && IS_DAX(mapping->host);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 28/51] iomap: Change iomap_write_begin calling convention
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (26 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 27/51] iomap: Change calling convention for zeroing Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 29/51] iomap: Support THPs in write paths Matthew Wilcox
                   ` (24 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Pass (up to) the remaining length of the extent to iomap_write_begin()
and have it return the number of bytes that will fit in the page.
That lets us copy more bytes per call to iomap_write_begin() if the page
cache has already allocated a THP (and will in future allow us to pass
a hint to the page cache that it should try to allocate a larger page
if there are none in the cache).

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 63 +++++++++++++++++++++++-------------------
 1 file changed, 34 insertions(+), 29 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8d690ad68657..e445ee5f0521 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -571,14 +571,14 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
 	return submit_bio_wait(&bio);
 }
 
-static int
-__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
-		struct page *page, struct iomap *srcmap)
+static ssize_t __iomap_write_begin(struct inode *inode, loff_t pos,
+		size_t len, int flags, struct page *page, struct iomap *srcmap)
 {
 	loff_t block_size = i_blocksize(inode);
 	loff_t block_start = pos & ~(block_size - 1);
 	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
-	unsigned from = offset_in_page(pos), to = from + len;
+	size_t from = offset_in_thp(page, pos);
+	size_t to = from + len;
 	size_t poff, plen;
 	int status;
 
@@ -614,12 +614,13 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 	return 0;
 }
 
-static int
-iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
-		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
+static ssize_t iomap_write_begin(struct inode *inode, loff_t pos, loff_t len,
+		unsigned flags, struct page **pagep, struct iomap *iomap,
+		struct iomap *srcmap)
 {
 	const struct iomap_page_ops *page_ops = iomap->page_ops;
 	struct page *page;
+	size_t offset;
 	int status = 0;
 
 	BUG_ON(pos + len > iomap->offset + iomap->length);
@@ -630,6 +631,8 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		return -EINTR;
 
 	if (page_ops && page_ops->page_prepare) {
+		if (len > UINT_MAX)
+			len = UINT_MAX;
 		status = page_ops->page_prepare(inode, pos, len, iomap);
 		if (status)
 			return status;
@@ -641,6 +644,10 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		status = -ENOMEM;
 		goto out_no_page;
 	}
+	page = thp_head(page);
+	offset = offset_in_thp(page, pos);
+	if (len > thp_size(page) - offset)
+		len = thp_size(page) - offset;
 
 	if (srcmap->type == IOMAP_INLINE)
 		iomap_read_inline_data(inode, page, srcmap);
@@ -650,11 +657,11 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		status = __iomap_write_begin(inode, pos, len, flags, page,
 				srcmap);
 
-	if (unlikely(status))
+	if (status < 0)
 		goto out_unlock;
 
 	*pagep = page;
-	return 0;
+	return len;
 
 out_unlock:
 	unlock_page(page);
@@ -809,8 +816,10 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap,
 				srcmap);
-		if (unlikely(status))
+		if (status < 0)
 			break;
+		/* We may be partway through a THP */
+		offset = offset_in_thp(page, pos);
 
 		if (mapping_writably_mapped(inode->i_mapping))
 			flush_dcache_page(page);
@@ -872,8 +881,7 @@ static loff_t
 iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct iomap *iomap, struct iomap *srcmap)
 {
-	long status = 0;
-	ssize_t written = 0;
+	loff_t written = 0;
 
 	/* don't bother with blocks that are not shared to start with */
 	if (!(iomap->flags & IOMAP_F_SHARED))
@@ -883,25 +891,24 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		return length;
 
 	do {
-		unsigned long offset = offset_in_page(pos);
-		unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length);
 		struct page *page;
+		ssize_t bytes;
 
-		status = iomap_write_begin(inode, pos, bytes,
+		bytes = iomap_write_begin(inode, pos, length,
 				IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap);
-		if (unlikely(status))
-			return status;
+		if (bytes < 0)
+			return bytes;
 
-		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
+		bytes = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
 				srcmap);
-		if (WARN_ON_ONCE(status == 0))
+		if (WARN_ON_ONCE(bytes == 0))
 			return -EIO;
 
 		cond_resched();
 
-		pos += status;
-		written += status;
-		length -= status;
+		pos += bytes;
+		written += bytes;
+		length -= bytes;
 
 		balance_dirty_pages_ratelimited(inode->i_mapping);
 	} while (length);
@@ -932,15 +939,13 @@ static ssize_t iomap_zero(struct inode *inode, loff_t pos, loff_t length,
 		struct iomap *iomap, struct iomap *srcmap)
 {
 	struct page *page;
-	int status;
-	unsigned offset = offset_in_page(pos);
-	unsigned bytes = min_t(loff_t, PAGE_SIZE - offset, length);
+	ssize_t bytes;
 
-	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
-	if (status)
-		return status;
+	bytes = iomap_write_begin(inode, pos, length, 0, &page, iomap, srcmap);
+	if (bytes < 0)
+		return bytes;
 
-	zero_user(page, offset, bytes);
+	zero_user(page, offset_in_thp(page, pos), bytes);
 	mark_page_accessed(page);
 
 	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 29/51] iomap: Support THPs in write paths
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (27 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 28/51] iomap: Change iomap_write_begin calling convention Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 30/51] iomap: Inline data shouldn't see THPs Matthew Wilcox
                   ` (23 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Use thp_size() instead of PAGE_SIZE and offset_in_thp() instead of
offset_in_page().  Also simplify the logic in iomap_do_writepage()
for determining end of file.
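
The conversions all follow the same pattern (sketch); for an order-0 page
thp_size() is PAGE_SIZE and offset_in_thp() matches offset_in_page(), so
behaviour is unchanged for ordinary pages:

        /* before */
        unsigned poff = offset & (PAGE_SIZE - 1);

        /* after */
        unsigned poff = offset & (thp_size(page) - 1);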

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 44 +++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 20 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index e445ee5f0521..9275268ea97e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -460,7 +460,7 @@ iomap_is_partially_uptodate(struct page *page, unsigned long from,
 	unsigned i;
 
 	/* Limit range to one page */
-	len = min_t(unsigned, PAGE_SIZE - from, count);
+	len = min_t(unsigned, thp_size(page) - from, count);
 
 	/* First and last blocks in range within page */
 	first = from >> inode->i_blkbits;
@@ -654,8 +654,8 @@ static ssize_t iomap_write_begin(struct inode *inode, loff_t pos, loff_t len,
 	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
 		status = __block_write_begin_int(page, pos, len, NULL, srcmap);
 	else
-		status = __iomap_write_begin(inode, pos, len, flags, page,
-				srcmap);
+		status = __iomap_write_begin(inode, pos, len, flags,
+				thp_head(page), srcmap);
 
 	if (status < 0)
 		goto out_unlock;
@@ -717,7 +717,7 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 	 */
 	if (unlikely(copied < len && !PageUptodate(page)))
 		return 0;
-	iomap_set_range_uptodate(page, offset_in_page(pos), len);
+	iomap_set_range_uptodate(page, offset_in_thp(page, pos), len);
 	iomap_set_page_dirty(page);
 	return copied;
 }
@@ -753,7 +753,8 @@ static size_t iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
 				page, NULL);
 	} else {
-		ret = __iomap_write_end(inode, pos, len, copied, page);
+		ret = __iomap_write_end(inode, pos, len, copied,
+				thp_head(page));
 	}
 
 	/*
@@ -792,6 +793,10 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		unsigned long bytes;	/* Bytes to write to page */
 		size_t copied;		/* Bytes copied from user */
 
+		/*
+		 * XXX: We don't know what size page we'll find in the
+		 * page cache, so only copy up to a regular page boundary.
+		 */
 		offset = offset_in_page(pos);
 		bytes = min_t(unsigned long, PAGE_SIZE - offset,
 						iov_iter_count(i));
@@ -1116,7 +1121,7 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
 			next = bio->bi_private;
 
 		/* walk each page on bio, ending page IO on them */
-		bio_for_each_segment_all(bv, bio, iter_all)
+		bio_for_each_thp_segment_all(bv, bio, iter_all)
 			iomap_finish_page_writeback(inode, bv->bv_page, error);
 		bio_put(bio);
 	}
@@ -1322,7 +1327,7 @@ iomap_add_to_ioend(struct inode *inode, loff_t offset, struct page *page,
 {
 	sector_t sector = iomap_sector(&wpc->iomap, offset);
 	unsigned len = i_blocksize(inode);
-	unsigned poff = offset & (PAGE_SIZE - 1);
+	unsigned poff = offset & (thp_size(page) - 1);
 	bool merged, same_page = false;
 
 	if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, offset, sector)) {
@@ -1372,8 +1377,9 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	struct iomap_page *iop = iomap_page_create(inode, page);
 	struct iomap_ioend *ioend, *next;
 	unsigned len = i_blocksize(inode);
-	u64 file_offset; /* file offset of page */
+	loff_t pos;
 	int error = 0, count = 0, i;
+	int nr_blocks = i_blocks_per_page(inode, page);
 	LIST_HEAD(submit_list);
 
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) != 0);
@@ -1383,20 +1389,20 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	 * end of the current map or find the current map invalid, grab a new
 	 * one.
 	 */
-	for (i = 0, file_offset = page_offset(page);
-	     i < (PAGE_SIZE >> inode->i_blkbits) && file_offset < end_offset;
-	     i++, file_offset += len) {
+	for (i = 0, pos = page_offset(page);
+	     i < nr_blocks && pos < end_offset;
+	     i++, pos += len) {
 		if (iop && !test_bit(i, iop->uptodate))
 			continue;
 
-		error = wpc->ops->map_blocks(wpc, inode, file_offset);
+		error = wpc->ops->map_blocks(wpc, inode, pos);
 		if (error)
 			break;
 		if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
 			continue;
 		if (wpc->iomap.type == IOMAP_HOLE)
 			continue;
-		iomap_add_to_ioend(inode, file_offset, page, iop, wpc, wbc,
+		iomap_add_to_ioend(inode, pos, page, iop, wpc, wbc,
 				 &submit_list);
 		count++;
 	}
@@ -1478,7 +1484,6 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 {
 	struct iomap_writepage_ctx *wpc = data;
 	struct inode *inode = page->mapping->host;
-	pgoff_t end_index;
 	u64 end_offset;
 	loff_t offset;
 
@@ -1519,10 +1524,8 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 	 * ---------------------------------^------------------|
 	 */
 	offset = i_size_read(inode);
-	end_index = offset >> PAGE_SHIFT;
-	if (page->index < end_index)
-		end_offset = (loff_t)(page->index + 1) << PAGE_SHIFT;
-	else {
+	end_offset = page_offset(page) + thp_size(page);
+	if (end_offset > offset) {
 		/*
 		 * Check whether the page to write out is beyond or straddles
 		 * i_size or not.
@@ -1534,7 +1537,8 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 		 * |				    |      Straddles     |
 		 * ---------------------------------^-----------|--------|
 		 */
-		unsigned offset_into_page = offset & (PAGE_SIZE - 1);
+		unsigned offset_into_page = offset_in_thp(page, offset);
+		pgoff_t end_index = offset >> PAGE_SHIFT;
 
 		/*
 		 * Skip the page if it is fully outside i_size, e.g. due to a
@@ -1565,7 +1569,7 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 		 * memory is zeroed when mapped, and writes to that region are
 		 * not written out to the file."
 		 */
-		zero_user_segment(page, offset_into_page, PAGE_SIZE);
+		zero_user_segment(page, offset_into_page, thp_size(page));
 
 		/* Adjust the end_offset to the end of file */
 		end_offset = offset;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 30/51] iomap: Inline data shouldn't see THPs
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (28 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 29/51] iomap: Support THPs in write paths Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 31/51] iomap: Handle tail pages in iomap_page_mkwrite Matthew Wilcox
                   ` (22 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel, Christoph Hellwig

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Assert that we're not seeing THPs in functions that read/write
inline data, rather than zeroing out the tail.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 9275268ea97e..318c1ecc18c0 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -215,6 +215,7 @@ iomap_read_inline_data(struct inode *inode, struct page *page,
 		return;
 
 	BUG_ON(page->index);
+	BUG_ON(PageCompound(page));
 	BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
 
 	addr = kmap_atomic(page);
@@ -728,6 +729,7 @@ static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
 	void *addr;
 
 	WARN_ON_ONCE(!PageUptodate(page));
+	BUG_ON(PageCompound(page));
 	BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
 
 	addr = kmap_atomic(page);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 31/51] iomap: Handle tail pages in iomap_page_mkwrite
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (29 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 30/51] iomap: Inline data shouldn't see THPs Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 32/51] xfs: Support THPs Matthew Wilcox
                   ` (21 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

iomap_page_mkwrite() can be called with a tail page.  If it is, operate
on the head page, since we treat the entire THP as a single unit and the
whole page is dirtied at the same time.
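
As used here (a sketch of the intent, not the implementation): thp_head()
returns the page itself for an order-0 or head page and the compound head
for a tail page, so it is safe to apply unconditionally:

        struct page *page = thp_head(vmf->page);   /* == vmf->page unless tail */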

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 318c1ecc18c0..8359e369c4b5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1046,7 +1046,7 @@ iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
 
 vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops)
 {
-	struct page *page = vmf->page;
+	struct page *page = thp_head(vmf->page);
 	struct inode *inode = file_inode(vmf->vma->vm_file);
 	unsigned long length;
 	loff_t offset;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 32/51] xfs: Support THPs
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (30 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 31/51] iomap: Handle tail pages in iomap_page_mkwrite Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 33/51] mm: Make prep_transhuge_page return its argument Matthew Wilcox
                   ` (20 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

There is one place which assumes the size of a page; fix it, and
advertise THP support by adding FS_THP_SUPPORT to the filesystem flags.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/xfs/xfs_aops.c  | 2 +-
 fs/xfs/xfs_super.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 55d126d4e096..20968842b2f0 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -548,7 +548,7 @@ xfs_discard_page(
 	if (error && !XFS_FORCED_SHUTDOWN(mp))
 		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
 out_invalidate:
-	iomap_invalidatepage(page, 0, PAGE_SIZE);
+	iomap_invalidatepage(page, 0, thp_size(page));
 }
 
 static const struct iomap_writeback_ops xfs_writeback_ops = {
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 379cbff438bc..1a4a7a8766db 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1829,7 +1829,7 @@ static struct file_system_type xfs_fs_type = {
 	.init_fs_context	= xfs_init_fs_context,
 	.parameters		= xfs_fs_parameters,
 	.kill_sb		= kill_block_super,
-	.fs_flags		= FS_REQUIRES_DEV,
+	.fs_flags		= FS_REQUIRES_DEV | FS_THP_SUPPORT,
 };
 MODULE_ALIAS_FS("xfs");
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 33/51] mm: Make prep_transhuge_page return its argument
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (31 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 32/51] xfs: Support THPs Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 34/51] mm: Add __page_cache_alloc_order Matthew Wilcox
                   ` (19 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel, Kirill A . Shutemov

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

By permitting NULL or order-0 pages as an argument, and returning the
argument, callers can write:

	return prep_transhuge_page(alloc_pages(...));

instead of assigning the result to a temporary variable and conditionally
passing that to prep_transhuge_page().
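
For comparison, the pattern this replaces looked roughly like:

        page = alloc_pages(gfp, order);
        if (page)
                prep_transhuge_page(page);
        return page;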

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/huge_mm.h | 7 +++++--
 mm/huge_memory.c        | 9 +++++++--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index d125912a3e0d..61de7e8683cd 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -193,7 +193,7 @@ extern unsigned long thp_get_unmapped_area(struct file *filp,
 		unsigned long addr, unsigned long len, unsigned long pgoff,
 		unsigned long flags);
 
-extern void prep_transhuge_page(struct page *page);
+extern struct page *prep_transhuge_page(struct page *page);
 extern void free_transhuge_page(struct page *page);
 bool is_transparent_hugepage(struct page *page);
 
@@ -381,7 +381,10 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
 	return false;
 }
 
-static inline void prep_transhuge_page(struct page *page) {}
+static inline struct page *prep_transhuge_page(struct page *page)
+{
+	return page;
+}
 
 static inline bool is_transparent_hugepage(struct page *page)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 78c84bee7e29..80fb38e93837 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -508,15 +508,20 @@ static inline struct deferred_split *get_deferred_split_queue(struct page *page)
 }
 #endif
 
-void prep_transhuge_page(struct page *page)
+struct page *prep_transhuge_page(struct page *page)
 {
+	if (!page || compound_order(page) == 0)
+		return page;
 	/*
-	 * we use page->mapping and page->indexlru in second tail page
+	 * we use page->mapping and page->index in second tail page
 	 * as list_head: assuming THP order >= 2
 	 */
+	BUG_ON(compound_order(page) == 1);
 
 	INIT_LIST_HEAD(page_deferred_list(page));
 	set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
+
+	return page;
 }
 
 bool is_transparent_hugepage(struct page *page)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 34/51] mm: Add __page_cache_alloc_order
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (32 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 33/51] mm: Make prep_transhuge_page return its argument Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 35/51] mm: Allow THPs to be added to the page cache Matthew Wilcox
                   ` (18 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel, Kirill A . Shutemov

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

This new function allows page cache pages larger than an order-0 page
to be allocated.
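
A usage sketch (the order here is hypothetical; order 0 behaves exactly
like __page_cache_alloc()):

        unsigned int order = 2; /* hypothetical: four pages, 16KiB with 4KiB pages */
        struct page *page = __page_cache_alloc_order(mapping_gfp_mask(mapping), order);

        if (!page)
                return -ENOMEM;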

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/pagemap.h | 24 +++++++++++++++++++++---
 mm/filemap.c            | 12 ++++++++----
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index f47ba9f18f3e..8455a3e16900 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -280,15 +280,33 @@ static inline void *detach_page_private(struct page *page)
 	return data;
 }
 
+static inline gfp_t thp_gfpmask(gfp_t gfp)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	/* We'd rather allocate smaller pages than stall a page fault */
+	gfp |= GFP_TRANSHUGE_LIGHT;
+	gfp &= ~__GFP_DIRECT_RECLAIM;
+#endif
+	return gfp;
+}
+
 #ifdef CONFIG_NUMA
-extern struct page *__page_cache_alloc(gfp_t gfp);
+extern struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order);
 #else
-static inline struct page *__page_cache_alloc(gfp_t gfp)
+static inline
+struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order)
 {
-	return alloc_pages(gfp, 0);
+	if (order == 0)
+		return alloc_pages(gfp, 0);
+	return prep_transhuge_page(alloc_pages(thp_gfpmask(gfp), order));
 }
 #endif
 
+static inline struct page *__page_cache_alloc(gfp_t gfp)
+{
+	return __page_cache_alloc_order(gfp, 0);
+}
+
 static inline struct page *page_cache_alloc(struct address_space *x)
 {
 	return __page_cache_alloc(mapping_gfp_mask(x));
diff --git a/mm/filemap.c b/mm/filemap.c
index 80ce3658b147..3a9579a1ffa7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -938,24 +938,28 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 EXPORT_SYMBOL_GPL(add_to_page_cache_lru);
 
 #ifdef CONFIG_NUMA
-struct page *__page_cache_alloc(gfp_t gfp)
+struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order)
 {
 	int n;
 	struct page *page;
 
+	if (order > 0)
+		gfp = thp_gfpmask(gfp);
+
 	if (cpuset_do_page_mem_spread()) {
 		unsigned int cpuset_mems_cookie;
 		do {
 			cpuset_mems_cookie = read_mems_allowed_begin();
 			n = cpuset_mem_spread_node();
-			page = __alloc_pages_node(n, gfp, 0);
+			page = __alloc_pages_node(n, gfp, order);
+			prep_transhuge_page(page);
 		} while (!page && read_mems_allowed_retry(cpuset_mems_cookie));
 
 		return page;
 	}
-	return alloc_pages(gfp, 0);
+	return prep_transhuge_page(alloc_pages(gfp, order));
 }
-EXPORT_SYMBOL(__page_cache_alloc);
+EXPORT_SYMBOL(__page_cache_alloc_order);
 #endif
 
 /*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 35/51] mm: Allow THPs to be added to the page cache
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (33 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 34/51] mm: Add __page_cache_alloc_order Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 36/51] mm: Allow THPs to be removed from " Matthew Wilcox
                   ` (17 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

We return -EEXIST if there are any non-shadow entries in the page
cache in the range covered by the THP.  If there are multiple
shadow entries in the range, we set *shadowp to one of them (currently
the one at the highest index).  If that turns out to be the wrong
answer, we can implement something more complex.  This is mostly
modelled after the equivalent function in the shmem code.
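
A caller sketch (names from earlier patches in this series; order and
index are hypothetical, and index is assumed to be naturally aligned to
the THP's order):

        struct page *page = __page_cache_alloc_order(gfp, order);
        int error;

        if (!page)
                return -ENOMEM;
        /* the THP covers indices [index, index + (1 << order)) */
        error = add_to_page_cache_lru(page, mapping, index, gfp);
        if (error) {
                /* -EEXIST: a non-shadow entry already exists in that range */
                put_page(page);
                return error;
        }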

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c | 52 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 35 insertions(+), 17 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 3a9579a1ffa7..ab9746aff766 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -834,41 +834,58 @@ static int __add_to_page_cache_locked(struct page *page,
 	XA_STATE(xas, &mapping->i_pages, offset);
 	int huge = PageHuge(page);
 	int error;
+	unsigned int nr = 1;
 	void *old;
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(PageSwapBacked(page), page);
 	mapping_set_update(&xas, mapping);
 
-	get_page(page);
-	page->mapping = mapping;
-	page->index = offset;
-
 	if (!huge) {
 		error = mem_cgroup_charge(page, current->mm, gfp_mask);
 		if (error)
-			goto error;
+			return error;
+		xas_set_order(&xas, offset, thp_order(page));
+		nr = thp_nr_pages(page);
 	}
 
+	page_ref_add(page, nr);
+	page->mapping = mapping;
+	page->index = offset;
+
 	do {
+		unsigned long exceptional = 0;
+		unsigned int i = 0;
+
 		xas_lock_irq(&xas);
-		old = xas_load(&xas);
-		if (old && !xa_is_value(old))
-			xas_set_err(&xas, -EEXIST);
-		xas_store(&xas, page);
+		xas_for_each_conflict(&xas, old) {
+			if (!xa_is_value(old)) {
+				xas_set_err(&xas, -EEXIST);
+				break;
+			}
+			exceptional++;
+			if (shadowp)
+				*shadowp = old;
+		}
+		xas_create_range(&xas);
 		if (xas_error(&xas))
 			goto unlock;
 
-		if (xa_is_value(old)) {
-			mapping->nrexceptional--;
-			if (shadowp)
-				*shadowp = old;
+next:
+		xas_store(&xas, page);
+		if (++i < nr) {
+			xas_next(&xas);
+			goto next;
 		}
-		mapping->nrpages++;
+		mapping->nrexceptional -= exceptional;
+		mapping->nrpages += nr;
 
 		/* hugetlb pages do not participate in page cache accounting */
-		if (!huge)
-			__inc_lruvec_page_state(page, NR_FILE_PAGES);
+		if (!huge) {
+			__mod_lruvec_page_state(page, NR_FILE_PAGES, nr);
+			if (nr > 1)
+				__inc_lruvec_page_state(page, NR_FILE_THPS);
+		}
 unlock:
 		xas_unlock_irq(&xas);
 	} while (xas_nomem(&xas, gfp_mask & GFP_RECLAIM_MASK));
@@ -883,7 +900,8 @@ static int __add_to_page_cache_locked(struct page *page,
 error:
 	page->mapping = NULL;
 	/* Leave page->index set: truncation relies upon it */
-	put_page(page);
+	page_ref_sub(page, nr);
+	VM_BUG_ON_PAGE(page_count(page) <= 0, page);
 	return error;
 }
 ALLOW_ERROR_INJECTION(__add_to_page_cache_locked, ERRNO);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 36/51] mm: Allow THPs to be removed from the page cache
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (34 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 35/51] mm: Allow THPs to be added to the page cache Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 37/51] mm: Remove page fault assumption of compound page size Matthew Wilcox
                   ` (16 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

page_cache_free_page() assumes THPs are PMD_SIZE; fix that assumption.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index ab9746aff766..78f888d028c5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -248,7 +248,7 @@ static void page_cache_free_page(struct address_space *mapping,
 		freepage(page);
 
 	if (PageTransHuge(page) && !PageHuge(page)) {
-		page_ref_sub(page, HPAGE_PMD_NR);
+		page_ref_sub(page, thp_nr_pages(page));
 		VM_BUG_ON_PAGE(page_count(page) <= 0, page);
 	} else {
 		put_page(page);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 37/51] mm: Remove page fault assumption of compound page size
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (35 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 36/51] mm: Allow THPs to be removed from " Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 38/51] mm: Fix total_mapcount assumption of " Matthew Wilcox
                   ` (15 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

A compound page in the page cache will not necessarily be of PMD size,
so check explicitly.
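
Concretely (sketch): on x86-64 only an order-9 (2MB) compound page can be
mapped with a PMD, so e.g. an order-4 THP found in the page cache has to
fall back to being mapped with PTEs:

        if (page_order(page) != HPAGE_PMD_ORDER)
                return VM_FAULT_FALLBACK;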

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memory.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index dc7f3543b1fd..3e6ef0ebfdd0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3552,13 +3552,14 @@ static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
 	pmd_t entry;
 	int i;
-	vm_fault_t ret;
+	vm_fault_t ret = VM_FAULT_FALLBACK;
 
 	if (!transhuge_vma_suitable(vma, haddr))
-		return VM_FAULT_FALLBACK;
+		return ret;
 
-	ret = VM_FAULT_FALLBACK;
 	page = compound_head(page);
+	if (page_order(page) != HPAGE_PMD_ORDER)
+		return ret;
 
 	/*
 	 * Archs like ppc64 need additonal space to store information
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 38/51] mm: Fix total_mapcount assumption of page size
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (36 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 37/51] mm: Remove page fault assumption of compound page size Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 39/51] mm: Remove assumptions of THP size Matthew Wilcox
                   ` (14 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Kirill A. Shutemov, linux-mm, linux-kernel, Matthew Wilcox

From: "Kirill A. Shutemov" <kirill@shutemov.name>

File THPs may now be of arbitrary order.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/huge_memory.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 80fb38e93837..744863aa0374 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2490,7 +2490,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
 int total_mapcount(struct page *page)
 {
-	int i, compound, ret;
+	int i, compound, nr, ret;
 
 	VM_BUG_ON_PAGE(PageTail(page), page);
 
@@ -2498,16 +2498,17 @@ int total_mapcount(struct page *page)
 		return atomic_read(&page->_mapcount) + 1;
 
 	compound = compound_mapcount(page);
+	nr = compound_nr(page);
 	if (PageHuge(page))
 		return compound;
 	ret = compound;
-	for (i = 0; i < HPAGE_PMD_NR; i++)
+	for (i = 0; i < nr; i++)
 		ret += atomic_read(&page[i]._mapcount) + 1;
 	/* File pages has compound_mapcount included in _mapcount */
 	if (!PageAnon(page))
-		return ret - compound * HPAGE_PMD_NR;
+		return ret - compound * nr;
 	if (PageDoubleMap(page))
-		ret -= HPAGE_PMD_NR;
+		ret -= nr;
 	return ret;
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 39/51] mm: Remove assumptions of THP size
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (37 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 38/51] mm: Fix total_mapcount assumption of " Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 40/51] mm: Avoid splitting THPs Matthew Wilcox
                   ` (13 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Remove direct uses of HPAGE_PMD_NR in paths that aren't necessarily
PMD sized.
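
These conversions follow the same pattern as the previous patch (sketch);
thp_nr_pages() is 1 for an order-0 page, so the loops continue to work
for ordinary pages:

        /* before: hard-coded PMD-sized assumption */
        for (i = 0, nr = 0; i < HPAGE_PMD_NR; i++) {
                if (atomic_inc_and_test(&page[i]._mapcount))
                        nr++;
        }

        /* after: works for any compound order */
        for (i = 0, nr = 0; i < thp_nr_pages(page); i++) {
                if (atomic_inc_and_test(&page[i]._mapcount))
                        nr++;
        }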

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/huge_memory.c | 15 ++++++++-------
 mm/rmap.c        | 10 +++++-----
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 744863aa0374..c25d8e2310e8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2340,7 +2340,7 @@ static void remap_page(struct page *page)
 	if (PageTransHuge(page)) {
 		remove_migration_ptes(page, page, true);
 	} else {
-		for (i = 0; i < HPAGE_PMD_NR; i++)
+		for (i = 0; i < thp_nr_pages(page); i++)
 			remove_migration_ptes(page + i, page + i, true);
 	}
 }
@@ -2415,6 +2415,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	struct lruvec *lruvec;
 	struct address_space *swap_cache = NULL;
 	unsigned long offset = 0;
+	unsigned int nr = thp_nr_pages(head);
 	int i;
 
 	lruvec = mem_cgroup_page_lruvec(head, pgdat);
@@ -2430,7 +2431,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		xa_lock(&swap_cache->i_pages);
 	}
 
-	for (i = HPAGE_PMD_NR - 1; i >= 1; i--) {
+	for (i = nr - 1; i >= 1; i--) {
 		__split_huge_page_tail(head, i, lruvec, list);
 		/* Some pages can be beyond i_size: drop them from page cache */
 		if (head[i].index >= end) {
@@ -2471,7 +2472,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
 	remap_page(head);
 
-	for (i = 0; i < HPAGE_PMD_NR; i++) {
+	for (i = 0; i < nr; i++) {
 		struct page *subpage = head + i;
 		if (subpage == page)
 			continue;
@@ -2553,14 +2554,14 @@ int page_trans_huge_mapcount(struct page *page, int *total_mapcount)
 	page = compound_head(page);
 
 	_total_mapcount = ret = 0;
-	for (i = 0; i < HPAGE_PMD_NR; i++) {
+	for (i = 0; i < thp_nr_pages(page); i++) {
 		mapcount = atomic_read(&page[i]._mapcount) + 1;
 		ret = max(ret, mapcount);
 		_total_mapcount += mapcount;
 	}
 	if (PageDoubleMap(page)) {
 		ret -= 1;
-		_total_mapcount -= HPAGE_PMD_NR;
+		_total_mapcount -= thp_nr_pages(page);
 	}
 	mapcount = compound_mapcount(page);
 	ret += mapcount;
@@ -2577,9 +2578,9 @@ bool can_split_huge_page(struct page *page, int *pextra_pins)
 
 	/* Additional pins from page cache */
 	if (PageAnon(page))
-		extra_pins = PageSwapCache(page) ? HPAGE_PMD_NR : 0;
+		extra_pins = PageSwapCache(page) ? thp_nr_pages(page) : 0;
 	else
-		extra_pins = HPAGE_PMD_NR;
+		extra_pins = thp_nr_pages(page);
 	if (pextra_pins)
 		*pextra_pins = extra_pins;
 	return total_mapcount(page) == page_count(page) - extra_pins - 1;
diff --git a/mm/rmap.c b/mm/rmap.c
index c56fab5826c1..c0295282928b 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1205,7 +1205,7 @@ void page_add_file_rmap(struct page *page, bool compound)
 	VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
 	lock_page_memcg(page);
 	if (compound && PageTransHuge(page)) {
-		for (i = 0, nr = 0; i < HPAGE_PMD_NR; i++) {
+		for (i = 0, nr = 0; i < thp_nr_pages(page); i++) {
 			if (atomic_inc_and_test(&page[i]._mapcount))
 				nr++;
 		}
@@ -1246,7 +1246,7 @@ static void page_remove_file_rmap(struct page *page, bool compound)
 
 	/* page still mapped by someone else? */
 	if (compound && PageTransHuge(page)) {
-		for (i = 0, nr = 0; i < HPAGE_PMD_NR; i++) {
+		for (i = 0, nr = 0; i < thp_nr_pages(page); i++) {
 			if (atomic_add_negative(-1, &page[i]._mapcount))
 				nr++;
 		}
@@ -1293,7 +1293,7 @@ static void page_remove_anon_compound_rmap(struct page *page)
 		 * Subpages can be mapped with PTEs too. Check how many of
 		 * them are still mapped.
 		 */
-		for (i = 0, nr = 0; i < HPAGE_PMD_NR; i++) {
+		for (i = 0, nr = 0; i < thp_nr_pages(page); i++) {
 			if (atomic_add_negative(-1, &page[i]._mapcount))
 				nr++;
 		}
@@ -1303,10 +1303,10 @@ static void page_remove_anon_compound_rmap(struct page *page)
 		 * page of the compound page is unmapped, but at least one
 		 * small page is still mapped.
 		 */
-		if (nr && nr < HPAGE_PMD_NR)
+		if (nr && nr < thp_nr_pages(page))
 			deferred_split_huge_page(page);
 	} else {
-		nr = HPAGE_PMD_NR;
+		nr = thp_nr_pages(page);
 	}
 
 	if (unlikely(PageMlocked(page)))
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 40/51] mm: Avoid splitting THPs
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (38 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 39/51] mm: Remove assumptions of THP size Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 41/51] mm: Fix truncation for pages of arbitrary size Matthew Wilcox
                   ` (12 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

If the filesystem supports THPs, then do not split them before
removing them from the page cache; remove them as a unit.
---
 mm/vmscan.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 17934e03b3aa..0db62c1001f7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1277,9 +1277,9 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 				/* Adding to swap updated mapping */
 				mapping = page_mapping(page);
 			}
-		} else if (unlikely(PageTransHuge(page))) {
-			/* Split file THP */
-			if (split_huge_page_to_list(page, page_list))
+		} else if (PageTransHuge(page)) {
+			if ((!mapping || !mapping_thp_support(mapping)) &&
+			    split_huge_page_to_list(page, page_list))
 				goto keep_locked;
 		}
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 41/51] mm: Fix truncation for pages of arbitrary size
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (39 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 40/51] mm: Avoid splitting THPs Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 42/51] mm: Handle truncates that split THPs Matthew Wilcox
                   ` (11 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Remove the assumption that a compound page is HPAGE_PMD_SIZE,
and the assumption that any page is PAGE_SIZE.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/truncate.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index 152974888124..a9fde773179b 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -168,7 +168,7 @@ void do_invalidatepage(struct page *page, unsigned int offset,
  * becomes orphaned.  It will be left on the LRU and may even be mapped into
  * user pagetables if we're racing with filemap_fault().
  *
- * We need to bale out if page->mapping is no longer equal to the original
+ * We need to bail out if page->mapping is no longer equal to the original
  * mapping.  This happens a) when the VM reclaimed the page while we waited on
  * its lock, b) when a concurrent invalidate_mapping_pages got there first and
  * c) when tmpfs swizzles a page between a tmpfs inode and swapper_space.
@@ -177,12 +177,12 @@ static void
 truncate_cleanup_page(struct address_space *mapping, struct page *page)
 {
 	if (page_mapped(page)) {
-		pgoff_t nr = PageTransHuge(page) ? HPAGE_PMD_NR : 1;
+		unsigned int nr = thp_nr_pages(page);
 		unmap_mapping_pages(mapping, page->index, nr, false);
 	}
 
 	if (page_has_private(page))
-		do_invalidatepage(page, 0, PAGE_SIZE);
+		do_invalidatepage(page, 0, thp_size(page));
 
 	/*
 	 * Some filesystems seem to re-dirty the page even after
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 42/51] mm: Handle truncates that split THPs
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (40 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 41/51] mm: Fix truncation for pages of arbitrary size Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 43/51] mm: Support storing shadow entries for THPs Matthew Wilcox
                   ` (10 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Move shmem_punch_compound() to truncate.c and rename it to punch_thp().
Change its arguments to loff_t to make calling do_invalidatepage()
easier.  Call punch_thp() when we find a THP in the cache.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/internal.h |  2 ++
 mm/shmem.c    | 30 ++-------------------------
 mm/truncate.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 59 insertions(+), 30 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index ac3c79408045..cd7038a36354 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -613,4 +613,6 @@ static inline bool is_migrate_highatomic_page(struct page *page)
 
 void setup_zone_pageset(struct zone *zone);
 extern struct page *alloc_new_node_page(struct page *page, unsigned long node);
+
+bool punch_thp(struct page *page, loff_t start, loff_t end);
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/shmem.c b/mm/shmem.c
index 55405d811cfd..495b8684d94a 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -804,32 +804,6 @@ void shmem_unlock_mapping(struct address_space *mapping)
 	}
 }
 
-/*
- * Check whether a hole-punch or truncation needs to split a huge page,
- * returning true if no split was required, or the split has been successful.
- *
- * Eviction (or truncation to 0 size) should never need to split a huge page;
- * but in rare cases might do so, if shmem_undo_range() failed to trylock on
- * head, and then succeeded to trylock on tail.
- *
- * A split can only succeed when there are no additional references on the
- * huge page: so the split below relies upon find_get_entries() having stopped
- * when it found a subpage of the huge page, without getting further references.
- */
-static bool shmem_punch_compound(struct page *page, pgoff_t start, pgoff_t end)
-{
-	if (!PageTransCompound(page))
-		return true;
-
-	/* Just proceed to delete a huge page wholly within the range punched */
-	if (PageHead(page) &&
-	    page->index >= start && page->index + HPAGE_PMD_NR <= end)
-		return true;
-
-	/* Try to split huge page, so we can truly punch the hole or truncate */
-	return split_huge_page(page) >= 0;
-}
-
 /*
  * Remove range of pages and swap entries from page cache, and free them.
  * If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
@@ -883,7 +857,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			if ((!unfalloc || !PageUptodate(page)) &&
 			    page_mapping(page) == mapping) {
 				VM_BUG_ON_PAGE(PageWriteback(page), page);
-				if (shmem_punch_compound(page, start, end))
+				if (punch_thp(page, lstart, lend))
 					truncate_inode_page(mapping, page);
 			}
 			unlock_page(page);
@@ -973,7 +947,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 					break;
 				}
 				VM_BUG_ON_PAGE(PageWriteback(page), page);
-				if (shmem_punch_compound(page, start, end))
+				if (punch_thp(page, lstart, lend))
 					truncate_inode_page(mapping, page);
 				else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
 					/* Wipe the page and don't get stuck */
diff --git a/mm/truncate.c b/mm/truncate.c
index a9fde773179b..0ef2001c2f65 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -229,6 +229,55 @@ int truncate_inode_page(struct address_space *mapping, struct page *page)
 	return 0;
 }
 
+/*
+ * Check whether a hole-punch or truncation needs to split a huge page,
+ * returning true if no split was required, or the split has been
+ * successful.
+ *
+ * Eviction (or truncation to 0 size) should never need to split a huge
+ * page; but in rare cases might do so, if shmem_undo_range() failed to
+ * trylock on head, and then succeeded to trylock on tail.
+ *
+ * A split can only succeed when there are no additional references on
+ * the huge page: so the split below relies upon find_get_entries()
+ * having stopped when it found a subpage of the huge page, without
+ * getting further references.
+ */
+bool punch_thp(struct page *page, loff_t start, loff_t end)
+{
+	struct page *head = thp_head(page);
+	loff_t pos = page_offset(head);
+	unsigned int offset, length;
+
+	if (!PageTransCompound(page))
+		return true;
+
+	if (pos < start)
+		offset = start - pos;
+	else
+		offset = 0;
+	length = thp_size(head);
+	if (pos + length < end)
+		length = length - offset;
+	else
+		length = end - pos - offset;
+
+	/* Just proceed to delete a huge page wholly within the range punched */
+	if (length == thp_size(head))
+		return true;
+
+	/*
+	 * We're going to split the page into order-0 pages.  Tell the
+	 * filesystem which range of the page is going to be punched out
+	 * so it can discard unnecessary private data.
+	 */
+	if (page_has_private(head))
+		do_invalidatepage(head, offset, length);
+
+	/* Try to split huge page, so we can truly punch the hole or truncate */
+	return split_huge_page(page) >= 0;
+}
+
 /*
  * Used to get rid of pages on hardware memory corruption.
  */
@@ -359,7 +408,10 @@ void truncate_inode_pages_range(struct address_space *mapping,
 				unlock_page(page);
 				continue;
 			}
-			pagevec_add(&locked_pvec, page);
+			if (punch_thp(page, lstart, lend))
+				pagevec_add(&locked_pvec, page);
+			else
+				unlock_page(page);
 		}
 		for (i = 0; i < pagevec_count(&locked_pvec); i++)
 			truncate_cleanup_page(mapping, locked_pvec.pages[i]);
@@ -453,7 +505,8 @@ void truncate_inode_pages_range(struct address_space *mapping,
 			lock_page(page);
 			WARN_ON(page_to_index(page) != index);
 			wait_on_page_writeback(page);
-			truncate_inode_page(mapping, page);
+			if (punch_thp(page, lstart, lend))
+				truncate_inode_page(mapping, page);
 			unlock_page(page);
 		}
 		truncate_exceptional_pvec_entries(mapping, &pvec, indices, end);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 43/51] mm: Support storing shadow entries for THPs
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (41 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 42/51] mm: Handle truncates that split THPs Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 44/51] mm: Support retrieving tail pages from the page cache Matthew Wilcox
                   ` (9 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

If the page is being replaced with a NULL, we can do a single store that
erases the entire range of indices.  Otherwise we have to use a loop to
store one shadow entry in each index.
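
Equivalently (a sketch of the two cases; the actual code below folds them
into one loop):

        if (!shadow) {
                /* one multi-index store clears the THP's whole index range */
                xas_set_order(&xas, page->index, thp_order(page));
                xas_store(&xas, NULL);
                xas_init_marks(&xas);
        } else {
                /* shadow entries are per-index: store one for each subpage */
                for (i = 0; i < thp_nr_pages(page); i++) {
                        xas_store(&xas, shadow);
                        xas_init_marks(&xas);
                        xas_next(&xas);
                }
        }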

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 78f888d028c5..17db007f0277 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -120,22 +120,27 @@ static void page_cache_delete(struct address_space *mapping,
 				   struct page *page, void *shadow)
 {
 	XA_STATE(xas, &mapping->i_pages, page->index);
-	unsigned int nr = 1;
+	unsigned int i, nr = 1, entries = 1;
 
 	mapping_set_update(&xas, mapping);
 
 	/* hugetlb pages are represented by a single entry in the xarray */
 	if (!PageHuge(page)) {
-		xas_set_order(&xas, page->index, compound_order(page));
-		nr = compound_nr(page);
+		entries = nr = thp_nr_pages(page);
+		if (!shadow) {
+			xas_set_order(&xas, page->index, thp_order(page));
+			entries = 1;
+		}
 	}
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(PageTail(page), page);
-	VM_BUG_ON_PAGE(nr != 1 && shadow, page);
 
-	xas_store(&xas, shadow);
-	xas_init_marks(&xas);
+	for (i = 0; i < entries; i++) {
+		xas_store(&xas, shadow);
+		xas_init_marks(&xas);
+		xas_next(&xas);
+	}
 
 	page->mapping = NULL;
 	/* Leave page->index set: truncation lookup relies upon it */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 44/51] mm: Support retrieving tail pages from the page cache
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (42 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 43/51] mm: Support storing shadow entries for THPs Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 45/51] mm: Support tail pages in wait_for_stable_page Matthew Wilcox
                   ` (8 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

page->index is not meaningful for tail pages; we have to use
page_to_index() in this assertion.
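
For reference, page_to_index() derives a tail page's index from its head
(roughly, as a sketch):

        struct page *head = compound_head(page);
        pgoff_t index = head->index + (page - head);    /* page->index for a head page */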

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 17db007f0277..869b970fe1ab 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1652,7 +1652,7 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
 			put_page(page);
 			goto repeat;
 		}
-		VM_BUG_ON_PAGE(page->index != index, page);
+		VM_BUG_ON_PAGE(page_to_index(page) != index, page);
 	}
 
 	if (fgp_flags & FGP_ACCESSED)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 45/51] mm: Support tail pages in wait_for_stable_page
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (43 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 44/51] mm: Support retrieving tail pages from the page cache Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 46/51] mm: Add DEFINE_READAHEAD Matthew Wilcox
                   ` (7 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

page->mapping is undefined for tail pages, so operate exclusively on
the head page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/page-writeback.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 28b3e7a67565..1b358aac065f 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2851,6 +2851,7 @@ EXPORT_SYMBOL_GPL(wait_on_page_writeback);
  */
 void wait_for_stable_page(struct page *page)
 {
+	page = thp_head(page);
 	if (bdi_cap_stable_pages_required(inode_to_bdi(page->mapping->host)))
 		wait_on_page_writeback(page);
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 46/51] mm: Add DEFINE_READAHEAD
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (44 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 45/51] mm: Support tail pages in wait_for_stable_page Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 47/51] mm: Make page_cache_readahead_unbounded take a readahead_control Matthew Wilcox
                   ` (6 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Allow for a more concise definition of a struct readahead_control.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 7 +++++++
 mm/readahead.c          | 6 +-----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 8455a3e16900..77c20da5ae99 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -759,6 +759,13 @@ struct readahead_control {
 	unsigned int _batch_count;
 };
 
+#define DEFINE_READAHEAD(rac, f, m, i)					\
+	struct readahead_control rac = {				\
+		.file = f,						\
+		.mapping = m,						\
+		._index = i,						\
+	}
+
 /**
  * readahead_page - Get the next page to read.
  * @rac: The current readahead request.
diff --git a/mm/readahead.c b/mm/readahead.c
index 3c9a8dd7c56c..2126a2754e22 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -179,11 +179,7 @@ void page_cache_readahead_unbounded(struct address_space *mapping,
 {
 	LIST_HEAD(page_pool);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
-	struct readahead_control rac = {
-		.mapping = mapping,
-		.file = file,
-		._index = index,
-	};
+	DEFINE_READAHEAD(rac, file, mapping, index);
 	unsigned long i;
 
 	/*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 47/51] mm: Make page_cache_readahead_unbounded take a readahead_control
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (45 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 46/51] mm: Add DEFINE_READAHEAD Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 48/51] mm: Make __do_page_cache_readahead " Matthew Wilcox
                   ` (5 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Define it in the callers instead of in page_cache_readahead_unbounded().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/ext4/verity.c        |  4 ++--
 fs/f2fs/verity.c        |  4 ++--
 include/linux/pagemap.h |  5 ++---
 mm/readahead.c          | 26 ++++++++++++--------------
 4 files changed, 18 insertions(+), 21 deletions(-)

diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
index dec1244dd062..fe2e541543da 100644
--- a/fs/ext4/verity.c
+++ b/fs/ext4/verity.c
@@ -346,6 +346,7 @@ static struct page *ext4_read_merkle_tree_page(struct inode *inode,
 					       pgoff_t index,
 					       unsigned long num_ra_pages)
 {
+	DEFINE_READAHEAD(rac, NULL, inode->i_mapping, index);
 	struct page *page;
 
 	index += ext4_verity_metadata_pos(inode) >> PAGE_SHIFT;
@@ -355,8 +356,7 @@ static struct page *ext4_read_merkle_tree_page(struct inode *inode,
 		if (page)
 			put_page(page);
 		else if (num_ra_pages > 1)
-			page_cache_readahead_unbounded(inode->i_mapping, NULL,
-					index, num_ra_pages, 0);
+			page_cache_readahead_unbounded(&rac, num_ra_pages, 0);
 		page = read_mapping_page(inode->i_mapping, index, NULL);
 	}
 	return page;
diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c
index 865c9fb774fb..707a94745472 100644
--- a/fs/f2fs/verity.c
+++ b/fs/f2fs/verity.c
@@ -226,6 +226,7 @@ static struct page *f2fs_read_merkle_tree_page(struct inode *inode,
 					       pgoff_t index,
 					       unsigned long num_ra_pages)
 {
+	DEFINE_READAHEAD(rac, NULL, inode->i_mapping, index);
 	struct page *page;
 
 	index += f2fs_verity_metadata_pos(inode) >> PAGE_SHIFT;
@@ -235,8 +236,7 @@ static struct page *f2fs_read_merkle_tree_page(struct inode *inode,
 		if (page)
 			put_page(page);
 		else if (num_ra_pages > 1)
-			page_cache_readahead_unbounded(inode->i_mapping, NULL,
-					index, num_ra_pages, 0);
+			page_cache_readahead_unbounded(&rac, num_ra_pages, 0);
 		page = read_mapping_page(inode->i_mapping, index, NULL);
 	}
 	return page;
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 77c20da5ae99..f36714473b1e 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -715,9 +715,8 @@ void page_cache_sync_readahead(struct address_space *, struct file_ra_state *,
 void page_cache_async_readahead(struct address_space *, struct file_ra_state *,
 		struct file *, struct page *, pgoff_t index,
 		unsigned long req_count);
-void page_cache_readahead_unbounded(struct address_space *, struct file *,
-		pgoff_t index, unsigned long nr_to_read,
-		unsigned long lookahead_count);
+void page_cache_readahead_unbounded(struct readahead_control *,
+		unsigned long nr_to_read, unsigned long lookahead_count);
 
 /*
  * Like add_to_page_cache_locked, but used to add newly allocated pages:
diff --git a/mm/readahead.c b/mm/readahead.c
index 2126a2754e22..62da2d4beed1 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -159,9 +159,7 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
 
 /**
  * page_cache_readahead_unbounded - Start unchecked readahead.
- * @mapping: File address space.
- * @file: This instance of the open file; used for authentication.
- * @index: First page index to read.
+ * @rac: Readahead control.
  * @nr_to_read: The number of pages to read.
  * @lookahead_size: Where to start the next readahead.
  *
@@ -173,13 +171,13 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
  * Context: File is referenced by caller.  Mutexes may be held by caller.
  * May sleep, but will not reenter filesystem to reclaim memory.
  */
-void page_cache_readahead_unbounded(struct address_space *mapping,
-		struct file *file, pgoff_t index, unsigned long nr_to_read,
-		unsigned long lookahead_size)
+void page_cache_readahead_unbounded(struct readahead_control *rac,
+		unsigned long nr_to_read, unsigned long lookahead_size)
 {
+	struct address_space *mapping = rac->mapping;
+	unsigned long index = readahead_index(rac);
 	LIST_HEAD(page_pool);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
-	DEFINE_READAHEAD(rac, file, mapping, index);
 	unsigned long i;
 
 	/*
@@ -200,7 +198,7 @@ void page_cache_readahead_unbounded(struct address_space *mapping,
 	for (i = 0; i < nr_to_read; i++) {
 		struct page *page = xa_load(&mapping->i_pages, index + i);
 
-		BUG_ON(index + i != rac._index + rac._nr_pages);
+		BUG_ON(index + i != rac->_index + rac->_nr_pages);
 
 		if (page && !xa_is_value(page)) {
 			/*
@@ -211,7 +209,7 @@ void page_cache_readahead_unbounded(struct address_space *mapping,
 			 * have a stable reference to this page, and it's
 			 * not worth getting one just for that.
 			 */
-			read_pages(&rac, &page_pool, true);
+			read_pages(rac, &page_pool, true);
 			continue;
 		}
 
@@ -224,12 +222,12 @@ void page_cache_readahead_unbounded(struct address_space *mapping,
 		} else if (add_to_page_cache_lru(page, mapping, index + i,
 					gfp_mask) < 0) {
 			put_page(page);
-			read_pages(&rac, &page_pool, true);
+			read_pages(rac, &page_pool, true);
 			continue;
 		}
 		if (i == nr_to_read - lookahead_size)
 			SetPageReadahead(page);
-		rac._nr_pages++;
+		rac->_nr_pages++;
 	}
 
 	/*
@@ -237,7 +235,7 @@ void page_cache_readahead_unbounded(struct address_space *mapping,
 	 * uptodate then the caller will launch readpage again, and
 	 * will then handle the error.
 	 */
-	read_pages(&rac, &page_pool, false);
+	read_pages(rac, &page_pool, false);
 	memalloc_nofs_restore(nofs);
 }
 EXPORT_SYMBOL_GPL(page_cache_readahead_unbounded);
@@ -252,6 +250,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
 		struct file *file, pgoff_t index, unsigned long nr_to_read,
 		unsigned long lookahead_size)
 {
+	DEFINE_READAHEAD(rac, file, mapping, index);
 	struct inode *inode = mapping->host;
 	loff_t isize = i_size_read(inode);
 	pgoff_t end_index;	/* The last page we want to read */
@@ -266,8 +265,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
 	if (nr_to_read > end_index - index)
 		nr_to_read = end_index - index + 1;
 
-	page_cache_readahead_unbounded(mapping, file, index, nr_to_read,
-			lookahead_size);
+	page_cache_readahead_unbounded(&rac, nr_to_read, lookahead_size);
 }
 
 /*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 48/51] mm: Make __do_page_cache_readahead take a readahead_control
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (46 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 47/51] mm: Make page_cache_readahead_unbounded take a readahead_control Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 49/51] mm: Allow PageReadahead to be set on head pages Matthew Wilcox
                   ` (4 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Also call __do_page_cache_readahead() directly from ondemand_readahead()
instead of indirecting via ra_submit().
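
To spell out the net effect on ra_submit() (pulled out of the diff below
for clarity, not new code): it is now a thin wrapper that builds the
readahead_control on the stack and passes it down.

	static inline void ra_submit(struct file_ra_state *ra,
			struct address_space *mapping, struct file *file)
	{
		DEFINE_READAHEAD(rac, file, mapping, ra->start);
		__do_page_cache_readahead(&rac, ra->size, ra->async_size);
	}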

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/internal.h  | 11 +++++------
 mm/readahead.c | 26 ++++++++++++++------------
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index cd7038a36354..63c493f892e2 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -51,18 +51,17 @@ void unmap_page_range(struct mmu_gather *tlb,
 
 void force_page_cache_readahead(struct address_space *, struct file *,
 		pgoff_t index, unsigned long nr_to_read);
-void __do_page_cache_readahead(struct address_space *, struct file *,
-		pgoff_t index, unsigned long nr_to_read,
-		unsigned long lookahead_size);
+void __do_page_cache_readahead(struct readahead_control *,
+		unsigned long nr_to_read, unsigned long lookahead_size);
 
 /*
  * Submit IO for the read-ahead request in file_ra_state.
  */
 static inline void ra_submit(struct file_ra_state *ra,
-		struct address_space *mapping, struct file *filp)
+		struct address_space *mapping, struct file *file)
 {
-	__do_page_cache_readahead(mapping, filp,
-			ra->start, ra->size, ra->async_size);
+	DEFINE_READAHEAD(rac, file, mapping, ra->start);
+	__do_page_cache_readahead(&rac, ra->size, ra->async_size);
 }
 
 /**
diff --git a/mm/readahead.c b/mm/readahead.c
index 62da2d4beed1..74c7e1eff540 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -246,12 +246,11 @@ EXPORT_SYMBOL_GPL(page_cache_readahead_unbounded);
  * behaviour which would occur if page allocations are causing VM writeback.
  * We really don't want to intermingle reads and writes like that.
  */
-void __do_page_cache_readahead(struct address_space *mapping,
-		struct file *file, pgoff_t index, unsigned long nr_to_read,
-		unsigned long lookahead_size)
+void __do_page_cache_readahead(struct readahead_control *rac,
+		unsigned long nr_to_read, unsigned long lookahead_size)
 {
-	DEFINE_READAHEAD(rac, file, mapping, index);
-	struct inode *inode = mapping->host;
+	struct inode *inode = rac->mapping->host;
+	unsigned long index = readahead_index(rac);
 	loff_t isize = i_size_read(inode);
 	pgoff_t end_index;	/* The last page we want to read */
 
@@ -265,7 +264,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
 	if (nr_to_read > end_index - index)
 		nr_to_read = end_index - index + 1;
 
-	page_cache_readahead_unbounded(&rac, nr_to_read, lookahead_size);
+	page_cache_readahead_unbounded(rac, nr_to_read, lookahead_size);
 }
 
 /*
@@ -273,10 +272,11 @@ void __do_page_cache_readahead(struct address_space *mapping,
  * memory at once.
  */
 void force_page_cache_readahead(struct address_space *mapping,
-		struct file *filp, pgoff_t index, unsigned long nr_to_read)
+		struct file *file, pgoff_t index, unsigned long nr_to_read)
 {
+	DEFINE_READAHEAD(rac, file, mapping, index);
 	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
-	struct file_ra_state *ra = &filp->f_ra;
+	struct file_ra_state *ra = &file->f_ra;
 	unsigned long max_pages;
 
 	if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readpages &&
@@ -294,7 +294,7 @@ void force_page_cache_readahead(struct address_space *mapping,
 
 		if (this_chunk > nr_to_read)
 			this_chunk = nr_to_read;
-		__do_page_cache_readahead(mapping, filp, index, this_chunk, 0);
+		__do_page_cache_readahead(&rac, this_chunk, 0);
 
 		index += this_chunk;
 		nr_to_read -= this_chunk;
@@ -432,10 +432,11 @@ static int try_context_readahead(struct address_space *mapping,
  * A minimal readahead algorithm for trivial sequential/random reads.
  */
 static void ondemand_readahead(struct address_space *mapping,
-		struct file_ra_state *ra, struct file *filp,
+		struct file_ra_state *ra, struct file *file,
 		bool hit_readahead_marker, pgoff_t index,
 		unsigned long req_size)
 {
+	DEFINE_READAHEAD(rac, file, mapping, index);
 	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
 	unsigned long max_pages = ra->ra_pages;
 	unsigned long add_pages;
@@ -516,7 +517,7 @@ static void ondemand_readahead(struct address_space *mapping,
 	 * standalone, small random read
 	 * Read as is, and do not pollute the readahead state.
 	 */
-	__do_page_cache_readahead(mapping, filp, index, req_size, 0);
+	__do_page_cache_readahead(&rac, req_size, 0);
 	return;
 
 initial_readahead:
@@ -542,7 +543,8 @@ static void ondemand_readahead(struct address_space *mapping,
 		}
 	}
 
-	ra_submit(ra, mapping, filp);
+	rac._index = ra->start;
+	__do_page_cache_readahead(&rac, ra->size, ra->async_size);
 }
 
 /**
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 49/51] mm: Allow PageReadahead to be set on head pages
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (47 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 48/51] mm: Make __do_page_cache_readahead " Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 50/51] mm: Add THP readahead Matthew Wilcox
                   ` (3 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Change the PG_readahead flag policy from PF_NO_COMPOUND to PF_ONLY_HEAD
so that the flag can be set on the head page of a compound page, and
adjust the callers to test PageReadahead only on the head page.
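
As a hedged illustration (the policy semantics live in
include/linux/page-flags.h rather than in this patch): PF_NO_COMPOUND
refuses compound pages entirely, while PF_ONLY_HEAD accepts head pages
and still rejects tails, which is why the filemap callers below go
through thp_head() first:

	if (PageReadahead(thp_head(page)))
		page_cache_async_readahead(mapping, ra, filp,
				thp_head(page), index, last_index - index);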

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/page-flags.h |  4 ++--
 mm/filemap.c               | 14 +++++++-------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 979460df4768..a3110d675cd0 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -377,8 +377,8 @@ PAGEFLAG(MappedToDisk, mappedtodisk, PF_NO_TAIL)
 /* PG_readahead is only used for reads; PG_reclaim is only for writes */
 PAGEFLAG(Reclaim, reclaim, PF_NO_TAIL)
 	TESTCLEARFLAG(Reclaim, reclaim, PF_NO_TAIL)
-PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
-	TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
+PAGEFLAG(Readahead, reclaim, PF_ONLY_HEAD)
+	TESTCLEARFLAG(Readahead, reclaim, PF_ONLY_HEAD)
 
 #ifdef CONFIG_HIGHMEM
 /*
diff --git a/mm/filemap.c b/mm/filemap.c
index 869b970fe1ab..7a746b486237 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2064,9 +2064,9 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb,
 			if (unlikely(page == NULL))
 				goto no_cached_page;
 		}
-		if (PageReadahead(page)) {
+		if (PageReadahead(thp_head(page))) {
 			page_cache_async_readahead(mapping,
-					ra, filp, page,
+					ra, filp, thp_head(page),
 					index, last_index - index);
 		}
 		if (!PageUptodate(page)) {
@@ -2452,10 +2452,10 @@ static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
 		return fpin;
 	if (ra->mmap_miss > 0)
 		ra->mmap_miss--;
-	if (PageReadahead(page)) {
+	if (PageReadahead(thp_head(page))) {
 		fpin = maybe_unlock_mmap_for_io(vmf, fpin);
 		page_cache_async_readahead(mapping, ra, file,
-					   page, offset, ra->ra_pages);
+				thp_head(page), offset, ra->ra_pages);
 	}
 	return fpin;
 }
@@ -2637,11 +2637,11 @@ void filemap_map_pages(struct vm_fault *vmf,
 		/* Has the page moved or been split? */
 		if (unlikely(page != xas_reload(&xas)))
 			goto skip;
+		if (PageReadahead(page))
+			goto skip;
 		page = find_subpage(page, xas.xa_index);
 
-		if (!PageUptodate(page) ||
-				PageReadahead(page) ||
-				PageHWPoison(page))
+		if (!PageUptodate(page) || PageHWPoison(page))
 			goto skip;
 		if (!trylock_page(page))
 			goto skip;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 50/51] mm: Add THP readahead
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (48 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 49/51] mm: Allow PageReadahead to be set on head pages Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-10 20:13 ` [PATCH v6 51/51] mm: Align THP mappings for non-DAX Matthew Wilcox
                   ` (2 subsequent siblings)
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

If the filesystem supports THPs, allocate larger pages in the
readahead code when it seems worth doing.  The heuristic for choosing
larger page sizes will surely need some tuning, but this aggressive
ramp-up seems good for testing.
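
As a worked example of the ramp-up (assuming x86-64, where
HPAGE_PMD_ORDER is 9, and an ra->size large enough not to clamp the
order): a sequential stream that starts out with order-0 pages gets
order-2 pages on the next readahead round, then order-4, order-6,
order-8 and finally order-9 (PMD sized) pages, because each round feeds
the order of the page that hit the readahead marker back in and adds
two.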

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/readahead.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 87 insertions(+), 6 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 74c7e1eff540..98bbcc986b39 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -149,7 +149,7 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
 
 	blk_finish_plug(&plug);
 
-	BUG_ON(!list_empty(pages));
+	BUG_ON(pages && !list_empty(pages));
 	BUG_ON(readahead_count(rac));
 
 out:
@@ -428,13 +428,92 @@ static int try_context_readahead(struct address_space *mapping,
 	return 1;
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static inline int ra_alloc_page(struct readahead_control *rac, pgoff_t index,
+		pgoff_t mark, unsigned int order, gfp_t gfp)
+{
+	int err;
+	struct page *page = __page_cache_alloc_order(gfp, order);
+
+	if (!page)
+		return -ENOMEM;
+	if (mark - index < (1UL << order))
+		SetPageReadahead(page);
+	err = add_to_page_cache_lru(page, rac->mapping, index, gfp);
+	if (err)
+		put_page(page);
+	else
+		rac->_nr_pages += 1UL << order;
+	return err;
+}
+
+static bool page_cache_readahead_order(struct readahead_control *rac,
+		struct file_ra_state *ra, unsigned int order)
+{
+	struct address_space *mapping = rac->mapping;
+	unsigned int old_order = order;
+	pgoff_t index = readahead_index(rac);
+	pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
+	pgoff_t mark = index + ra->size - ra->async_size;
+	int err = 0;
+	gfp_t gfp = readahead_gfp_mask(mapping);
+
+	if (!mapping_thp_support(mapping))
+		return false;
+
+	limit = min(limit, index + ra->size - 1);
+
+	/* Grow page size up to PMD size */
+	if (order < HPAGE_PMD_ORDER) {
+		order += 2;
+		if (order > HPAGE_PMD_ORDER)
+			order = HPAGE_PMD_ORDER;
+		while ((1 << order) > ra->size)
+			order--;
+	}
+
+	/* If size is somehow misaligned, fill with order-0 pages */
+	while (!err && index & ((1UL << old_order) - 1))
+		err = ra_alloc_page(rac, index++, mark, 0, gfp);
+
+	while (!err && index & ((1UL << order) - 1)) {
+		err = ra_alloc_page(rac, index, mark, old_order, gfp);
+		index += 1UL << old_order;
+	}
+
+	while (!err && index <= limit) {
+		err = ra_alloc_page(rac, index, mark, order, gfp);
+		index += 1UL << order;
+	}
+
+	if (index > limit) {
+		ra->size += index - limit - 1;
+		ra->async_size += index - limit - 1;
+	}
+
+	read_pages(rac, NULL, false);
+
+	/*
+	 * If there were already pages in the page cache, then we may have
+	 * left some gaps.  Let the regular readahead code take care of this
+	 * situation.
+	 */
+	return !err;
+}
+#else
+static bool page_cache_readahead_order(struct readahead_control *rac,
+		struct file_ra_state *ra, unsigned int order)
+{
+	return false;
+}
+#endif
+
 /*
  * A minimal readahead algorithm for trivial sequential/random reads.
  */
 static void ondemand_readahead(struct address_space *mapping,
 		struct file_ra_state *ra, struct file *file,
-		bool hit_readahead_marker, pgoff_t index,
-		unsigned long req_size)
+		struct page *page, pgoff_t index, unsigned long req_size)
 {
 	DEFINE_READAHEAD(rac, file, mapping, index);
 	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
@@ -473,7 +552,7 @@ static void ondemand_readahead(struct address_space *mapping,
 	 * Query the pagecache for async_size, which normally equals to
 	 * readahead size. Ramp it up and use it as the new readahead size.
 	 */
-	if (hit_readahead_marker) {
+	if (page) {
 		pgoff_t start;
 
 		rcu_read_lock();
@@ -544,6 +623,8 @@ static void ondemand_readahead(struct address_space *mapping,
 	}
 
 	rac._index = ra->start;
+	if (page && page_cache_readahead_order(&rac, ra, thp_order(page)))
+		return;
 	__do_page_cache_readahead(&rac, ra->size, ra->async_size);
 }
 
@@ -578,7 +659,7 @@ void page_cache_sync_readahead(struct address_space *mapping,
 	}
 
 	/* do read-ahead */
-	ondemand_readahead(mapping, ra, filp, false, index, req_count);
+	ondemand_readahead(mapping, ra, filp, NULL, index, req_count);
 }
 EXPORT_SYMBOL_GPL(page_cache_sync_readahead);
 
@@ -624,7 +705,7 @@ page_cache_async_readahead(struct address_space *mapping,
 		return;
 
 	/* do read-ahead */
-	ondemand_readahead(mapping, ra, filp, true, index, req_count);
+	ondemand_readahead(mapping, ra, filp, page, index, req_count);
 }
 EXPORT_SYMBOL_GPL(page_cache_async_readahead);
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 51/51] mm: Align THP mappings for non-DAX
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (49 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 50/51] mm: Add THP readahead Matthew Wilcox
@ 2020-06-10 20:13 ` Matthew Wilcox
  2020-06-11  6:59 ` [RFC v6 00/51] Large pages in the page cache Christoph Hellwig
  2020-06-14 16:26 ` Matthew Wilcox
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-10 20:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: William Kucharski, linux-mm, linux-kernel, Matthew Wilcox

From: William Kucharski <william.kucharski@oracle.com>

When we have the opportunity to use transparent huge pages to map a
file, we want to follow the same rules as DAX and hand back a
PMD-aligned address, so that the mapping can actually be populated with
huge pages.
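
(Illustrative numbers, not from the patch: with 2MB PMDs, a 4MB file
mapped at a 2MB-aligned address can be covered by two PMD entries,
while the same mapping at an unaligned address falls back to 1024
PTEs.)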

Signed-off-by: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/huge_memory.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c25d8e2310e8..0fb08e4eeb8e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -577,13 +577,10 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
 	unsigned long ret;
 	loff_t off = (loff_t)pgoff << PAGE_SHIFT;
 
-	if (!IS_DAX(filp->f_mapping->host) || !IS_ENABLED(CONFIG_FS_DAX_PMD))
-		goto out;
-
 	ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE);
 	if (ret)
 		return ret;
-out:
+
 	return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
 }
 EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [RFC v6 00/51] Large pages in the page cache
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (50 preceding siblings ...)
  2020-06-10 20:13 ` [PATCH v6 51/51] mm: Align THP mappings for non-DAX Matthew Wilcox
@ 2020-06-11  6:59 ` Christoph Hellwig
  2020-06-11 11:24   ` Matthew Wilcox
  2020-06-14 16:26 ` Matthew Wilcox
  52 siblings, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2020-06-11  6:59 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, linux-kernel

On Wed, Jun 10, 2020 at 01:12:54PM -0700, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> Another fortnight, another dump of my current large pages work.
> I've squished a lot of bugs this time.  xfstests is much happier now,
> running for 1631 seconds and getting as far as generic/086.  This patchset
> is getting a little big, so I'm going to try to get some bits of it
> upstream soon (the bits that make sense regardless of whether the rest
> of this is merged).

At this size a git tree to pull would also be nice..

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v6 00/51] Large pages in the page cache
  2020-06-11  6:59 ` [RFC v6 00/51] Large pages in the page cache Christoph Hellwig
@ 2020-06-11 11:24   ` Matthew Wilcox
  2020-06-15 13:32     ` Christoph Hellwig
  0 siblings, 1 reply; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-11 11:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-mm, linux-kernel

On Wed, Jun 10, 2020 at 11:59:54PM -0700, Christoph Hellwig wrote:
> On Wed, Jun 10, 2020 at 01:12:54PM -0700, Matthew Wilcox wrote:
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > 
> > Another fortnight, another dump of my current large pages work.
> > I've squished a lot of bugs this time.  xfstests is much happier now,
> > running for 1631 seconds and getting as far as generic/086.  This patchset
> > is getting a little big, so I'm going to try to get some bits of it
> > upstream soon (the bits that make sense regardless of whether the rest
> > of this is merged).
> 
> At this size a git tree to pull would also be nice..

That was literally the next paragraph ...

It's now based on linus' master (6f630784cc0d), and you can get it from
http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/heads/xarray-pagecache
if you'd rather see it there (this branch is force-pushed frequently)

Or are you saying you'd rather see the git URL than the link to gitweb?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v6 04/51] mm: Move PageDoubleMap bit
  2020-06-10 20:12 ` [PATCH v6 04/51] mm: Move PageDoubleMap bit Matthew Wilcox
@ 2020-06-11 15:03   ` Zi Yan
  0 siblings, 0 replies; 60+ messages in thread
From: Zi Yan @ 2020-06-11 15:03 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, linux-kernel

On 10 Jun 2020, at 16:12, Matthew Wilcox wrote:

> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> PG_private_2 is defined as being PF_ANY (applicable to tail pages
> as well as regular & head pages).  That means that the first tail
> page of a double-map page will appear to have Private2 set.  Use the
> Workingset bit instead which is defined as PF_HEAD so any attempt to
> access the Workingset bit on a tail page will redirect to the head page's
> Workingset bit.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---

Makes sense to me.

Reviewed-by: Zi Yan <ziy@nvidia.com>

—
Best Regards,
Yan Zi

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v6 05/51] mm: Simplify PageDoubleMap with PF_SECOND policy
  2020-06-10 20:12 ` [PATCH v6 05/51] mm: Simplify PageDoubleMap with PF_SECOND policy Matthew Wilcox
@ 2020-06-11 15:14   ` Zi Yan
  0 siblings, 0 replies; 60+ messages in thread
From: Zi Yan @ 2020-06-11 15:14 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, linux-kernel

On 10 Jun 2020, at 16:12, Matthew Wilcox wrote:

> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> Introduce the new page policy of PF_SECOND which lets us use the
> normal pageflags generation machinery to create the various DoubleMap
> manipulation functions.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/page-flags.h | 40 ++++++++++----------------------------
>  1 file changed, 10 insertions(+), 30 deletions(-)
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index de6e0696f55c..979460df4768 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -232,6 +232,9 @@ static inline void page_init_poison(struct page *page, size_t size)
>   *
>   * PF_NO_COMPOUND:
>   *     the page flag is not relevant for compound pages.
> + *
> + * PF_SECOND:
> + *     the page flag is stored in the first tail page.
>   */

Would PF_FIRST_TAIL or PF_SECOND_IN_COMPOUND be more informative?

—
Best Regards,
Yan Zi

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v6 20/51] block: Add bio_for_each_thp_segment_all
  2020-06-10 20:13 ` [PATCH v6 20/51] block: Add bio_for_each_thp_segment_all Matthew Wilcox
@ 2020-06-11 18:20   ` Matthew Wilcox
  0 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-11 18:20 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-mm, linux-kernel

On Wed, Jun 10, 2020 at 01:13:14PM -0700, Matthew Wilcox wrote:
> +static inline void bvec_thp_advance(const struct bio_vec *bvec,
> +				struct bvec_iter_all *iter_all)
> +{
> +	struct bio_vec *bv = &iter_all->bv;
> +	unsigned int page_size = thp_size(bvec->bv_page);
> +
> +	if (iter_all->done) {
> +		bv->bv_page += thp_nr_pages(bv->bv_page);
> +		bv->bv_offset = 0;
> +	} else {
> +		BUG_ON(bvec->bv_offset >= page_size);
> +		bv->bv_page = bvec->bv_page;
> +		bv->bv_offset = bvec->bv_offset & (page_size - 1);
> +	}
> +	bv->bv_len = min(page_size - bv->bv_offset,
> +			 bvec->bv_len - iter_all->done);
> +	iter_all->done += bv->bv_len;
> +
> +	if (iter_all->done == bvec->bv_len) {
> +		iter_all->idx++;
> +		iter_all->done = 0;
> +	}
> +}

If, for example, we have an order-2 page followed by two order-0 pages
(thanks, generic/127!) in the bvec, we'll end up skipping the third
page because we calculate the size based on bvec->bv_page instead of
bv->bv_page.
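
To make that concrete, here is a worked pass through the old loop for
such a bvec (made-up names: order-2 page P physically followed by
order-0 pages Q and R; bv_offset 0, bv_len 24k, 4k base pages):

  pass 1: done = 0,   bv_page = P, page_size = thp_size(P) = 16k,
          bv_len = min(16k, 24k) = 16k
  pass 2: done = 16k, bv_page = P + 4 = Q, but page_size was computed
          from bvec->bv_page (= P), so bv_len = min(16k, 8k) = 8k and
          the iteration ends; R never shows up as its own segment.

With the change below, page_size is recomputed from bv->bv_page, so
pass 2 yields (Q, 4k) and pass 3 yields (R, 4k).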

+++ b/include/linux/bvec.h
@@ -166,15 +166,19 @@ static inline void bvec_thp_advance(const struct bio_vec *bvec,
                                struct bvec_iter_all *iter_all)
 {
        struct bio_vec *bv = &iter_all->bv;
-       unsigned int page_size = thp_size(bvec->bv_page);
+       unsigned int page_size;
 
        if (iter_all->done) {
                bv->bv_page += thp_nr_pages(bv->bv_page);
+               page_size = thp_size(bv->bv_page);
                bv->bv_offset = 0;
        } else {
-               BUG_ON(bvec->bv_offset >= page_size);
-               bv->bv_page = bvec->bv_page;
-               bv->bv_offset = bvec->bv_offset & (page_size - 1);
+               bv->bv_page = thp_head(bvec->bv_page +
+                               (bvec->bv_offset >> PAGE_SHIFT));
+               page_size = thp_size(bv->bv_page);
+               bv->bv_offset = bvec->bv_offset -
+                               (bv->bv_page - bvec->bv_page) * PAGE_SIZE;
+               BUG_ON(bv->bv_offset >= page_size);
        }
        bv->bv_len = min(page_size - bv->bv_offset,
                         bvec->bv_len - iter_all->done);

The previous code also wasn't handling the case fixed in 6bedf00e55e5
where a split bio might end up splitting a bvec.  That BUG_ON can probably
come out after a few months of testing.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v6 21/51] block: Support THPs in page_is_mergeable
  2020-06-10 20:13 ` [PATCH v6 21/51] block: Support THPs in page_is_mergeable Matthew Wilcox
@ 2020-06-12 16:17   ` Matthew Wilcox
  0 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-12 16:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-mm, linux-kernel

On Wed, Jun 10, 2020 at 01:13:15PM -0700, Matthew Wilcox wrote:
> page_is_mergeable() would incorrectly claim that two IOs were on different
> pages because they were on different base pages rather than on the
> same THP.  This led to a reference counting bug in iomap.  Simplify the
> 'same_page' test by just comparing whether we have the same struct page
> instead of doing arithmetic on the physical addresses.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  block/bio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 5235da6434aa..cd677cde853d 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -747,7 +747,7 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
>  	if (xen_domain() && !xen_biovec_phys_mergeable(bv, page))
>  		return false;
>  
> -	*same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
> +	*same_page = bv->bv_page == page;
>  	if (!*same_page && pfn_to_page(PFN_DOWN(vec_end_addr)) + 1 != page)
>  		return false;
>  	return true;

No, this is also wrong.  If you put two order-2 pages into the same
bvec, and then add the contents of each subpage one at a time, page 0
will be !same, pages 1-3 will be same, but then pages 4-7 will all be
!same instead of page 4 being !same and pages 5-7 being same.  And the
reference count will be wrong on the second THP.

But now I'm thinking about the whole interaction with the block
layer, and it's all a bit complicated.  Changing the definition
of page_is_mergeable() to treat compound pages differently without
also changing bio_next_segment() seems like it'll cause problems with
refcounting if any current users can submit (parts of) a compound page.
And changing how bio_next_segment() works requires auditing all the users,
and I don't want to do that.

We could use a bio flag to indicate whether this is a THP-bearing BIO
and how it should decide whether two pages are actually part of the same
page, but that seems like a really bad bit of added complexity.

We could also pass a flag to __bio_try_merge_page() from the iomap code,
but again added complexity.  We could also add a __bio_try_merge_thp()
that would only be called from iomap for now.  That would call a new
thp_is_mergeable() which would use the THP definition of what a "same
page" is.  I think I hate this idea least of all the ones named so far.

My preferred solution is to change the definition of iop->write_count
(and iop->read_count).  Instead of being a count of the number of segments
submitted, make it a count of the number of bytes submitted.  Like this:

diff -u b/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
--- b/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -174,9 +174,9 @@
 }
 
 static void
-iomap_read_finish(struct iomap_page *iop, struct page *page)
+iomap_read_finish(struct iomap_page *iop, struct page *page, unsigned int len)
 {
-	if (!iop || atomic_dec_and_test(&iop->read_count))
+	if (!iop || atomic_sub_and_test(len, &iop->read_count))
 		unlock_page(page);
 }
 
@@ -193,7 +193,7 @@
 		iomap_set_range_uptodate(page, bvec->bv_offset, bvec->bv_len);
 	}
 
-	iomap_read_finish(iop, page);
+	iomap_read_finish(iop, page, bvec->bv_len);
 }
 
 static void
@@ -294,8 +294,8 @@
 
 	if (is_contig &&
 	    __bio_try_merge_page(ctx->bio, page, plen, poff, &same_page)) {
-		if (!same_page && iop)
-			atomic_inc(&iop->read_count);
+		if (iop)
+			atomic_add(plen, &iop->read_count);
 		goto done;
 	}
 
@@ -305,7 +305,7 @@
 	 * that we don't prematurely unlock the page.
 	 */
 	if (iop)
-		atomic_inc(&iop->read_count);
+		atomic_add(plen, &iop->read_count);
 
 	if (!ctx->bio || !is_contig || bio_full(ctx->bio, plen)) {
 		gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
@@ -1090,7 +1090,7 @@
 
 static void
 iomap_finish_page_writeback(struct inode *inode, struct page *page,
-		int error)
+		int error, unsigned int len)
 {
 	struct iomap_page *iop = iomap_page_create(inode, page);
 
@@ -1101,7 +1101,7 @@
 
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) <= 0);
 
-	if (!iop || atomic_dec_and_test(&iop->write_count))
+	if (!iop || atomic_sub_and_test(len, &iop->write_count))
 		end_page_writeback(page);
 }
 
@@ -1135,7 +1135,8 @@
 
 		/* walk each page on bio, ending page IO on them */
 		bio_for_each_thp_segment_all(bv, bio, iter_all)
-			iomap_finish_page_writeback(inode, bv->bv_page, error);
+			iomap_finish_page_writeback(inode, bv->bv_page, error,
+					bv->bv_len);
 		bio_put(bio);
 	}
 	/* The ioend has been freed by bio_put() */
@@ -1351,8 +1352,8 @@
 
 	merged = __bio_try_merge_page(wpc->ioend->io_bio, page, len, poff,
 			&same_page);
-	if (iop && !same_page)
-		atomic_inc(&iop->write_count);
+	if (iop)
+		atomic_add(len, &iop->write_count);
 
 	if (!merged) {
 		if (bio_full(wpc->ioend->io_bio, len)) {

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v6 00/51] Large pages in the page cache
  2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
                   ` (51 preceding siblings ...)
  2020-06-11  6:59 ` [RFC v6 00/51] Large pages in the page cache Christoph Hellwig
@ 2020-06-14 16:26 ` Matthew Wilcox
  52 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2020-06-14 16:26 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-mm, linux-kernel, Hugh Dickins

On Wed, Jun 10, 2020 at 01:12:54PM -0700, Matthew Wilcox wrote:
> Another fortnight, another dump of my current large pages work.

The generic/127 test has pointed out to me that range writeback is
broken by this patchset.  Here's how (may not be exactly what's going on,
but it's close):

page cache allocates an order-2 page covering indices 40-43.
bytes are written, page is dirtied
test then calls fallocate(FALLOC_FL_COLLAPSE_RANGE) for a range which
starts in page 41.
XFS calls filemap_write_and_wait_range() which calls
__filemap_fdatawrite_range() which calls
do_writepages() which calls
iomap_writepages() which calls
write_cache_pages() which calls
tag_pages_for_writeback() which calls
xas_for_each_marked() starting at page 41.  Which doesn't find page
  41 because when we dirtied pages 40-43, we only marked index 40 as
  being dirty.

Annoyingly, the XArray actually handles this just fine ... if we were
using multi-order entries, we'd find it.  But we're still storing 2^N
entries for an order N page.
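
A minimal sketch of why a single multi-order entry fixes this
(illustrative only, needs CONFIG_XARRAY_MULTI; locking against the real
page cache and the xas_nomem() retry loop are omitted, and 'page'
stands in for the head page):

	DEFINE_XARRAY(pages);
	XA_STATE_ORDER(xas, &pages, 40, 2);	/* one entry covers 40-43 */

	xas_lock(&xas);
	xas_store(&xas, page);
	xas_set_mark(&xas, PAGECACHE_TAG_DIRTY);
	xas_unlock(&xas);

	/* true for any of 40-43, so a tagged walk starting at 41 finds it */
	xa_get_mark(&pages, 41, PAGECACHE_TAG_DIRTY);

With 2^N order-0 entries, only index 40 carries the mark, which is
exactly the failure above.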

I can see two ways to fix this.  One is to bite the bullet and do the
conversion of the page cache to use multi-order entries.  The second
is to set and clear the marks on all entries.  I'm concerned about the
performance of the latter solution.  Not so bad for order-2 pages, but for
an order-9 page we have 520 bits to set, spread over 9 non-consecutive
cachelines.  Also, I'm unenthusiastic about writing code that I want to
throw away as quickly as possible.

So unless somebody has a really good alternative idea, I'm going to
convert the page cache over to multi-order entries.  This will have
several positive effects:

 - Get DAX and regular page cache using the xarray in a more similar way
 - Saves about 4.5kB of memory for every 2MB page in tmpfs/shmem
 - Prep work for converting hugetlbfs to use the page cache the same
   way as tmpfs

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v6 00/51] Large pages in the page cache
  2020-06-11 11:24   ` Matthew Wilcox
@ 2020-06-15 13:32     ` Christoph Hellwig
  0 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2020-06-15 13:32 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Christoph Hellwig, linux-fsdevel, linux-mm, linux-kernel

On Thu, Jun 11, 2020 at 04:24:12AM -0700, Matthew Wilcox wrote:
> On Wed, Jun 10, 2020 at 11:59:54PM -0700, Christoph Hellwig wrote:
> > On Wed, Jun 10, 2020 at 01:12:54PM -0700, Matthew Wilcox wrote:
> > > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > > 
> > > Another fortnight, another dump of my current large pages work.
> > > I've squished a lot of bugs this time.  xfstests is much happier now,
> > > running for 1631 seconds and getting as far as generic/086.  This patchset
> > > is getting a little big, so I'm going to try to get some bits of it
> > > upstream soon (the bits that make sense regardless of whether the rest
> > > of this is merged).
> > 
> > At this size a git tree to pull would also be nice..
> 
> That was literally the next paragraph ...

Oops.  Next time with more coffee..

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2020-06-15 13:32 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-10 20:12 [RFC v6 00/51] Large pages in the page cache Matthew Wilcox
2020-06-10 20:12 ` [PATCH v6 01/51] mm: Print head flags in dump_page Matthew Wilcox
2020-06-10 20:12 ` [PATCH v6 02/51] mm: Print the inode number " Matthew Wilcox
2020-06-10 20:12 ` [PATCH v6 03/51] mm: Print hashed address of struct page Matthew Wilcox
2020-06-10 20:12 ` [PATCH v6 04/51] mm: Move PageDoubleMap bit Matthew Wilcox
2020-06-11 15:03   ` Zi Yan
2020-06-10 20:12 ` [PATCH v6 05/51] mm: Simplify PageDoubleMap with PF_SECOND policy Matthew Wilcox
2020-06-11 15:14   ` Zi Yan
2020-06-10 20:13 ` [PATCH v6 06/51] mm: Store compound_nr as well as compound_order Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 07/51] mm: Move page-flags include to top of file Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 08/51] mm: Add thp_order Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 09/51] mm: Add thp_size Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 10/51] mm: Replace hpage_nr_pages with thp_nr_pages Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 11/51] mm: Add thp_head Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 12/51] mm: Introduce offset_in_thp Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 13/51] mm: Support arbitrary THP sizes Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 14/51] fs: Add a filesystem flag for THPs Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 15/51] fs: Do not update nr_thps for mappings which support THPs Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 16/51] fs: Introduce i_blocks_per_page Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 17/51] fs: Make page_mkwrite_check_truncate thp-aware Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 18/51] mm: Support THPs in zero_user_segments Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 19/51] mm: Zero the head page, not the tail page Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 20/51] block: Add bio_for_each_thp_segment_all Matthew Wilcox
2020-06-11 18:20   ` Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 21/51] block: Support THPs in page_is_mergeable Matthew Wilcox
2020-06-12 16:17   ` Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 22/51] iomap: Support arbitrarily many blocks per page Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 23/51] iomap: Support THPs in iomap_adjust_read_range Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 24/51] iomap: Support THPs in invalidatepage Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 25/51] iomap: Support THPs in read paths Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 26/51] iomap: Convert iomap_write_end types Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 27/51] iomap: Change calling convention for zeroing Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 28/51] iomap: Change iomap_write_begin calling convention Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 29/51] iomap: Support THPs in write paths Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 30/51] iomap: Inline data shouldn't see THPs Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 31/51] iomap: Handle tail pages in iomap_page_mkwrite Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 32/51] xfs: Support THPs Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 33/51] mm: Make prep_transhuge_page return its argument Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 34/51] mm: Add __page_cache_alloc_order Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 35/51] mm: Allow THPs to be added to the page cache Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 36/51] mm: Allow THPs to be removed from " Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 37/51] mm: Remove page fault assumption of compound page size Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 38/51] mm: Fix total_mapcount assumption of " Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 39/51] mm: Remove assumptions of THP size Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 40/51] mm: Avoid splitting THPs Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 41/51] mm: Fix truncation for pages of arbitrary size Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 42/51] mm: Handle truncates that split THPs Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 43/51] mm: Support storing shadow entries for THPs Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 44/51] mm: Support retrieving tail pages from the page cache Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 45/51] mm: Support tail pages in wait_for_stable_page Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 46/51] mm: Add DEFINE_READAHEAD Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 47/51] mm: Make page_cache_readahead_unbounded take a readahead_control Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 48/51] mm: Make __do_page_cache_readahead " Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 49/51] mm: Allow PageReadahead to be set on head pages Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 50/51] mm: Add THP readahead Matthew Wilcox
2020-06-10 20:13 ` [PATCH v6 51/51] mm: Align THP mappings for non-DAX Matthew Wilcox
2020-06-11  6:59 ` [RFC v6 00/51] Large pages in the page cache Christoph Hellwig
2020-06-11 11:24   ` Matthew Wilcox
2020-06-15 13:32     ` Christoph Hellwig
2020-06-14 16:26 ` Matthew Wilcox
