[RFC 0/3] THP Shrinker

* [RFC 0/3] THP Shrinker
@ 2022-08-25 21:30 alexlzhu
  2022-08-25 21:30 ` [RFC 1/3] mm: add thp_utilization metrics to debugfs alexlzhu
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: alexlzhu @ 2022-08-25 21:30 UTC (permalink / raw)
  To: linux-mm
  Cc: willy, hannes, akpm, riel, kernel-team, linux-kernel, Alexander Zhu

From: Alexander Zhu <alexlzhu@fb.com>

Transparent Hugepages use a larger page size of 2MB in comparison to
normal sized pages that are 4kb. A larger page size allows for fewer TLB
cache misses and thus more efficient use of the CPU. Using a larger page
size also results in more memory waste, which can hurt performance in some
use cases. THPs are currently enabled in the Linux Kernel by applications
in limited virtual address ranges via the madvise system call.  The THP
shrinker tries to find a balance between increased use of THPs, and
increased use of memory. It shrinks the size of memory by removing the
underutilized THPs that are identified by the thp_utilization scanner. 

In our experiments we have noticed that the least utilized THPs are almost
entirely unutilized.

Sample Output: 

Utilized[0-50]: 1331 680884
Utilized[51-101]: 9 3983
Utilized[102-152]: 3 1187
Utilized[153-203]: 0 0
Utilized[204-255]: 2 539
Utilized[256-306]: 5 1135
Utilized[307-357]: 1 192
Utilized[358-408]: 0 0
Utilized[409-459]: 1 57
Utilized[460-512]: 400 13
Last Scan Time: 223.98
Last Scan Duration: 70.65

Above is a sample obtained from one of our test machines when THP is always
enabled. Of the 1331 THPs in this thp_utilization sample that have from
0-50 utilized subpages, we see that there are 680884 free pages. This
comes out to 680884 / (512 * 1331) = 99.91% zero pages in the least
utilized bucket. This represents 680884 * 4KB = 2.7GB memory waste.

Also note that the vast majority of pages are either in the least utilized
[0-50] or most utilized [460-512] buckets. The least utilized THPs are 
responsible for almost all of the memory waste when THP is always 
enabled. Thus by clearing out THPs in the lowest utilization bucket
we extract most of the improvement in CPU efficiency. We have seen 
similar results on our production hosts.

This patchset introduces the THP shrinker we have developed to identify
and split the least utilized THPs. It includes the thp_utilization 
changes that groups anonymous THPs into buckets, the split_huge_page()
changes that identify and zap zero 4KB pages within THPs and the shrinker
changes. It should be noted that the split_huge_page() changes are based
off previous work done by Yu Zhao. 

In the future, we intend to allow additional tuning to the shrinker
based on workload depending on CPU/IO/Memory pressure and the 
amount of anonymous memory. The long term goal is to eventually always 
enable THP for all applications and deprecate madvise entirely.

Alexander Zhu (3):
  mm: add thp_utilization metrics to debugfs
  mm: changes to split_huge_page() to free zero filled tail pages
  mm: THP low utilization shrinker

 Documentation/admin-guide/mm/transhuge.rst    |   9 +
 include/linux/huge_mm.h                       |   9 +
 include/linux/list_lru.h                      |  24 ++
 include/linux/mm_types.h                      |   5 +
 include/linux/rmap.h                          |   2 +-
 include/linux/vm_event_item.h                 |   2 +
 mm/huge_memory.c                              | 333 +++++++++++++++++-
 mm/list_lru.c                                 |  49 +++
 mm/migrate.c                                  |  60 +++-
 mm/migrate_device.c                           |   4 +-
 mm/page_alloc.c                               |   6 +
 mm/vmstat.c                                   |   2 +
 .../selftests/vm/split_huge_page_test.c       |  58 ++-
 tools/testing/selftests/vm/vm_util.c          |  23 ++
 tools/testing/selftests/vm/vm_util.h          |   1 +
 15 files changed, 569 insertions(+), 18 deletions(-)

-- 
2.30.2

^ permalink raw reply	[flat|nested] 15+ messages in thread