* [RFC v2 PATCH 0/4] speed up page allocation for __GFP_ZERO
From: Liang Li @ 2020-12-21 16:25 UTC
  To: Alexander Duyck, Mel Gorman, Andrew Morton, Andrea Arcangeli,
	Dan Williams, Michael S. Tsirkin, David Hildenbrand, Jason Wang,
	Dave Hansen, Michal Hocko, Liang Li
  Cc: linux-mm, linux-kernel, virtualization

The first version can be found at: https://lkml.org/lkml/2020/4/12/42

Zeroing out page content usually happens when pages are allocated with
the __GFP_ZERO flag. This is a time-consuming operation, and it makes
populating a large VMA very slow. This patch introduces a new feature
that zeroes out free pages before they are allocated, which can speed
up page allocation with __GFP_ZERO.
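As a rough illustration of the idea (not the patch's code, which lives in
mm/page_prezero.c and is not shown here), the following user-space C model
keeps a `zeroed` flag per free page. A background pass zeroes free pages and
sets the flag; an allocation that wants a zeroed page, mimicking __GFP_ZERO,
only calls memset() when it gets a page without the flag. All names
(`model_page`, `prezero_free_pages()`, `alloc_page_model()`,
`free_page_model()`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096   /* hypothetical: one 4K page */
#define MODEL_NR_PAGES  8      /* hypothetical: tiny free pool */

/* Hypothetical stand-in for a struct page plus its contents. */
struct model_page {
	bool free;
	bool zeroed;   /* set once the free page's content is known to be zero */
	unsigned char data[MODEL_PAGE_SIZE];
};

static struct model_page pool[MODEL_NR_PAGES];
static size_t alloc_time_zeroing;  /* memsets done on the allocation path */

/* Mark every page free and dirty (content unknown). */
static void model_init(void)
{
	for (size_t i = 0; i < MODEL_NR_PAGES; i++) {
		pool[i].free = true;
		pool[i].zeroed = false;
		memset(pool[i].data, 0xaa, MODEL_PAGE_SIZE);
	}
	alloc_time_zeroing = 0;
}

/* Background step: zero free pages ahead of time, off the allocation path. */
static void prezero_free_pages(void)
{
	for (size_t i = 0; i < MODEL_NR_PAGES; i++) {
		if (pool[i].free && !pool[i].zeroed) {
			memset(pool[i].data, 0, MODEL_PAGE_SIZE);
			pool[i].zeroed = true;
		}
	}
}

/* Allocate one page; want_zero mimics __GFP_ZERO. */
static struct model_page *alloc_page_model(bool want_zero)
{
	for (size_t i = 0; i < MODEL_NR_PAGES; i++) {
		struct model_page *p = &pool[i];

		if (!p->free)
			continue;
		p->free = false;
		if (want_zero && !p->zeroed) {
			memset(p->data, 0, MODEL_PAGE_SIZE);  /* slow path: zero now */
			alloc_time_zeroing++;
		}
		p->zeroed = false;  /* the owner may dirty it from here on */
		return p;
	}
	return NULL;
}

/* Free a page; its content is no longer known to be zero. */
static void free_page_model(struct model_page *p)
{
	p->free = true;
	p->zeroed = false;
}
```

In this model, allocations made after `prezero_free_pages()` return zeroed
pages without adding to `alloc_time_zeroing`: the zeroing cost has moved to
the pre-zeroing pass, which in the proposed feature runs asynchronously.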

My original motivation for adding this feature was to shorten VM
creation time when an SR-IOV device is attached. It works well: VM
creation time is reduced by up to about 90% (with THP on).

Creating a VM [64G RAM, 32 CPUs] with GPU passthrough
=====================================================
QEMU uses 4K pages, THP is off
                  round1      round2      round3
w/o this patch:    23.5s       24.7s       24.6s
w/ this patch:     10.2s       10.3s       11.2s

QEMU uses 4K pages, THP is on
                  round1      round2      round3
w/o this patch:    17.9s       14.8s       14.9s
w/ this patch:     1.9s        1.8s        1.9s
=====================================================
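The headline reduction can be checked against the table. This small C
helper (function and array names are illustrative only) averages the three
rounds and computes 1 - mean(with patch) / mean(without patch). It gives
roughly 88% with THP on and roughly 56% with THP off, so the "about 90%"
figure corresponds to the THP-enabled case.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of n round timings, in seconds. */
static double mean(const double *t, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += t[i];
	return sum / (double)n;
}

/* Fractional reduction of the mean time: 1 - after/before. */
static double reduction(const double *before, const double *after, size_t n)
{
	return 1.0 - mean(after, n) / mean(before, n);
}

/* Round timings copied from the table above (seconds). */
static const double thp_off_before[] = { 23.5, 24.7, 24.6 };
static const double thp_off_after[]  = { 10.2, 10.3, 11.2 };
static const double thp_on_before[]  = { 17.9, 14.8, 14.9 };
static const double thp_on_after[]   = { 1.9, 1.8, 1.9 };
```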

Beyond this, the feature can help in the following cases:

Interactive scenario
====================
Shortening application launch time on desktops or mobile phones can
improve the user experience. Tests on a server [Intel(R) Xeon(R) CPU
E5-2620 v3 @ 2.40GHz] show that zeroing out 1GB of RAM in the kernel
takes about 200ms, while commonly used applications such as the Firefox
browser or Office consume 100 ~ 300 MB of RAM right after launch. With
free pages pre-zeroed, application launch time could therefore drop by
about 20~60ms (is that visually noticeable?). Maybe this feature could
also speed up the launch of Android apps (I didn't do any tests on
Android).
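The 20~60ms figure is a linear extrapolation from the measured zeroing
rate. Assuming the cost scales with size and taking 1GB = 1024MB, for an
application that touches M MB at launch:

```latex
t(M) \approx 200\,\text{ms} \times \frac{M}{1024\,\text{MB}}, \qquad
t(100\,\text{MB}) \approx 19.5\,\text{ms}, \qquad
t(300\,\text{MB}) \approx 58.6\,\text{ms}
```

This is a best case: it assumes every page the application touches at
launch is a __GFP_ZERO allocation that finds an already zeroed page.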

Virtualization
==============
Speed up VM creation and shorten guest boot time, especially in the PCI
SR-IOV device passthrough scenario. Compared with some paravirtualization
solutions, it is easy to deploy because it is transparent to the guest
and handles DMA properly in the BIOS stage, which the paravirtualization
solutions can't handle well.

Improve guest performance when VIRTIO_BALLOON_F_REPORTING is used for
memory overcommit. The VIRTIO_BALLOON_F_REPORTING feature reports free
guest pages to the VMM, and the VMM unmaps the corresponding host pages
so they can be reclaimed. When the guest later allocates a page that was
just reclaimed, the host has to allocate a new page and zero it out for
the guest. In this case, pre-zeroing free pages on the host helps to
speed up the fault-in process and reduces the performance impact.

Speed up kernel routines
========================
This can't be guaranteed because not all free pages are pre-zeroed, but
it holds in most cases. It can speed up some important system calls such
as fork, which allocates zeroed pages to build page tables. It also
speeds up page fault handling, especially for huge page faults. A POC of
pre-zeroing free hugetlb pages has been done.

Security
========
This is a weak version of "introduce init_on_alloc=1 and init_on_free=1
boot options" that zeroes out pages asynchronously. For users who can't
tolerate the performance impact of 'init_on_alloc=1' or 'init_on_free=1',
this feature provides another choice.

In the feedback on the first version, cache pollution was the main
concern of the mm developers. On the other hand, this feature is really
helpful for some use cases, so maybe we should let the user decide
whether to use it. A switch is therefore added under /sys: users who
don't want the feature can turn it off, or configure a large batch size
to reduce cache pollution.
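A hypothetical sketch of how an on/off switch and a batch size could gate
the background work: nothing is zeroed while the switch is off, and
otherwise a pass runs only once `batch_size` freed-but-unzeroed pages have
accumulated, so a larger batch size means fewer, larger passes. The struct
and field names are illustrative only; the actual /sys interface added by
the patch is not described here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tunables, standing in for the /sys switch and batch size. */
struct prezero_ctl {
	bool enabled;          /* the on/off switch */
	size_t batch_size;     /* minimum pending pages before a pass runs */
	size_t pending;        /* freed pages not yet zeroed */
	size_t passes;         /* zeroing passes run so far */
	size_t zeroed_total;   /* pages zeroed so far */
};

/* Called when a page is freed; may trigger a zeroing pass. */
static void prezero_on_free(struct prezero_ctl *c)
{
	c->pending++;
	if (!c->enabled || c->pending < c->batch_size)
		return;
	/* Zero the whole pending batch in one pass (memset elided in this model). */
	c->zeroed_total += c->pending;
	c->pending = 0;
	c->passes++;
}
```

With the same ten frees, a batch size of 4 runs two passes and leaves two
pages pending, a batch size of 1 runs ten passes, and a disabled switch
runs none.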

To make the whole feature work, support for pre-zeroing free huge pages
needs to be added to hugetlbfs; I will send a separate patch for it.

Liang Li (4):
  mm: let user decide page reporting option
  mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
  mm: make page reporing worker works better for low order page
  mm: Add batch size for free page reporting

 drivers/virtio/virtio_balloon.c |   3 +
 include/linux/highmem.h         |  31 +++-
 include/linux/page-flags.h      |  16 +-
 include/linux/page_reporting.h  |   3 +
 include/trace/events/mmflags.h  |   7 +
 mm/Kconfig                      |  10 ++
 mm/Makefile                     |   1 +
 mm/huge_memory.c                |   3 +-
 mm/page_alloc.c                 |   4 +
 mm/page_prezero.c               | 266 ++++++++++++++++++++++++++++++++
 mm/page_prezero.h               |  13 ++
 mm/page_reporting.c             |  49 +++++-
 mm/page_reporting.h             |  16 +-
 13 files changed, 405 insertions(+), 17 deletions(-)
 create mode 100644 mm/page_prezero.c
 create mode 100644 mm/page_prezero.h

Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Liang Li <liliang324@gmail.com>
-- 
2.18.2


Thread overview: 37+ messages
2020-12-21 16:25 [RFC v2 PATCH 0/4] speed up page allocation for __GFP_ZERO Liang Li
2020-12-22  8:47 ` David Hildenbrand
2020-12-22 11:31   ` Liang Li
2020-12-22 11:57     ` David Hildenbrand
2020-12-22 14:00       ` Liang Li
2020-12-23  8:41         ` David Hildenbrand
2020-12-23 12:11           ` Liang Li
2021-01-04 20:18             ` David Hildenbrand
2021-01-05  2:14               ` Liang Li
2021-01-05  9:39                 ` David Hildenbrand
2021-01-05 10:22                   ` Liang Li
2021-01-05 10:27                     ` David Hildenbrand
2020-12-22 12:23 ` Matthew Wilcox
2020-12-22 14:42   ` Liang Li
2021-01-04 12:51     ` Michal Hocko
2021-01-04 13:45       ` Liang Li
2020-12-22 17:11 ` Daniel Jordan
2020-12-22 19:13 ` Alexander Duyck
2021-01-04 12:55 ` Michal Hocko
2021-01-04 14:07   ` Liang Li