linux-mm.kvack.org archive mirror
From: Nitesh Narayan Lal <nitesh@redhat.com>
To: Alexander Duyck <alexander.duyck@gmail.com>,
	kvm@vger.kernel.org, mst@redhat.com,
	linux-kernel@vger.kernel.org, willy@infradead.org,
	mhocko@kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org,
	mgorman@techsingularity.net, vbabka@suse.cz
Cc: yang.zhang.wz@gmail.com, konrad.wilk@oracle.com,
	david@redhat.com, pagupta@redhat.com, riel@surriel.com,
	lcapitulino@redhat.com, dave.hansen@intel.com,
	wei.w.wang@intel.com, aarcange@redhat.com, pbonzini@redhat.com,
	dan.j.williams@intel.com, alexander.h.duyck@linux.intel.com,
	osalvador@suse.de
Subject: Re: [PATCH v12 0/6] mm / virtio: Provide support for unused page reporting
Date: Wed, 23 Oct 2019 07:35:41 -0400
Message-ID: <c50e102c-f72e-df8a-714f-a33897ddbb9f@redhat.com>
In-Reply-To: <20191022221223.17338.5860.stgit@localhost.localdomain>


On 10/22/19 6:27 PM, Alexander Duyck wrote:
> This series provides an asynchronous means of reporting unused guest
> pages to a hypervisor so that the memory associated with those pages can
> be dropped and reused by other processes and/or guests.
>
> When enabled it will allocate a set of statistics to track the number of
> reported pages. When the nr_free for a given free_area exceeds the number
> of reported pages by the high water mark, we will schedule a worker to
> begin allocating the non-reported memory and to provide it to the
> reporting interface via a scatterlist.
>
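
For readers new to the thread, the scheduling condition in the paragraph
above can be sketched roughly as follows. This is only an illustration
under stated assumptions: struct area_stats, should_schedule_report() and
HWM_SKETCH are hypothetical names, the threshold value is made up, and
treating "greater than by the high water mark" as >= is a guess. It is not
the code from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-free_area counters described above. */
struct area_stats {
	unsigned long nr_free;     /* free blocks currently in the free_area */
	unsigned long nr_reported; /* of those, blocks already reported */
};

/* Illustrative threshold only; not the value used in the series. */
#define HWM_SKETCH 32UL

/*
 * Return true when enough unreported free blocks have accumulated that
 * a reporting worker would be worth scheduling.
 */
static bool should_schedule_report(const struct area_stats *s)
{
	assert(s != NULL);

	/* In this model reported blocks are a subset of free blocks. */
	if (s->nr_reported > s->nr_free)
		return false;

	return (s->nr_free - s->nr_reported) >= HWM_SKETCH;
}
```

With 40 free blocks and 10 already reported, only 30 are unreported and the
check stays false; once the gap reaches HWM_SKETCH it becomes true.
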
> Currently this is only in use by virtio-balloon; however, there is hope
> that at some point in the future other hypervisors might be able to make
> use of it. In the virtio-balloon/QEMU implementation the hypervisor
> currently uses MADV_DONTNEED to indicate to the host kernel that the page
> is unused. It will be faulted back into the guest the next time the page
> is accessed.
>
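
The effect of MADV_DONTNEED described above can be seen from plain
userspace. The sketch below assumes Linux and an anonymous private
mapping, which is only a stand-in for guest RAM; byte_after_dontneed() is
a hypothetical helper name and this is not what QEMU itself does.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Write to an anonymous private mapping, drop it with MADV_DONTNEED,
 * then read it again. Returns the first byte seen after the drop, or
 * -1 on error.
 */
static int byte_after_dontneed(size_t len)
{
	unsigned char *p;
	int value;

	assert(len > 0);

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xAA, len); /* fault the pages in and dirty them */

	if (madvise(p, len, MADV_DONTNEED) != 0) {
		munmap(p, len);
		return -1;
	}

	value = p[0]; /* this access faults in a fresh zero-filled page */
	munmap(p, len);
	return value;
}
```

The old contents are gone and the next access faults in a zeroed page,
which is the same fault-and-zero cost the cover letter attributes to
touching a reported page.
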
> To track if a page is reported or not the Uptodate flag was repurposed and
> used as a Reported flag for Buddy pages. While we are processing the pages
> in a given zone we have a set of pointers we track called
> reported_boundary that is used to keep our processing time to a minimum.
> Without these we would have to iterate through all of the reported pages
> which would become a significant burden. I measured as much as a 20%
> performance degradation without using the boundary pointers. In the event
> of something like compaction needing to process the zone at the same time
> it currently resorts to resetting the boundary if it is rearranging the
> list. However in the future it could choose to delay processing the zone
> if a flag is set indicating that a zone is being actively processed.
>
> Below are the results from various benchmarks. I primarily focused on two
> tests. The first is the will-it-scale/page_fault2 test, and the other is
> a modified version of will-it-scale/page_fault1 that was enabled to use
> THP. I did this as it allows for better visibility into different parts
> of the memory subsystem. The guest is running on one node of an E5-2630 v3
> CPU with 48G of RAM that I split up into two logical nodes in the guest
> in order to test with NUMA as well.
>
> Test		    page_fault1 (THP)     page_fault2
> Baseline	 1  1256106.33  +/-0.09%   482202.67  +/-0.46%
>                 16  8864441.67  +/-0.09%  3734692.00  +/-1.23%
>
> Patches applied  1  1257096.00  +/-0.06%   477436.00  +/-0.16%
>                 16  8864677.33  +/-0.06%  3800037.00  +/-0.19%
>
> Patches enabled	 1  1258420.00  +/-0.04%   480080.00  +/-0.07%
>  MADV disabled  16  8753840.00  +/-1.27%  3782764.00  +/-0.37%
>
> Patches enabled	 1  1267916.33  +/-0.08%   472075.67  +/-0.39%
>                 16  8287050.33  +/-0.67%  3774500.33  +/-0.11%
>
> The results above are for a baseline with a linux-next-20191021 kernel,
> that kernel with this patch set applied but page reporting disabled in
> virtio-balloon, patches applied but the madvise disabled by direct
> assigning a device, and the patches applied and page reporting fully
> enabled.  These results include the deviation seen between the average
> value reported here versus the high and/or low value. I observed that
> during the test the memory usage for the first three tests never dropped
> whereas with the patches fully enabled the VM would drop to using only a
> few GB of the host's memory when switching from memhog to page fault tests.
>
> Most of the overhead seen with this patch set fully enabled is due to the
> fact that accessing the reported pages will cause a page fault and the host
> will have to zero the page before giving it back to the guest. The overall
> guest size is kept fairly small to only a few GB while the test is running.
> This overhead is much more visible when using THP than with standard 4K
> pages. As such for the case where the host memory is not oversubscribed
> this results in a performance regression, however if the host memory were
> oversubscribed this patch set should result in a performance improvement
> as swapping memory from the host can be avoided.
>
> There is currently an alternative patch set[1] that has been in the works
> for some time; however, the v12 version of that patch set could not be
> tested as it triggered a kernel panic when I attempted to test it. It
> requires multiple modifications to get up and running with performance
> comparable to this patch set. A follow-on set has yet to be posted. As
> such I have not included results from that patch set, and I would
> appreciate it if we could keep this patch set the focus of any discussion
> on this thread.
>
> For info on earlier versions you will need to follow the links provided
> with the respective versions.
>
> [1]: https://lore.kernel.org/lkml/20190812131235.27244-1-nitesh@redhat.com/
>
> Changes from v10:
> https://lore.kernel.org/lkml/20190918175109.23474.67039.stgit@localhost.localdomain/
> Rebased on "Add linux-next specific files for 20190930"
> Added page_is_reported() macro to prevent unneeded testing of PageReported bit
> Fixed several spots where comments referred to older aeration naming
> Set upper limit for phdev->capacity to page reporting high water mark
> Updated virtio page poison detection logic to also cover init_on_free
> Tweaked page_reporting_notify_free to reduce code size
> Removed dead code in non-reporting path
>
> Changes from v11:
> https://lore.kernel.org/lkml/20191001152441.27008.99285.stgit@localhost.localdomain/
> Removed unnecessary whitespace change from patch 2
> Minor tweak to get_unreported_page to avoid excess writes to boundary
> Rewrote cover page to lay out additional performance info.
>
> ---
>
> Alexander Duyck (6):
>       mm: Adjust shuffle code to allow for future coalescing
>       mm: Use zone and order instead of free area in free_list manipulators
>       mm: Introduce Reported pages
>       mm: Add device side and notifier for unused page reporting
>       virtio-balloon: Pull page poisoning config out of free page hinting
>       virtio-balloon: Add support for providing unused page reports to host
>
>
>  drivers/virtio/Kconfig              |    1 
>  drivers/virtio/virtio_balloon.c     |   88 ++++++++-
>  include/linux/mmzone.h              |   60 ++----
>  include/linux/page-flags.h          |   11 +
>  include/linux/page_reporting.h      |   31 +++
>  include/uapi/linux/virtio_balloon.h |    1 
>  mm/Kconfig                          |   11 +
>  mm/Makefile                         |    1 
>  mm/compaction.c                     |    5 
>  mm/memory_hotplug.c                 |    2 
>  mm/page_alloc.c                     |  194 +++++++++++++++----
>  mm/page_reporting.c                 |  353 +++++++++++++++++++++++++++++++++++
>  mm/page_reporting.h                 |  225 ++++++++++++++++++++++
>  mm/shuffle.c                        |   12 +
>  mm/shuffle.h                        |    6 +
>  15 files changed, 899 insertions(+), 102 deletions(-)
>  create mode 100644 include/linux/page_reporting.h
>  create mode 100644 mm/page_reporting.c
>  create mode 100644 mm/page_reporting.h
>
> --
>

I think Michal Hocko suggested that we include a brief background section
explaining how we ended up with the current approach and what we have
already tried.
That would help someone reviewing the patch series for the first time
understand it better.

--
Nitesh



