linux-kernel.vger.kernel.org archive mirror
From: Pankaj Gupta <pagupta@redhat.com>
To: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>,
	nitesh@redhat.com, kvm@vger.kernel.org, mst@redhat.com,
	david@redhat.com, dave hansen <dave.hansen@intel.com>,
	linux-kernel@vger.kernel.org, willy@infradead.org,
	mhocko@kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org,
	virtio-dev@lists.oasis-open.org, osalvador@suse.de,
	yang zhang wz <yang.zhang.wz@gmail.com>,
	riel@surriel.com, konrad wilk <konrad.wilk@oracle.com>,
	lcapitulino@redhat.com, wei w wang <wei.w.wang@intel.com>,
	aarcange@redhat.com, pbonzini@redhat.com,
	dan j williams <dan.j.williams@intel.com>
Subject: Re: [PATCH v6 0/6] mm / virtio: Provide support for unused page reporting
Date: Fri, 23 Aug 2019 01:16:34 -0400 (EDT)
Message-ID: <860165703.10076075.1566537394212.JavaMail.zimbra@redhat.com>
In-Reply-To: <31b75078d004a1ccf77b710b35b8f847f404de9a.camel@linux.intel.com>


> On Thu, 2019-08-22 at 06:43 -0400, Pankaj Gupta wrote:
> > > This series provides an asynchronous means of reporting to a hypervisor
> > > that a guest page is no longer in use and can have the data associated
> > > with it dropped. To do this I have implemented functionality that allows
> > > > for what I am referring to as unused page reporting.
> > > 
> > > > The functionality for this is fairly simple. When enabled it will
> > > > allocate statistics to track the number of reported pages in a given
> > > > free area.
> > > When the number of free pages exceeds this value plus a high water value,
> > > currently 32, it will begin performing page reporting which consists of
> > > > pulling pages off of the free list and placing them into a scatterlist.
> > > > The scatterlist is then given to the page reporting device and it will
> > > > perform the required action to make the pages "reported". In the case of
> > > virtio-balloon this results in the pages being madvised as MADV_DONTNEED
> > > and as such they are forced out of the guest. After this they are placed
> > > back on the free list, and an additional bit is added if they are not
> > > merged indicating that they are a reported buddy page instead of a
> > > standard buddy page. The cycle then repeats with additional non-reported
> > > pages being pulled until the free areas all consist of reported pages.
> > > 
> > > I am leaving a number of things hard-coded such as limiting the lowest
> > > order processed to PAGEBLOCK_ORDER, and have left it up to the guest to
> > > determine what the limit is on how many pages it wants to allocate to
> > > process the hints. The upper limit for this is based on the size of the
> > > > queue used to store the scatter-gather list.
> > > 
> > > > My primary testing has just been to verify the memory is being freed
> > > > after allocation by running memhog 40g on a 40g guest and watching the
> > > > total free memory via /proc/meminfo on the host. With this I have
> > > > verified most of the memory is freed after each iteration.
> > 
> > > I tried to go through the entire patch series. I can see you reported a
> > > -3.27 drop from the baseline. Is it because of re-faulting the pages after
> > > the host has freed them? Can we avoid freeing all the pages from the guest
> > > free_area and keep some pages (maybe of mixed orders), so that the next
> > > allocation is done from the guest itself rather than faulting to the host?
> > > This would work with real workloads where allocation and deallocation
> > > happen at regular intervals.
> > 
> > This can be further optimized based on other factors like host memory
> > pressure etc.
> > 
> > Thanks,
> > Pankaj
> 
> When I originally started implementing and testing this code I was seeing
> less than a 1% regression. I didn't feel like that was really an accurate
> result since it wasn't putting much stress on the changed code so I have
> modified my tests and kernel so that I have memory shuffling and THP
> enabled. In addition I have gone out of my way to lock things down to a
> single NUMA node on my host system as the code I had would sometimes
> perform better than baseline when running the test due to the fact that
> memory was being freed back to the host and then reallocated which
> actually allowed for better NUMA locality.
> 
> The general idea was I wanted to know what the worst case penalty would be
> for running this code, and it turns out most of that is just the cost of
> faulting back in the pages. By enabling memory shuffling I am forcing the
> memory to churn as pages are added to both the head and tail of the
> free_list. The test itself was modified so that it didn't allocate order 0
> pages and instead was allocating transparent huge pages so the effects
> were as visible as possible. Without that the page faulting overhead would
> mostly fall into the noise of having to allocate the memory as order 0
> pages, that is what I had essentially seen earlier when I was running the
> stock page_fault1 test.

Right. I think the reason is that this test allocates THPs in the guest, while on
the host side you are still using order-0 pages, I assume?

> 
> This code does no hinting on anything smaller than either MAX_ORDER - 1 or
> HUGETLB_PAGE_ORDER pages, and it only starts when there are at least 32 of
> them available to hint on. This results in us not starting to perform the
> hinting until there is 64MB to 128MB of memory sitting in the higher order
> regions of the zone.

OK.
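
As a rough check of the 64MB to 128MB figure, here is a minimal sketch of the
arithmetic. It assumes x86-64 with 4 KiB base pages, HUGETLB_PAGE_ORDER of 9
and MAX_ORDER of 11; those values and all of the sketch_* names are my own
assumptions for illustration and are not taken from the patches.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values (not stated in the thread): x86-64, 4 KiB base pages. */
#define SKETCH_PAGE_SHIFT     12
#define SKETCH_HUGETLB_ORDER   9
#define SKETCH_MAX_ORDER      11
#define SKETCH_HIGH_WATER     32  /* threshold of 32 quoted in the thread */

/* Bytes covered by nr_blocks free blocks of the given buddy order. */
static size_t sketch_bytes_at_order(unsigned int order, size_t nr_blocks)
{
	assert(order < SKETCH_MAX_ORDER);
	return (nr_blocks << order) << SKETCH_PAGE_SHIFT;
}

/* Same quantity in MiB, for comparison with the 64MB to 128MB range. */
static size_t sketch_mib_at_order(unsigned int order, size_t nr_blocks)
{
	return sketch_bytes_at_order(order, nr_blocks) >> 20;
}
```

Under these assumptions, sketch_mib_at_order(SKETCH_HUGETLB_ORDER,
SKETCH_HIGH_WATER) gives the lower bound and
sketch_mib_at_order(SKETCH_MAX_ORDER - 1, SKETCH_HIGH_WATER) the upper bound.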

> 
> The hinting itself stops as soon as we run out of unhinted pages to pull
> from. When this occurs we let any pages that are freed after that
> accumulate until we get back to 32 pages being free in a given order.
> During this time we should build up the cache of warm pages that you
> mentioned, assuming that shuffling is not enabled.

I was thinking of something like retaining pages down to a lower watermark here.
It looks like we might still have a few lower-order pages on the free list if they
have not been merged into the orders that are hinted.
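
To make the lower watermark idea concrete, here is a toy model of the
accounting for a single order. It is only a sketch: the struct, the function
names and the low_water parameter are hypothetical and do not exist in the
series, and I am reading the start condition as "at least 32 unreported
blocks" based on the description above.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_HIGH_WATER 32  /* start threshold quoted in the thread */

/* Toy per-order accounting; field names are hypothetical. */
struct sketch_area {
	unsigned int nr_free;      /* free blocks of this order */
	unsigned int nr_reported;  /* of those, already reported to the host */
};

/* Reporting begins once unreported blocks reach the high water mark. */
static bool sketch_should_start(const struct sketch_area *a)
{
	return a->nr_free - a->nr_reported >= SKETCH_HIGH_WATER;
}

/*
 * Run one reporting pass and return the number of blocks reported.
 * low_water is the hypothetical number of unreported blocks to keep warm;
 * 0 models draining until no unreported blocks are left.
 */
static unsigned int sketch_report_pass(struct sketch_area *a,
				       unsigned int low_water)
{
	unsigned int reported = 0;

	assert(a->nr_reported <= a->nr_free);
	if (!sketch_should_start(a))
		return 0;

	while (a->nr_free - a->nr_reported > low_water) {
		a->nr_reported++;
		reported++;
	}
	return reported;
}
```

With low_water set to 0 a pass reports every unreported block; with a
non-zero value that many blocks stay unreported, so the next allocations of
that order would not have to fault back to the host.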

> 
> As far as further optimizations I don't think there is anything here that
> prevents us from doing that. For now I am focused on just getting the
> basics in place so we have a foundation to start from.

Agreed. Thanks for explaining.

Best regards,
Pankaj

> 
> Thanks.
> 
> - Alex
> 
> 
