linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
To: Nitesh Narayan Lal <nitesh@redhat.com>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	kvm@vger.kernel.org, david@redhat.com, mst@redhat.com,
	dave.hansen@intel.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, akpm@linux-foundation.org
Cc: yang.zhang.wz@gmail.com, pagupta@redhat.com, riel@surriel.com,
	konrad.wilk@oracle.com, willy@infradead.org,
	lcapitulino@redhat.com, wei.w.wang@intel.com,
	aarcange@redhat.com, pbonzini@redhat.com,
	dan.j.williams@intel.com
Subject: Re: [PATCH v4 4/6] mm: Introduce Reported pages
Date: Thu, 08 Aug 2019 08:11:27 -0700	[thread overview]
Message-ID: <f721f0642c3eff7bf2d07110e16cc4ce199ab23a.camel@linux.intel.com> (raw)
In-Reply-To: <ad008459-5618-0859-82cb-6cc7c8e5dcc4@redhat.com>

On Thu, 2019-08-08 at 09:45 -0400, Nitesh Narayan Lal wrote:
> On 8/7/19 6:42 PM, Alexander Duyck wrote:
> > From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > 
> > In order to pave the way for free page reporting in virtualized
> > environments we will need a way to get pages out of the free lists and
> > identify those pages after they have been returned. To accomplish this,
> > this patch adds the concept of a Reported Buddy, which is essentially
> > meant to just be the Uptodate flag used in conjunction with the Buddy
> > page type.
> > 
> > It adds a set of pointers we shall call "boundary" which represents the
> > upper boundary between the unreported and reported pages. The general idea
> > is that in order for a page to cross from one side of the boundary to the
> > other it will need to go through the reporting process. Ultimately a
> > free_list has been fully processed when the boundary has been moved from
> > the tail all they way up to occupying the first entry in the list.
> > 
> > Doing this we should be able to make certain that we keep the reported
> > pages as one contiguous block in each free list. This will allow us to
> > efficiently manipulate the free lists whenever we need to go in and start
> > sending reports to the hypervisor that there are new pages that have been
> > freed and are no longer in use.
> > 
> > An added advantage to this approach is that we should be reducing the
> > overall memory footprint of the guest as it will be more likely to recycle
> > warm pages versus trying to allocate the reported pages that were likely
> > evicted from the guest memory.
> > 
> > Since we will only be reporting one zone at a time we keep the boundary
> > limited to being defined for just the zone we are currently reporting pages
> > from. Doing this we can keep the number of additional pointers needed quite
> > small. To flag that the boundaries are in place we use a single bit
> > in the zone to indicate that reporting and the boundaries are active.
> > 
> > The determination of when to start reporting is based on the tracking of
> > the number of free pages in a given area versus the number of reported
> > pages in that area. We keep track of the number of reported pages per
> > free_area in a separate zone specific area. We do this to avoid modifying
> > the free_area structure as this can lead to false sharing for the highest
> > order with the zone lock which leads to a noticeable performance
> > degradation.
> > 
> > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > ---
> >  include/linux/mmzone.h         |   40 +++++
> >  include/linux/page-flags.h     |   11 +
> >  include/linux/page_reporting.h |  138 ++++++++++++++++++
> >  mm/Kconfig                     |    5 +
> >  mm/Makefile                    |    1 
> >  mm/memory_hotplug.c            |    1 
> >  mm/page_alloc.c                |  136 +++++++++++++++++
> >  mm/page_reporting.c            |  313 ++++++++++++++++++++++++++++++++++++++++
> >  8 files changed, 637 insertions(+), 8 deletions(-)
> >  create mode 100644 include/linux/page_reporting.h
> >  create mode 100644 mm/page_reporting.c
> > 
> > 

<snip>

> > diff --git a/mm/page_reporting.c b/mm/page_reporting.c
> > new file mode 100644
> > index 000000000000..ae26dd77bce9
> > --- /dev/null
> > +++ b/mm/page_reporting.c
> > @@ -0,0 +1,313 @@

<snip>

> > +/*
> > + * The page reporting cycle consists of 4 stages, fill, report, drain, and idle.
> > + * We will cycle through the first 3 stages until we fail to obtain any
> > + * pages, in that case we will switch to idle.
> > + */
> > +static void page_reporting_cycle(struct zone *zone,
> > +				 struct page_reporting_dev_info *phdev)
> > +{
> > +	/*
> > +	 * Guarantee boundaries and stats are populated before we
> > +	 * start placing reported pages in the zone.
> > +	 */
> > +	if (page_reporting_populate_metadata(zone))
> > +		return;
> > +
> > +	spin_lock(&zone->lock);
> > +
> > +	/* set bit indicating boundaries are present */
> > +	__set_bit(ZONE_PAGE_REPORTING_ACTIVE, &zone->flags);
> > +
> > +	do {
> > +		/* Pull pages out of allocator into a scaterlist */
> > +		unsigned int nents = page_reporting_fill(zone, phdev);
> > +
> > +		/* no pages were acquired, give up */
> > +		if (!nents)
> > +			break;
> > +
> > +		spin_unlock(&zone->lock);
> > +
> > +		/* begin processing pages in local list */
> > +		phdev->report(phdev, nents);
> > +
> > +		spin_lock(&zone->lock);
> > +
> > +		/*
> > +		 * We should have a scatterlist of pages that have been
> > +		 * processed. Return them to their original free lists.
> > +		 */
> > +		page_reporting_drain(zone, phdev);
> > +
> > +		/* keep pulling pages till there are none to pull */
> > +	} while (test_bit(ZONE_PAGE_REPORTING_REQUESTED, &zone->flags));
> > +
> > +	/* processing of the zone is complete, we can disable boundaries */
> > +	__clear_bit(ZONE_PAGE_REPORTING_ACTIVE, &zone->flags);
> > +
> > +	spin_unlock(&zone->lock);
> > +}
> > +
> > +static void page_reporting_process(struct work_struct *work)
> > +{
> > +	struct delayed_work *d_work = to_delayed_work(work);
> > +	struct page_reporting_dev_info *phdev =
> > +		container_of(d_work, struct page_reporting_dev_info, work);
> > +	struct zone *zone = first_online_pgdat()->node_zones;
> > +
> > +	do {
> > +		if (test_bit(ZONE_PAGE_REPORTING_REQUESTED, &zone->flags))
> > +			page_reporting_cycle(zone, phdev);
> > +
> > +		/*
> > +		 * Move to next zone, if at the end of the list
> > +		 * test to see if we can just go into idle.
> > +		 */
> > +		zone = next_zone(zone);
> > +		if (zone)
> > +			continue;
> > +		zone = first_online_pgdat()->node_zones;
> > +
> > +		/*
> > +		 * As long as refcnt has not reached zero there are still
> > +		 * zones to be processed.
> > +		 */
> 
> can you please explain the reason why you are not using
> for_each_populated_zone() here?
> 
> 
> > +	} while (atomic_read(&phdev->refcnt));
> > +}
> > +

It mostly has to do with the way this code evolved. Originally I had this
starting at the last zone that was processed and resuming there with the
assumption that a noisy zone was going to trigger frequent events so why
walk the zones each time. However we aren't starting the loop that often
so I just decided to start at the beginning and walk the zones until I
found the one that had requested the reporting.

Also I probably wouldn't bother with the "populated" version of the call
since we already have a field to search for so it would just be a matter
of creating my own macro that would only give us zones that are requesting
reporting.

The last bit is that really the exit condition isn't that we hit the end
of the zone list. The exit condition for this loop is that phdev->refcnt
is zero. The problem with using for_each_zone is that you would keep
walking the zone list anyway until you hit the end of it even if we have
already processed the zones that requested reporting.


  reply	other threads:[~2019-08-08 15:11 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-07 22:41 [PATCH v4 0/6] mm / virtio: Provide support for unused page reporting Alexander Duyck
2019-08-07 22:41 ` [PATCH v4 1/6] mm: Adjust shuffle code to allow for future coalescing Alexander Duyck
2019-08-07 22:41 ` [PATCH v4 2/6] mm: Move set/get_pcppage_migratetype to mmzone.h Alexander Duyck
2019-08-07 22:42 ` [PATCH v4 3/6] mm: Use zone and order instead of free area in free_list manipulators Alexander Duyck
2019-08-07 22:42 ` [PATCH v4 4/6] mm: Introduce Reported pages Alexander Duyck
2019-08-08 13:45   ` Nitesh Narayan Lal
2019-08-08 15:11     ` Alexander Duyck [this message]
2019-08-07 22:42 ` [PATCH v4 5/6] virtio-balloon: Pull page poisoning config out of free page hinting Alexander Duyck
2019-08-07 22:42 ` [PATCH v4 6/6] virtio-balloon: Add support for providing unused page reports to host Alexander Duyck
2019-08-12  9:53   ` Michael S. Tsirkin
2019-08-12 15:20     ` Alexander Duyck
2019-08-12 15:43       ` Michael S. Tsirkin
2019-08-12 15:49       ` David Hildenbrand
2019-08-12 16:09         ` Alexander Duyck
2019-08-07 22:43 ` [PATCH v4 QEMU 1/3] virtio-ballon: Implement support for page poison tracking feature Alexander Duyck
2019-08-07 22:43 ` [PATCH v4 QEMU 2/3] virtio-balloon: Add bit to notify guest of unused page reporting Alexander Duyck
2019-08-07 22:43 ` [PATCH v4 QEMU 3/3] virtio-balloon: Provide a interface for " Alexander Duyck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f721f0642c3eff7bf2d07110e16cc4ce199ab23a.camel@linux.intel.com \
    --to=alexander.h.duyck@linux.intel.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.duyck@gmail.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=konrad.wilk@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=lcapitulino@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mst@redhat.com \
    --cc=nitesh@redhat.com \
    --cc=pagupta@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=riel@surriel.com \
    --cc=wei.w.wang@intel.com \
    --cc=willy@infradead.org \
    --cc=yang.zhang.wz@gmail.com \
    --subject='Re: [PATCH v4 4/6] mm: Introduce Reported pages' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
on how to clone and mirror all data and code used for this inbox