linux-kernel.vger.kernel.org archive mirror
From: Mel Gorman <mgorman@techsingularity.net>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: kvm@vger.kernel.org, david@redhat.com, mst@redhat.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	akpm@linux-foundation.org, yang.zhang.wz@gmail.com,
	pagupta@redhat.com, konrad.wilk@oracle.com, nitesh@redhat.com,
	riel@surriel.com, willy@infradead.org, lcapitulino@redhat.com,
	dave.hansen@intel.com, wei.w.wang@intel.com, aarcange@redhat.com,
	pbonzini@redhat.com, dan.j.williams@intel.com, mhocko@kernel.org,
	alexander.h.duyck@linux.intel.com, vbabka@suse.cz,
	osalvador@suse.de
Subject: Re: [PATCH v17 4/9] mm: Introduce Reported pages
Date: Wed, 19 Feb 2020 14:55:11 +0000
Message-ID: <20200219145511.GS3466@techsingularity.net>
In-Reply-To: <20200211224635.29318.19750.stgit@localhost.localdomain>

On Tue, Feb 11, 2020 at 02:46:35PM -0800, Alexander Duyck wrote:
> diff --git a/mm/page_reporting.c b/mm/page_reporting.c
> new file mode 100644
> index 000000000000..1047c6872d4f
> --- /dev/null
> +++ b/mm/page_reporting.c
> @@ -0,0 +1,319 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/mm.h>
> +#include <linux/mmzone.h>
> +#include <linux/page_reporting.h>
> +#include <linux/gfp.h>
> +#include <linux/export.h>
> +#include <linux/delay.h>
> +#include <linux/scatterlist.h>
> +
> +#include "page_reporting.h"
> +#include "internal.h"
> +
> +#define PAGE_REPORTING_DELAY	(2 * HZ)

I assume there is nothing special about 2 seconds other than "make some
progress every so often".

> +static struct page_reporting_dev_info __rcu *pr_dev_info __read_mostly;
> +
> +enum {
> +	PAGE_REPORTING_IDLE = 0,
> +	PAGE_REPORTING_REQUESTED,
> +	PAGE_REPORTING_ACTIVE
> +};
> +
> +/* request page reporting */
> +static void
> +__page_reporting_request(struct page_reporting_dev_info *prdev)
> +{
> +	unsigned int state;
> +
> +	/* Check to see if we are in desired state */
> +	state = atomic_read(&prdev->state);
> +	if (state == PAGE_REPORTING_REQUESTED)
> +		return;
> +
> +	/*
> +	 *  If reporting is already active there is nothing we need to do.
> +	 *  Test against 0 as that represents PAGE_REPORTING_IDLE.
> +	 */
> +	state = atomic_xchg(&prdev->state, PAGE_REPORTING_REQUESTED);
> +	if (state != PAGE_REPORTING_IDLE)
> +		return;
> +
> +	/*
> +	 * Delay the start of work to allow a sizable queue to build. For
> +	 * now we are limiting this to running no more than once every
> +	 * couple of seconds.
> +	 */
> +	schedule_delayed_work(&prdev->work, PAGE_REPORTING_DELAY);
> +}

Seems a fair use of atomics.
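
For readers following along, here is a minimal userspace sketch of the
request state machine quoted above, using C11 atomics in place of the
kernel's atomic_t helpers. The names model_reporting_request and
MODEL_REPORTING_* are hypothetical stand-ins, and the return value
replaces the schedule_delayed_work() call: true means "the caller would
schedule the work". It is an illustration of the pattern, not the patch
itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical mirror of the three states in the quoted patch. */
enum {
	MODEL_REPORTING_IDLE = 0,
	MODEL_REPORTING_REQUESTED,
	MODEL_REPORTING_ACTIVE
};

/*
 * Returns true only when this call moved the state from IDLE to
 * REQUESTED, i.e. when the real code would schedule the delayed work.
 */
static bool model_reporting_request(atomic_int *state)
{
	/* Cheap read first: a request is already pending, nothing to do. */
	if (atomic_load(state) == MODEL_REPORTING_REQUESTED)
		return false;

	/*
	 * Unconditionally flag the request. If the previous state was
	 * ACTIVE, the running worker is expected to notice the new
	 * REQUESTED value, so only the IDLE case needs to schedule work.
	 */
	return atomic_exchange(state, MODEL_REPORTING_REQUESTED) ==
	       MODEL_REPORTING_IDLE;
}
```

The exchange means that, of several concurrent requesters, at most one
observes IDLE, so the work is not queued twice for a single idle period.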

> +static int
> +page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone,
> +		     unsigned int order, unsigned int mt,
> +		     struct scatterlist *sgl, unsigned int *offset)
> +{
> +	struct free_area *area = &zone->free_area[order];
> +	struct list_head *list = &area->free_list[mt];
> +	unsigned int page_len = PAGE_SIZE << order;
> +	struct page *page, *next;
> +	int err = 0;
> +
> +	/*
> +	 * Perform early check, if free area is empty there is
> +	 * nothing to process so we can skip this free_list.
> +	 */
> +	if (list_empty(list))
> +		return err;
> +
> +	spin_lock_irq(&zone->lock);
> +
> +	/* loop through free list adding unreported pages to sg list */
> +	list_for_each_entry_safe(page, next, list, lru) {
> +		/* We are going to skip over the reported pages. */
> +		if (PageReported(page))
> +			continue;
> +
> +		/* Attempt to pull page from list */
> +		if (!__isolate_free_page(page, order))
> +			break;
> +

Might want to note that you are breaking because the only reason to fail
the isolation is that watermarks are not met and we are likely under
memory pressure. It's not a big issue.

However, while I think this is correct, it's hard to follow. This loop can
be broken out of with pages still on the scatter/gather list. The current
flow guarantees that err will not be set at this point, so the caller
cleans up and the list is always drained, either here or in the caller.

While I think it works, it's a bit fragile. I recommend putting a comment
above this noting why it's safe and putting a VM_WARN_ON_ONCE(err) before
the break in case someone tries to change this in a year's time and does
not spot that the flow to reach page_reporting_drain *somewhere* is critical.
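
To illustrate the kind of defensive check meant here, the following is a
small userspace model, not kernel code. Everything in it is an
assumption made for illustration: MODEL_WARN_ON_ONCE stands in for
VM_WARN_ON_ONCE, model_cycle and model_drain loosely mirror the shape of
page_reporting_cycle and page_reporting_drain, and the isolate_limit
parameter simulates __isolate_free_page failing under memory pressure.

```c
#include <stdbool.h>
#include <stdio.h>

#define MODEL_CAPACITY 4

/* Hypothetical stand-in for VM_WARN_ON_ONCE(): warns once per call site. */
static int model_warn_count;
#define MODEL_WARN_ON_ONCE(cond) do {				\
	static bool warned;					\
	if ((cond) && !warned) {				\
		warned = true;					\
		model_warn_count++;				\
		fprintf(stderr, "warning: %s\n", #cond);	\
	}							\
} while (0)

struct model_batch {
	int pages[MODEL_CAPACITY];
	unsigned int offset;	/* free slots left, counts down from capacity */
	int drained;		/* total entries returned so far */
};

/* Return every queued entry and mark the batch empty again. */
static void model_drain(struct model_batch *b)
{
	b->drained += MODEL_CAPACITY - b->offset;
	b->offset = MODEL_CAPACITY;
}

static int model_cycle(struct model_batch *b, int npages,
		       int isolate_limit, int report_err)
{
	int err = 0;

	for (int i = 0; i < npages; i++) {
		if (i >= isolate_limit) {
			/*
			 * Entries may still sit in the batch. That is only
			 * safe because err is 0 here, so the caller will
			 * still drain them. Warn if that ever changes.
			 */
			MODEL_WARN_ON_ONCE(err);
			break;
		}

		b->pages[--b->offset] = i;
		if (b->offset)
			continue;

		err = report_err;	/* stands in for prdev->report() */
		model_drain(b);
		if (err)
			break;
	}
	return err;
}
```

In this model a caller that sees a zero return with offset below
MODEL_CAPACITY is responsible for calling model_drain itself; the
warning exists to catch a future change that breaks out with err set
and entries still queued.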

> +		/* Add page to scatter list */
> +		--(*offset);
> +		sg_set_page(&sgl[*offset], page, page_len, 0);
> +
> +		/* If scatterlist isn't full grab more pages */
> +		if (*offset)
> +			continue;
> +
> +		/* release lock before waiting on report processing */
> +		spin_unlock_irq(&zone->lock);
> +
> +		/* begin processing pages in local list */
> +		err = prdev->report(prdev, sgl, PAGE_REPORTING_CAPACITY);
> +
> +		/* reset offset since the full list was reported */
> +		*offset = PAGE_REPORTING_CAPACITY;
> +
> +		/* reacquire zone lock and resume processing */
> +		spin_lock_irq(&zone->lock);
> +
> +		/* flush reported pages from the sg list */
> +		page_reporting_drain(prdev, sgl, PAGE_REPORTING_CAPACITY, !err);
> +
> +		/*
> +		 * Reset next to first entry, the old next isn't valid
> +		 * since we dropped the lock to report the pages
> +		 */
> +		next = list_first_entry(list, struct page, lru);
> +
> +		/* exit on error */
> +		if (err)
> +			break;
> +	}
> +
> +	spin_unlock_irq(&zone->lock);
> +
> +	return err;
> +}

I complained about the use of zone lock before but in this version, I
think I'm ok with it. The lock is held for the free list manipulations
which is what it's for. The state management with atomics seems
reasonable.

Otherwise I think this is ok and I think the implementation is right. Of
great importance to me were the allocator fast paths, but they seem to be
adequately protected by a static branch so

Acked-by: Mel Gorman <mgorman@techsingularity.net>

The ack applies regardless of whether you decide to document and
defensively protect page_reporting_cycle against losing pages on the
scatter/gather list, but I do recommend it.

-- 
Mel Gorman
SUSE Labs

Thread overview: 29+ messages
2020-02-11 22:45 [PATCH v17 0/9] mm / virtio: Provide support for free page reporting Alexander Duyck
2020-02-11 22:46 ` [PATCH v17 1/9] mm: Adjust shuffle code to allow for future coalescing Alexander Duyck
2020-02-11 22:46 ` [PATCH v17 2/9] mm: Use zone and order instead of free area in free_list manipulators Alexander Duyck
2020-02-11 22:46 ` [PATCH v17 3/9] mm: Add function __putback_isolated_page Alexander Duyck
2020-02-19 14:33   ` Mel Gorman
2020-02-11 22:46 ` [PATCH v17 4/9] mm: Introduce Reported pages Alexander Duyck
2020-02-19 14:55   ` Mel Gorman [this message]
2020-02-20 18:44     ` Alexander Duyck
2020-02-20 22:35       ` Mel Gorman
2020-02-21 19:25         ` Alexander Duyck
2020-02-21 20:19           ` Mel Gorman
2020-02-11 22:46 ` [PATCH v17 5/9] virtio-balloon: Pull page poisoning config out of free page hinting Alexander Duyck
2020-02-11 22:46 ` [PATCH v17 6/9] virtio-balloon: Add support for providing free page reports to host Alexander Duyck
2020-02-11 22:47 ` [PATCH v17 7/9] mm/page_reporting: Rotate reported pages to the tail of the list Alexander Duyck
2020-02-19 14:59   ` Mel Gorman
2020-02-11 22:47 ` [PATCH v17 8/9] mm/page_reporting: Add budget limit on how many pages can be reported per pass Alexander Duyck
2020-02-19 15:02   ` Mel Gorman
2020-02-11 22:47 ` [PATCH v17 9/9] mm/page_reporting: Add free page reporting documentation Alexander Duyck
2020-02-11 22:51 ` [PATCH v17 QEMU 1/3] virtio-ballon: Implement support for page poison tracking feature Alexander Duyck
2020-02-11 22:51 ` [PATCH v17 QEMU 2/3] virtio-balloon: Add support for providing free page reports to host Alexander Duyck
2020-02-11 22:51 ` [PATCH v17 QEMU 3/3] virtio-balloon: Provide a interface for free page reporting Alexander Duyck
2020-02-11 22:53 ` [PATCH v17 QEMU 4/3 RFC] memory: Add support for MADV_FREE as mechanism to lazy discard pages Alexander Duyck
2020-02-11 23:05 ` [PATCH v17 0/9] mm / virtio: Provide support for free page reporting Andrew Morton
2020-02-11 23:55   ` Alexander Duyck
2020-02-12  0:19     ` Andrew Morton
2020-02-12  1:19       ` Alexander Duyck
2020-02-18 16:37       ` Alexander Duyck
2020-02-19  8:49         ` Mel Gorman
2020-02-19 15:06         ` Mel Gorman
