From: Michal Hocko <mhocko@kernel.org>
To: Wei Wang <wei.w.wang@intel.com>
Cc: virtio-dev@lists.oasis-open.org, linux-kernel@vger.kernel.org,
	qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org,
	kvm@vger.kernel.org, linux-mm@kvack.org, mst@redhat.com,
	akpm@linux-foundation.org, mawilcox@microsoft.com,
	david@redhat.com, cornelia.huck@de.ibm.com,
	mgorman@techsingularity.net, aarcange@redhat.com,
	amit.shah@redhat.com, pbonzini@redhat.com, willy@infradead.org,
	liliang.opensource@gmail.com, yang.zhang.wz@gmail.com,
	quan.xu@aliyun.com
Subject: Re: [PATCH v15 4/5] mm: support reporting free page blocks
Date: Mon, 28 Aug 2017 15:33:26 +0200
Message-ID: <20170828133326.GN17097@dhcp22.suse.cz>
In-Reply-To: <1503914913-28893-5-git-send-email-wei.w.wang@intel.com>

On Mon 28-08-17 18:08:32, Wei Wang wrote:
> This patch adds support to walk through the free page blocks in the
> system and report them via a callback function. Some page blocks may
> leave the free list after zone->lock is released, so it is the caller's
> responsibility to either detect or prevent the use of such pages.
> 
> One use example of this patch is to accelerate live migration by skipping
> the transfer of free pages reported from the guest. A popular method used
> by the hypervisor to track which part of memory is written during live
> migration is to write-protect all the guest memory. So, those pages that
> are reported as free pages but are written after the report function
> returns will be captured by the hypervisor, and they will be added to the
> next round of memory transfer.

OK, this looks much better. I still have a few nits.

> +extern void walk_free_mem_block(void *opaque,
> +				int min_order,
> +				bool (*report_page_block)(void *, unsigned long,
> +							  unsigned long));
> +

Please add names to the arguments of the prototype.

>  /*
>   * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
>   * into the buddy system. The freed pages will be poisoned with pattern
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6d00f74..81eedc7 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4762,6 +4762,71 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
>  	show_swap_cache_info();
>  }
>  
> +/**
> + * walk_free_mem_block - Walk through the free page blocks in the system
> + * @opaque: the context passed from the caller
> + * @min_order: the minimum order of free lists to check
> + * @report_page_block: the callback function to report free page blocks

page_block has a meaning in the core MM which doesn't strictly match its
usage here. Moreover, we are reporting pfn ranges rather than struct page
ranges. So report_pfn_range would suit better.
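
As a rough sketch, the prototype with both nits applied (named arguments
and the callback renamed to report_pfn_range) might look like the
following. The parameter names pfn and nr_pages, the free_range_stats
context and the count_pfn_range callback are assumptions made for
illustration; none of them come from the patch. The snippet is plain
userspace C so that it compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Sketch of the prototype with named arguments and the callback renamed
 * to report_pfn_range. The parameter names are hypothetical.
 */
void walk_free_mem_block(void *opaque, int min_order,
			 bool (*report_pfn_range)(void *opaque,
						  unsigned long pfn,
						  unsigned long nr_pages));

/* Hypothetical caller context: accumulates what was reported. */
struct free_range_stats {
	unsigned long ranges;		/* number of pfn ranges seen */
	unsigned long pages;		/* total pages in those ranges */
	unsigned long max_ranges;	/* stop after this many ranges */
};

/* Example callback: returns true to ask the walker to stop. */
static bool count_pfn_range(void *opaque, unsigned long pfn,
			    unsigned long nr_pages)
{
	struct free_range_stats *stats = opaque;

	assert(stats != NULL);
	(void)pfn;	/* start of the range; unused in this sketch */
	stats->ranges++;
	stats->pages += nr_pages;
	return stats->ranges >= stats->max_ranges;
}
```

With names in place, the reader can tell from the declaration alone that
the callback receives a starting pfn and a page count, not a struct page.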

[...]
> +	for_each_populated_zone(zone) {
> +		for (order = MAX_ORDER - 1; order >= min_order; order--) {
> +			for (mt = 0; !stop && mt < MIGRATE_TYPES; mt++) {
> +				spin_lock_irqsave(&zone->lock, flags);
> +				list = &zone->free_area[order].free_list[mt];
> +				list_for_each_entry(page, list, lru) {
> +					pfn = page_to_pfn(page);
> +					stop = report_page_block(opaque, pfn,
> +								 1 << order);
> +					if (stop)
> +						break;

					if (stop) {
						spin_unlock_irqrestore(&zone->lock, flags);
						return;
					}

would be both simpler and less error-prone. E.g. you wouldn't pointlessly
iterate over the remaining orders just to realize there is nothing to be
done for them...

> +				}
> +				spin_unlock_irqrestore(&zone->lock, flags);
> +			}
> +		}
> +	}
> +}
> +EXPORT_SYMBOL_GPL(walk_free_mem_block);
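
The early-return shape suggested above can be tried out in a small
userspace model. This is only a sketch under stated assumptions: toy_zone,
toy_lock, toy_unlock, toy_walk_free_mem_block, toy_ctx and toy_report are
all hypothetical stand-ins, fixed-size arrays replace the kernel free
lists, and an integer replaces zone->lock so that an unbalanced unlock
trips an assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_ORDERS	4	/* stand-in for MAX_ORDER */
#define NR_TYPES	2	/* stand-in for MIGRATE_TYPES */
#define MAX_ENTRIES	4	/* capacity of each toy free list */

/* Toy zone: per order and type, an array of free block start pfns. */
struct toy_zone {
	int lock_depth;		/* models zone->lock: 1 while held */
	size_t nr[NR_ORDERS][NR_TYPES];
	unsigned long pfn[NR_ORDERS][NR_TYPES][MAX_ENTRIES];
};

typedef bool (*report_fn)(void *opaque, unsigned long pfn,
			  unsigned long nr_pages);

static void toy_lock(struct toy_zone *z)
{
	assert(z->lock_depth == 0);
	z->lock_depth = 1;
}

static void toy_unlock(struct toy_zone *z)
{
	assert(z->lock_depth == 1);
	z->lock_depth = 0;
}

/* Walk from the highest order down; stop as soon as the callback asks. */
static void toy_walk_free_mem_block(struct toy_zone *zones, size_t nr_zones,
				    void *opaque, int min_order,
				    report_fn report_pfn_range)
{
	for (size_t zi = 0; zi < nr_zones; zi++) {
		struct toy_zone *z = &zones[zi];

		for (int order = NR_ORDERS - 1; order >= min_order; order--) {
			for (int mt = 0; mt < NR_TYPES; mt++) {
				toy_lock(z);
				for (size_t i = 0; i < z->nr[order][mt]; i++) {
					if (report_pfn_range(opaque,
							z->pfn[order][mt][i],
							1UL << order)) {
						/* Unlock and leave at once. */
						toy_unlock(z);
						return;
					}
				}
				toy_unlock(z);
			}
		}
	}
}

/* Hypothetical callback: counts calls and stops once limit is reached. */
struct toy_ctx {
	unsigned long calls;
	unsigned long limit;
};

static bool toy_report(void *opaque, unsigned long pfn,
		       unsigned long nr_pages)
{
	struct toy_ctx *ctx = opaque;

	(void)pfn;
	(void)nr_pages;
	return ++ctx->calls >= ctx->limit;
}
```

Because the function returns from inside the innermost loop, no stop flag
has to be threaded through the outer loop conditions, and the only thing
to get right on that path is releasing the lock before returning.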

-- 
Michal Hocko
SUSE Labs
