From: "Wang, Wei W" <wei.w.wang@intel.com>
To: "virtio-dev@lists.oasis-open.org"
	<virtio-dev@lists.oasis-open.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"virtualization@lists.linux-foundation.org" 
	<virtualization@lists.linux-foundation.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"mst@redhat.com" <mst@redhat.com>,
	"mhocko@kernel.org" <mhocko@kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>
Cc: "torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"liliang.opensource@gmail.com" <liliang.opensource@gmail.com>,
	"yang.zhang.wz@gmail.com" <yang.zhang.wz@gmail.com>,
	"quan.xu0@gmail.com" <quan.xu0@gmail.com>,
	"nilal@redhat.com" <nilal@redhat.com>,
	"riel@redhat.com" <riel@redhat.com>,
	"peterx@redhat.com" <peterx@redhat.com>
Subject: RE: [PATCH v35 1/5] mm: support to get hints of free page blocks
Date: Tue, 10 Jul 2018 10:16:57 +0000
Message-ID: <286AC319A985734F985F78AFA26841F7396E91B6@SHSMSX101.ccr.corp.intel.com>
In-Reply-To: <1531215067-35472-2-git-send-email-wei.w.wang@intel.com>

On Tuesday, July 10, 2018 5:31 PM, Wang, Wei W wrote:
> Subject: [PATCH v35 1/5] mm: support to get hints of free page blocks
> 
> This patch adds support to get free page blocks from a free page list.
> The physical addresses of the blocks are stored to a list of buffers passed
> from the caller. The obtained free page blocks are hints about free pages,
> because there is no guarantee that they are still on the free page list after the
> function returns.
> 
> One use example of this patch is to accelerate live migration by skipping the
> transfer of free pages reported from the guest. A popular method used by
> the hypervisor to track which part of memory is written during live migration
> is to write-protect all the guest memory. So, those pages that are hinted as
> free pages but are written after this function returns will be captured by the
> hypervisor, and they will be added to the next round of memory transfer.
> 
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> ---
>  include/linux/mm.h |  3 ++
>  mm/page_alloc.c    | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 101 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a0fbb9f..5ce654f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2007,6 +2007,9 @@ extern void free_area_init(unsigned long * zones_size);
>  extern void free_area_init_node(int nid, unsigned long * zones_size,
>  		unsigned long zone_start_pfn, unsigned long *zholes_size);
>  extern void free_initmem(void);
> +unsigned long max_free_page_blocks(int order);
> +int get_from_free_page_list(int order, struct list_head *pages,
> +			    unsigned int size, unsigned long *loaded_num);
> 
>  /*
>   * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 1521100..b67839b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5043,6 +5043,104 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
>  	show_swap_cache_info();
>  }
> 
> +/**
> + * max_free_page_blocks - estimate the max number of free page blocks
> + * @order: the order of the free page blocks to estimate
> + *
> + * This function gives a rough estimation of the possible maximum number of
> + * free page blocks a free list may have. The estimation works on an assumption
> + * that all the system pages are on that list.
> + *
> + * Context: Any context.
> + *
> + * Return: The largest number of free page blocks that the free list can have.
> + */
> +unsigned long max_free_page_blocks(int order)
> +{
> +	return totalram_pages / (1 << order);
> +}
> +EXPORT_SYMBOL_GPL(max_free_page_blocks);
> +
> +/**
> + * get_from_free_page_list - get hints of free pages from a free page list
> + * @order: the order of the free page list to check
> + * @pages: the list of page blocks used as buffers to load the addresses
> + * @size: the size of each buffer in bytes
> + * @loaded_num: the number of addresses loaded to the buffers
> + *
> + * This function offers hints about free pages. The addresses of free page
> + * blocks are stored to the list of buffers passed from the caller. There is
> + * no guarantee that the obtained free pages are still on the free page list
> + * after the function returns. pfn_to_page on the obtained free pages is
> + * strongly discouraged and if there is an absolute need for that, make sure
> + * to contact MM people to discuss potential problems.
> + *
> + * The addresses are currently stored to a buffer in little endian. This
> + * avoids the overhead of converting endianness by the caller who needs data
> + * in the little endian format. Big endian support can be added on demand in
> + * the future.
> + *
> + * Context: Process context.
> + *
> + * Return: 0 if all the free page block addresses are stored to the buffers;
> + *         -ENOSPC if the buffers are not sufficient to store all the
> + *         addresses; or -EINVAL if an unexpected argument is received (e.g.
> + *         incorrect @order, empty buffer list).
> + */
> +int get_from_free_page_list(int order, struct list_head *pages,
> +			    unsigned int size, unsigned long *loaded_num)
> +{
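
For a rough sense of the numbers, here is a user-space sketch (not part of the patch) of how a caller might size its buffer list from the estimate above. It repeats the division from max_free_page_blocks() and assumes each stored address takes 64 bits. The names totalram_pages_model and buffers_needed, and the 4 KiB page size in the comment, are made up for illustration only.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's totalram_pages:
 * 2^20 pages, i.e. 4 GiB if pages are 4 KiB (an assumption). */
unsigned long totalram_pages_model = 1UL << 20;

/* Same division as the quoted max_free_page_blocks(): the upper bound
 * is reached only if every page in the system sits on this one list. */
unsigned long max_free_page_blocks_model(int order)
{
	return totalram_pages_model / (1UL << order);
}

/* Number of buffers of buf_size bytes needed to hold one 64-bit address
 * per block (assumed entry width). Returns 0 if a buffer cannot hold
 * even one entry. */
unsigned long buffers_needed(int order, unsigned long buf_size)
{
	unsigned long per_buf = buf_size / sizeof(uint64_t);
	unsigned long blocks = max_free_page_blocks_model(order);

	if (per_buf == 0)
		return 0;
	/* Round up so a partially filled last buffer is counted. */
	return (blocks + per_buf - 1) / per_buf;
}
```

With 2^20 pages and order 10 this gives at most 1024 blocks, which fit in two 4096-byte buffers, so the worst case is cheap to pre-allocate for high orders.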


Hi Linus,

We took your original suggestion: pass in pre-allocated buffers to load the addresses (we now use a list of pre-allocated page blocks as buffers). We hope that approach is still acceptable (the advantage of this method was explained here: https://lkml.org/lkml/2018/6/28/184).
We look forward to your feedback. Thanks.
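
To illustrate the calling convention only, here is a simplified user-space model of loading addresses into a chain of pre-allocated buffers. It is not the patch's implementation: the function body is not quoted in this message, and the patch uses a struct list_head of page blocks. The names hint_buf, put_le64 and load_hints are hypothetical, and 64-bit entries are an assumption. The return values follow the kernel-doc quoted above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one pre-allocated buffer on the caller's list. */
struct hint_buf {
	struct hint_buf *next;
	uint8_t *data;
};

/* Store a 64-bit value in little-endian byte order on any host. */
static void put_le64(uint8_t *dst, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		dst[i] = (uint8_t)(v >> (8 * i));
}

/*
 * Copy naddrs addresses into the buffers, each `size` bytes long.
 * Returns 0 if everything fit, -ENOSPC if the buffers ran out first,
 * or -EINVAL for an empty list or a buffer too small for one entry.
 * *loaded_num reports how many addresses were actually stored.
 */
int load_hints(const uint64_t *addrs, size_t naddrs,
	       struct hint_buf *bufs, unsigned int size,
	       unsigned long *loaded_num)
{
	size_t per_buf = size / sizeof(uint64_t);
	size_t slot = 0;

	*loaded_num = 0;
	if (!bufs || per_buf == 0)
		return -EINVAL;

	for (size_t i = 0; i < naddrs; i++) {
		if (slot == per_buf) {
			/* Current buffer is full: move to the next one. */
			bufs = bufs->next;
			slot = 0;
			if (!bufs)
				return -ENOSPC;
		}
		put_le64(bufs->data + slot * sizeof(uint64_t), addrs[i]);
		slot++;
		(*loaded_num)++;
	}
	return 0;
}
```

Because the caller owns the buffers, the loading path never has to allocate memory, and a short buffer list still yields a partial set of hints with -ENOSPC.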

Best,
Wei 


