All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wei Wang <wei.w.wang@intel.com>
To: virtio-dev@lists.oasis-open.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	linux-mm@kvack.org, mst@redhat.com, mhocko@kernel.org,
	akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org, pbonzini@redhat.com,
	wei.w.wang@intel.com, liliang.opensource@gmail.com,
	yang.zhang.wz@gmail.com, quan.xu0@gmail.com, nilal@redhat.com,
	riel@redhat.com, peterx@redhat.com
Subject: [PATCH v33 1/4] mm: add a function to get free page blocks
Date: Fri, 15 Jun 2018 12:43:10 +0800	[thread overview]
Message-ID: <1529037793-35521-2-git-send-email-wei.w.wang@intel.com> (raw)
In-Reply-To: <1529037793-35521-1-git-send-email-wei.w.wang@intel.com>

This patch adds a function to get free pages blocks from a free page
list. The obtained free page blocks are hints about free pages, because
there is no guarantee that they are still on the free page list after
the function returns.

One use example of this patch is to accelerate live migration by skipping
the transfer of free pages reported from the guest. A popular method used
by the hypervisor to track which part of memory is written during live
migration is to write-protect all the guest memory. So, those pages that
are hinted as free pages but are written after this function returns will
be captured by the hypervisor, and they will be added to the next round of
memory transfer.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Liang Li <liang.z.li@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/linux/mm.h |  1 +
 mm/page_alloc.c    | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0e49388..c58b4e5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2002,6 +2002,7 @@ extern void free_area_init(unsigned long * zones_size);
 extern void free_area_init_node(int nid, unsigned long * zones_size,
 		unsigned long zone_start_pfn, unsigned long *zholes_size);
 extern void free_initmem(void);
+uint32_t get_from_free_page_list(int order, __le64 buf[], uint32_t size);
 
 /*
  * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 07b3c23..7c816d9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5043,6 +5043,58 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 	show_swap_cache_info();
 }
 
+/**
+ * get_from_free_page_list - get free page blocks from a free page list
+ * @order: the order of the free page list to check
+ * @buf: the array to store the physical addresses of the free page blocks
+ * @size: the array size
+ *
+ * This function offers hints about free pages. There is no guarantee that
+ * the obtained free pages are still on the free page list after the function
+ * returns. pfn_to_page on the obtained free pages is strongly discouraged
+ * and if there is an absolute need for that, make sure to contact MM people
+ * to discuss potential problems.
+ *
+ * The addresses are currently stored to the array in little endian. This
+ * avoids the overhead of converting endianness by the caller who needs data
+ * in the little endian format. Big endian support can be added on demand in
+ * the future.
+ *
+ * Return the number of free page blocks obtained from the free page list.
+ * The maximum number of free page blocks that can be obtained is limited to
+ * the caller's array size.
+ */
+uint32_t get_from_free_page_list(int order, __le64 buf[], uint32_t size)
+{
+	struct zone *zone;
+	enum migratetype mt;
+	struct page *page;
+	struct list_head *list;
+	unsigned long addr, flags;
+	uint32_t index = 0;
+
+	for_each_populated_zone(zone) {
+		spin_lock_irqsave(&zone->lock, flags);
+		for (mt = 0; mt < MIGRATE_TYPES; mt++) {
+			list = &zone->free_area[order].free_list[mt];
+			list_for_each_entry(page, list, lru) {
+				addr = page_to_pfn(page) << PAGE_SHIFT;
+				if (likely(index < size)) {
+					buf[index++] = cpu_to_le64(addr);
+				} else {
+					spin_unlock_irqrestore(&zone->lock,
+							       flags);
+					return index;
+				}
+			}
+		}
+		spin_unlock_irqrestore(&zone->lock, flags);
+	}
+
+	return index;
+}
+EXPORT_SYMBOL_GPL(get_from_free_page_list);
+
 static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
 {
 	zoneref->zone = zone;
-- 
2.7.4


WARNING: multiple messages have this Message-ID (diff)
From: Wei Wang <wei.w.wang@intel.com>
To: virtio-dev@lists.oasis-open.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	linux-mm@kvack.org, mst@redhat.com, mhocko@kernel.org,
	akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org, pbonzini@redhat.com,
	wei.w.wang@intel.com, liliang.opensource@gmail.com,
	yang.zhang.wz@gmail.com, quan.xu0@gmail.com, nilal@redhat.com,
	riel@redhat.com, peterx@redhat.com
Subject: [virtio-dev] [PATCH v33 1/4] mm: add a function to get free page blocks
Date: Fri, 15 Jun 2018 12:43:10 +0800	[thread overview]
Message-ID: <1529037793-35521-2-git-send-email-wei.w.wang@intel.com> (raw)
In-Reply-To: <1529037793-35521-1-git-send-email-wei.w.wang@intel.com>

This patch adds a function to get free pages blocks from a free page
list. The obtained free page blocks are hints about free pages, because
there is no guarantee that they are still on the free page list after
the function returns.

One use example of this patch is to accelerate live migration by skipping
the transfer of free pages reported from the guest. A popular method used
by the hypervisor to track which part of memory is written during live
migration is to write-protect all the guest memory. So, those pages that
are hinted as free pages but are written after this function returns will
be captured by the hypervisor, and they will be added to the next round of
memory transfer.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Liang Li <liang.z.li@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/linux/mm.h |  1 +
 mm/page_alloc.c    | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0e49388..c58b4e5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2002,6 +2002,7 @@ extern void free_area_init(unsigned long * zones_size);
 extern void free_area_init_node(int nid, unsigned long * zones_size,
 		unsigned long zone_start_pfn, unsigned long *zholes_size);
 extern void free_initmem(void);
+uint32_t get_from_free_page_list(int order, __le64 buf[], uint32_t size);
 
 /*
  * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 07b3c23..7c816d9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5043,6 +5043,58 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 	show_swap_cache_info();
 }
 
+/**
+ * get_from_free_page_list - get free page blocks from a free page list
+ * @order: the order of the free page list to check
+ * @buf: the array to store the physical addresses of the free page blocks
+ * @size: the array size
+ *
+ * This function offers hints about free pages. There is no guarantee that
+ * the obtained free pages are still on the free page list after the function
+ * returns. pfn_to_page on the obtained free pages is strongly discouraged
+ * and if there is an absolute need for that, make sure to contact MM people
+ * to discuss potential problems.
+ *
+ * The addresses are currently stored to the array in little endian. This
+ * avoids the overhead of converting endianness by the caller who needs data
+ * in the little endian format. Big endian support can be added on demand in
+ * the future.
+ *
+ * Return the number of free page blocks obtained from the free page list.
+ * The maximum number of free page blocks that can be obtained is limited to
+ * the caller's array size.
+ */
+uint32_t get_from_free_page_list(int order, __le64 buf[], uint32_t size)
+{
+	struct zone *zone;
+	enum migratetype mt;
+	struct page *page;
+	struct list_head *list;
+	unsigned long addr, flags;
+	uint32_t index = 0;
+
+	for_each_populated_zone(zone) {
+		spin_lock_irqsave(&zone->lock, flags);
+		for (mt = 0; mt < MIGRATE_TYPES; mt++) {
+			list = &zone->free_area[order].free_list[mt];
+			list_for_each_entry(page, list, lru) {
+				addr = page_to_pfn(page) << PAGE_SHIFT;
+				if (likely(index < size)) {
+					buf[index++] = cpu_to_le64(addr);
+				} else {
+					spin_unlock_irqrestore(&zone->lock,
+							       flags);
+					return index;
+				}
+			}
+		}
+		spin_unlock_irqrestore(&zone->lock, flags);
+	}
+
+	return index;
+}
+EXPORT_SYMBOL_GPL(get_from_free_page_list);
+
 static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
 {
 	zoneref->zone = zone;
-- 
2.7.4


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


  reply	other threads:[~2018-06-15  5:10 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-15  4:43 [PATCH v33 0/4] Virtio-balloon: support free page reporting Wei Wang
2018-06-15  4:43 ` [virtio-dev] " Wei Wang
2018-06-15  4:43 ` Wei Wang [this message]
2018-06-15  4:43   ` [virtio-dev] [PATCH v33 1/4] mm: add a function to get free page blocks Wei Wang
2018-06-15 23:08   ` Linus Torvalds
2018-06-15 23:08     ` Linus Torvalds
2018-06-16  1:19     ` Wang, Wei W
2018-06-16  1:19       ` [virtio-dev] " Wang, Wei W
2018-06-16  1:19       ` Wang, Wei W
2018-06-26  1:55     ` Michael S. Tsirkin
2018-06-26  1:55       ` [virtio-dev] " Michael S. Tsirkin
2018-06-26  1:55       ` Michael S. Tsirkin
2018-06-27 16:05       ` Linus Torvalds
2018-06-27 16:05         ` Linus Torvalds
2018-06-27 19:07         ` Michael S. Tsirkin
2018-06-27 19:07           ` [virtio-dev] " Michael S. Tsirkin
2018-06-27 19:07           ` Michael S. Tsirkin
2018-06-28  8:39           ` Wei Wang
2018-06-28  8:39             ` [virtio-dev] " Wei Wang
2018-06-28  8:39           ` Wei Wang
2018-06-16  4:50   ` Matthew Wilcox
2018-06-16  4:50     ` Matthew Wilcox
2018-06-17  0:07     ` Wang, Wei W
2018-06-17  0:07       ` [virtio-dev] " Wang, Wei W
2018-06-17  0:07       ` Wang, Wei W
2018-06-17  0:07       ` Wang, Wei W
2018-06-18  2:16     ` Michael S. Tsirkin
2018-06-18  2:16       ` [virtio-dev] " Michael S. Tsirkin
2018-06-18  2:16       ` Michael S. Tsirkin
2018-06-15  4:43 ` Wei Wang
2018-06-15  4:43 ` [PATCH v33 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT Wei Wang
2018-06-15  4:43   ` [virtio-dev] " Wei Wang
2018-06-15 11:42   ` Michael S. Tsirkin
2018-06-15 11:42     ` [virtio-dev] " Michael S. Tsirkin
2018-06-15 11:42     ` Michael S. Tsirkin
2018-06-15 14:11     ` Wang, Wei W
2018-06-15 14:11       ` [virtio-dev] " Wang, Wei W
2018-06-15 14:11       ` Wang, Wei W
2018-06-15 14:11       ` Wang, Wei W
2018-06-15 14:29       ` Michael S. Tsirkin
2018-06-15 14:29         ` [virtio-dev] " Michael S. Tsirkin
2018-06-15 14:29         ` Michael S. Tsirkin
2018-06-15 14:29         ` Michael S. Tsirkin
2018-06-16  1:09         ` Wang, Wei W
2018-06-16  1:09           ` [virtio-dev] " Wang, Wei W
2018-06-16  1:09           ` Wang, Wei W
2018-06-16  1:09           ` Wang, Wei W
2018-06-18  2:28           ` Michael S. Tsirkin
2018-06-18  2:28             ` [virtio-dev] " Michael S. Tsirkin
2018-06-18  2:28             ` Michael S. Tsirkin
2018-06-18  2:28             ` Michael S. Tsirkin
2018-06-19  1:06             ` [virtio-dev] " Wang, Wei W
2018-06-19  1:06               ` Wang, Wei W
2018-06-19  1:06               ` Wang, Wei W
2018-06-19  1:06               ` Wang, Wei W
2018-06-19  3:05               ` [virtio-dev] " Michael S. Tsirkin
2018-06-19  3:05               ` Michael S. Tsirkin
2018-06-19  3:05                 ` Michael S. Tsirkin
2018-06-19  3:05                 ` Michael S. Tsirkin
2018-06-19  3:05                 ` Michael S. Tsirkin
2018-06-19 12:13                 ` Wei Wang
2018-06-19 12:13                   ` Wei Wang
2018-06-19 12:13                   ` Wei Wang
2018-06-19 12:13                   ` Wei Wang
2018-06-19 14:43                   ` Michael S. Tsirkin
2018-06-19 14:43                   ` Michael S. Tsirkin
2018-06-19 14:43                     ` Michael S. Tsirkin
2018-06-19 14:43                     ` Michael S. Tsirkin
2018-06-19 14:43                     ` Michael S. Tsirkin
2018-06-20  9:11                     ` Wang, Wei W
2018-06-20  9:11                       ` Wang, Wei W
2018-06-20  9:11                       ` Wang, Wei W
2018-06-20  9:11                       ` Wang, Wei W
2018-06-20 14:14                       ` Michael S. Tsirkin
2018-06-20 14:14                         ` Michael S. Tsirkin
2018-06-20 14:14                         ` Michael S. Tsirkin
2018-06-20 14:14                         ` Michael S. Tsirkin
2018-06-19  1:06             ` Wang, Wei W
2018-06-15 14:11     ` Wang, Wei W
2018-06-15  4:43 ` Wei Wang
2018-06-15  4:43 ` [PATCH v33 3/4] mm/page_poison: expose page_poisoning_enabled to kernel modules Wei Wang
2018-06-15  4:43   ` [virtio-dev] " Wei Wang
2018-06-15  4:43 ` Wei Wang
2018-06-15  4:43 ` [PATCH v33 4/4] virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON Wei Wang
2018-06-15  4:43   ` [virtio-dev] " Wei Wang
2018-06-15  4:43 ` Wei Wang
2018-06-15 11:29 ` [PATCH v33 0/4] Virtio-balloon: support free page reporting Michael S. Tsirkin
2018-06-15 11:29   ` [virtio-dev] " Michael S. Tsirkin
2018-06-15 14:28   ` Wang, Wei W
2018-06-15 14:28     ` [virtio-dev] " Wang, Wei W
2018-06-15 14:28     ` Wang, Wei W
2018-06-15 14:28     ` Wang, Wei W
2018-06-15 14:38     ` Michael S. Tsirkin
2018-06-15 14:38       ` [virtio-dev] " Michael S. Tsirkin
2018-06-15 14:38       ` Michael S. Tsirkin
2018-06-15 14:38       ` Michael S. Tsirkin
2018-06-15 11:29 ` Michael S. Tsirkin
2018-06-15 19:21 ` Luiz Capitulino
2018-06-15 19:21 ` Luiz Capitulino

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1529037793-35521-2-git-send-email-wei.w.wang@intel.com \
    --to=wei.w.wang@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=kvm@vger.kernel.org \
    --cc=liliang.opensource@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mst@redhat.com \
    --cc=nilal@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=quan.xu0@gmail.com \
    --cc=riel@redhat.com \
    --cc=torvalds@linux-foundation.org \
    --cc=virtio-dev@lists.oasis-open.org \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=yang.zhang.wz@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.