linux-kernel.vger.kernel.org archive mirror
* [PATCH v9 0/5] make balloon pages movable by compaction
@ 2012-08-25  5:24 Rafael Aquini
  2012-08-25  5:24 ` [PATCH v9 1/5] mm: introduce a common interface for balloon pages mobility Rafael Aquini
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Rafael Aquini @ 2012-08-25  5:24 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, virtualization, Rusty Russell, Michael S. Tsirkin,
	Rik van Riel, Mel Gorman, Andi Kleen, Andrew Morton,
	Konrad Rzeszutek Wilk, Minchan Kim, Peter Zijlstra,
	Paul E. McKenney, Rafael Aquini

Memory fragmentation introduced by ballooning might significantly reduce
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch set follows the main idea discussed at the 2012 LSF/MM Summit
session "Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/
and introduces the required changes to the virtio_balloon driver, as well as
to the core compaction & migration bits, in order to make those subsystems
aware of ballooned pages and allow balloon pages to become movable within a
guest, thus avoiding the aforementioned fragmentation issue.
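
For context, the 2MB block availability this series is concerned with can be
observed from inside a guest. The following one-liner is not part of the
series and is shown only for illustration; it counts the buddy allocator's
free blocks of order >= 9 (i.e. 2MB and larger, assuming 4K base pages as on
x86_64) per zone:

```shell
# Print, per node/zone, how many free blocks are 2MB or larger (the two
# highest orders in /proc/buddyinfo); a heavily ballooned, fragmented
# guest shows these counts dropping.
awk '{ print $2, $4, $(NF-1) + $NF }' /proc/buddyinfo
```

Comparing the counts before inflating the balloon and after a compaction run
gives a rough picture of how much contiguity ballooning has cost.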

Rafael Aquini (5):
  mm: introduce a common interface for balloon pages mobility
  mm: introduce compaction and migration for ballooned pages
  virtio_balloon: introduce migration primitives to balloon pages
  mm: introduce putback_movable_pages()
  mm: add vm event counters for balloon pages compaction

 drivers/virtio/virtio_balloon.c    | 287 ++++++++++++++++++++++++++++++++++---
 include/linux/balloon_compaction.h | 137 ++++++++++++++++++
 include/linux/migrate.h            |   2 +
 include/linux/pagemap.h            |  18 +++
 include/linux/vm_event_item.h      |   8 +-
 mm/Kconfig                         |  15 ++
 mm/Makefile                        |   2 +-
 mm/balloon_compaction.c            | 174 ++++++++++++++++++++++
 mm/compaction.c                    |  51 ++++---
 mm/migrate.c                       |  57 +++++++-
 mm/page_alloc.c                    |   2 +-
 mm/vmstat.c                        |  10 +-
 12 files changed, 715 insertions(+), 48 deletions(-)
 create mode 100644 include/linux/balloon_compaction.h
 create mode 100644 mm/balloon_compaction.c


Change log:
v9:
 * Adjust rcu_dereference usage to leverage page lock protection  (Paul, Peter);
 * Enhance doc on compaction interface introduced to balloon driver   (Michael);
 * Fix issue with isolated pages breaking leak_balloon() logic        (Michael);
v8:
 * introduce a common MM interface for balloon driver page compaction (Michael);
 * remove the global state preventing multiple balloon device support (Michael);
 * introduce RCU protection/synchronization to balloon page->mapping  (Michael);
v7:
 * fix a potential page leak case at 'putback_balloon_page'               (Mel);
 * adjust vm-events-counter patch and remove its drop-on-merge message    (Rik);
 * add 'putback_movable_pages' to avoid hacks on 'putback_lru_pages'  (Minchan);
v6:
 * rename 'is_balloon_page()' to 'movable_balloon_page()' 		  (Rik);
v5:
 * address Andrew Morton's review comments on the patch series;
 * address a couple extra nitpick suggestions on PATCH 01 	      (Minchan);
v4: 
 * address Rusty Russell's review comments on PATCH 02;
 * re-base virtio_balloon patch on 9c378abc5c0c6fc8e3acf5968924d274503819b3;
v3: 
 * address reviewers' nitpick suggestions on PATCH 01		 (Mel, Minchan);
v2: 
 * address Mel Gorman's review comments on PATCH 01;


Preliminary test results:
(2-VCPU, 2048 MB RAM KVM guest running 3.6.0_rc3+ -- after a reboot)

* 64 MB balloon:
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 0
compact_pages_moved 0
compact_pagemigrate_failed 0
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_isolated 0
compact_balloon_migrated 0
compact_balloon_released 0
compact_balloon_returned 0
[root@localhost ~]# 
[root@localhost ~]# for i in $(seq 1 6); do echo 1 > /proc/sys/vm/compact_memory & done &>/dev/null 
[1]   Done                    echo 1 > /proc/sys/vm/compact_memory
[2]   Done                    echo 1 > /proc/sys/vm/compact_memory
[3]   Done                    echo 1 > /proc/sys/vm/compact_memory
[4]   Done                    echo 1 > /proc/sys/vm/compact_memory
[5]-  Done                    echo 1 > /proc/sys/vm/compact_memory
[6]+  Done                    echo 1 > /proc/sys/vm/compact_memory
[root@localhost ~]# 
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 3108
compact_pages_moved 43169
compact_pagemigrate_failed 95
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_isolated 16384
compact_balloon_migrated 16384
compact_balloon_released 16384
compact_balloon_returned 0


* 128 MB balloon:
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 0
compact_pages_moved 0
compact_pagemigrate_failed 0
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_isolated 0
compact_balloon_migrated 0
compact_balloon_released 0
compact_balloon_returned 0
[root@localhost ~]# 
[root@localhost ~]# for i in $(seq 1 6); do echo 1 > /proc/sys/vm/compact_memory & done &>/dev/null  
[1]   Done                    echo 1 > /proc/sys/vm/compact_memory
[2]   Done                    echo 1 > /proc/sys/vm/compact_memory
[3]   Done                    echo 1 > /proc/sys/vm/compact_memory
[4]   Done                    echo 1 > /proc/sys/vm/compact_memory
[5]-  Done                    echo 1 > /proc/sys/vm/compact_memory
[6]+  Done                    echo 1 > /proc/sys/vm/compact_memory
[root@localhost ~]# 
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 3062
compact_pages_moved 49774
compact_pagemigrate_failed 129
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_isolated 26076
compact_balloon_migrated 25957
compact_balloon_released 25957
compact_balloon_returned 119
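
The new balloon counters can be combined to gauge how often isolated balloon
pages actually completed migration. A hypothetical helper (not part of this
posting, shown only as an illustration of reading the counters introduced by
PATCH 5):

```shell
# Ratio of successfully migrated balloon pages to isolated ones,
# computed from the vm event counters exported in /proc/vmstat.
awk '/^compact_balloon/ { c[$1] = $2 }
     END { printf "balloon migration success: %.1f%%\n",
                  100 * c["compact_balloon_migrated"] / c["compact_balloon_isolated"] }' /proc/vmstat
```

For the 128 MB run above (25957 migrated out of 26076 isolated) this works
out to roughly 99.5%.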

-- 
1.7.11.4

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v9 1/5] mm: introduce a common interface for balloon pages mobility
  2012-08-25  5:24 [PATCH v9 0/5] make balloon pages movable by compaction Rafael Aquini
@ 2012-08-25  5:24 ` Rafael Aquini
  2012-08-26  7:55   ` Michael S. Tsirkin
  2012-08-25  5:24 ` [PATCH v9 2/5] mm: introduce compaction and migration for ballooned pages Rafael Aquini
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Rafael Aquini @ 2012-08-25  5:24 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, virtualization, Rusty Russell, Michael S. Tsirkin,
	Rik van Riel, Mel Gorman, Andi Kleen, Andrew Morton,
	Konrad Rzeszutek Wilk, Minchan Kim, Peter Zijlstra,
	Paul E. McKenney, Rafael Aquini

Memory fragmentation introduced by ballooning might significantly reduce
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch introduces a common interface to help a balloon driver make
its page set movable by compaction, thus allowing the system to better
leverage compaction's memory defragmentation efforts.

Signed-off-by: Rafael Aquini <aquini@redhat.com>
---
 include/linux/balloon_compaction.h | 137 +++++++++++++++++++++++++++++
 include/linux/pagemap.h            |  18 ++++
 mm/Kconfig                         |  15 ++++
 mm/Makefile                        |   2 +-
 mm/balloon_compaction.c            | 172 +++++++++++++++++++++++++++++++++++++
 5 files changed, 343 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/balloon_compaction.h
 create mode 100644 mm/balloon_compaction.c

diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
new file mode 100644
index 0000000..7afb0ae
--- /dev/null
+++ b/include/linux/balloon_compaction.h
@@ -0,0 +1,137 @@
+/*
+ * include/linux/balloon_compaction.h
+ *
+ * Common interface definitions for making balloon pages movable to compaction.
+ *
+ * Copyright (C) 2012, Red Hat, Inc.  Rafael Aquini <aquini@redhat.com>
+ */
+#ifndef _LINUX_BALLOON_COMPACTION_H
+#define _LINUX_BALLOON_COMPACTION_H
+#ifdef __KERNEL__
+
+#include <linux/rcupdate.h>
+#include <linux/pagemap.h>
+#include <linux/gfp.h>
+
+#ifdef CONFIG_BALLOON_COMPACTION
+#define count_balloon_event(e)	count_vm_event(e)
+extern bool isolate_balloon_page(struct page *);
+extern void putback_balloon_page(struct page *);
+extern int migrate_balloon_page(struct page *newpage,
+				struct page *page, enum migrate_mode mode);
+
+static inline gfp_t balloon_mapping_gfp_mask(void)
+{
+	return GFP_HIGHUSER_MOVABLE;
+}
+
+/*
+ * movable_balloon_page - test page->mapping->flags to identify balloon pages
+ *			  that can be moved by compaction/migration.
+ *
+ * This function is used by compaction's core page isolation scheme, and so
+ * it is exposed to pages which may, or may not, be part of a memory
+ * balloon; thus we cannot afford to hold the page locked to perform tests.
+ *
+ * Therefore, as we might return false positives when a balloon page
+ * is just released under us, page->mapping->flags needs to be retested
+ * with the proper page lock held by the functions that will cope with the
+ * balloon page later.
+ */
+static inline bool movable_balloon_page(struct page *page)
+{
+	/*
+	 * Before dereferencing and testing mapping->flags, let's make sure
+	 * this is not a page that uses ->mapping in a different way.
+	 */
+	if (!PageSlab(page) && !PageSwapCache(page) &&
+	    !PageAnon(page) && !page_mapped(page)) {
+		/*
+		 * While doing compaction core work, we cannot afford to hold
+		 * page lock as it might cause very undesirable side effects.
+		 */
+		struct address_space *mapping;
+		mapping = rcu_dereference_raw(page->mapping);
+		if (mapping)
+			return mapping_balloon(mapping);
+	}
+	return false;
+}
+
+/*
+ * __page_balloon_device - return the balloon device owning the page.
+ *
+ * This shall only be used at driver callbacks under proper page lock,
+ * to get access to the balloon device structure that owns @page.
+ */
+static inline void *__page_balloon_device(struct page *page)
+{
+	struct address_space *mapping;
+	mapping = rcu_dereference_protected(page->mapping, PageLocked(page));
+	if (mapping)
+		mapping = mapping->assoc_mapping;
+	return (void *)mapping;
+}
+
+/*
+ * DEFINE_BALLOON_MAPPING_AOPS - declare and instantiate a callback descriptor
+ *				 to be used as balloon page->mapping->a_ops.
+ *
+ * @label     : declaration identifier (var name)
+ * @isolatepg : callback symbol name for performing the page isolation step
+ * @migratepg : callback symbol name for performing the page migration step
+ * @putbackpg : callback symbol name for performing the page putback step
+ *
+ * address_space_operations utilized methods for ballooned pages:
+ *   .migratepage    - used to perform balloon's page migration (as is)
+ *   .invalidatepage - used to isolate a page from balloon's page list
+ *   .freepage       - used to reinsert an isolated page to balloon's page list
+ */
+#define DEFINE_BALLOON_MAPPING_AOPS(label, isolatepg, migratepg, putbackpg) \
+	const struct address_space_operations (label) = {		    \
+		.migratepage    = (migratepg),				    \
+		.invalidatepage = (isolatepg),				    \
+		.freepage       = (putbackpg),				    \
+	}
+
+#else
+#define count_balloon_event(e)	do { } while (0)
+static inline bool movable_balloon_page(struct page *page) { return false; }
+static inline bool isolate_balloon_page(struct page *page) { return false; }
+static inline void putback_balloon_page(struct page *page) { return; }
+
+static inline int migrate_balloon_page(struct page *newpage,
+				struct page *page, enum migrate_mode mode)
+{
+	return 0;
+}
+
+static inline gfp_t balloon_mapping_gfp_mask(void)
+{
+	return GFP_HIGHUSER;
+}
+
+#define DEFINE_BALLOON_MAPPING_AOPS(label, isolatepg, migratepg, putbackpg) \
+	const struct address_space_operations *(label) = NULL
+
+#endif /* CONFIG_BALLOON_COMPACTION */
+
+/* return code to identify when a ballooned page has been migrated */
+#define BALLOON_MIGRATION_RETURN	0xba1100
+
+extern struct address_space *alloc_balloon_mapping(void *balloon_device,
+				const struct address_space_operations *a_ops);
+
+static inline void assign_balloon_mapping(struct page *page,
+					  struct address_space *mapping)
+{
+	rcu_assign_pointer(page->mapping, mapping);
+}
+
+static inline void clear_balloon_mapping(struct page *page)
+{
+	rcu_assign_pointer(page->mapping, NULL);
+}
+
+#endif /* __KERNEL__ */
+#endif /* _LINUX_BALLOON_COMPACTION_H */
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e42c762..6df0664 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -24,6 +24,7 @@ enum mapping_flags {
 	AS_ENOSPC	= __GFP_BITS_SHIFT + 1,	/* ENOSPC on async write */
 	AS_MM_ALL_LOCKS	= __GFP_BITS_SHIFT + 2,	/* under mm_take_all_locks() */
 	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
+	AS_BALLOON_MAP  = __GFP_BITS_SHIFT + 4, /* balloon page special map */
 };
 
 static inline void mapping_set_error(struct address_space *mapping, int error)
@@ -53,6 +54,23 @@ static inline int mapping_unevictable(struct address_space *mapping)
 	return !!mapping;
 }
 
+static inline void mapping_set_balloon(struct address_space *mapping)
+{
+	set_bit(AS_BALLOON_MAP, &mapping->flags);
+}
+
+static inline void mapping_clear_balloon(struct address_space *mapping)
+{
+	clear_bit(AS_BALLOON_MAP, &mapping->flags);
+}
+
+static inline int mapping_balloon(struct address_space *mapping)
+{
+	if (mapping)
+		return test_bit(AS_BALLOON_MAP, &mapping->flags);
+	return !!mapping;
+}
+
 static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
 {
 	return (__force gfp_t)mapping->flags & __GFP_BITS_MASK;
diff --git a/mm/Kconfig b/mm/Kconfig
index d5c8019..0bd783b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -188,6 +188,21 @@ config SPLIT_PTLOCK_CPUS
 	default "4"
 
 #
+# support for memory balloon compaction
+config BALLOON_COMPACTION
+	bool "Allow for balloon memory compaction/migration"
+	select COMPACTION
+	depends on VIRTIO_BALLOON
+	help
+	  Memory fragmentation introduced by ballooning might significantly
+	  reduce the number of 2MB contiguous memory blocks that can be
+	  used within a guest, thus imposing performance penalties associated
+	  with the reduced number of transparent huge pages that could be used
+	  by the guest workload. Allowing compaction & migration of memory
+	  pages enlisted as being part of memory balloon devices avoids the
+	  aforementioned scenario and helps improve memory defragmentation.
+
+#
 # support for memory compaction
 config COMPACTION
 	bool "Allow for memory compaction"
diff --git a/mm/Makefile b/mm/Makefile
index 92753e2..78d8caa 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -16,7 +16,7 @@ obj-y			:= filemap.o mempool.o oom_kill.o fadvise.o \
 			   readahead.o swap.o truncate.o vmscan.o shmem.o \
 			   prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
 			   mm_init.o mmu_context.o percpu.o slab_common.o \
-			   compaction.o $(mmu-y)
+			   compaction.o balloon_compaction.o $(mmu-y)
 
 obj-y += init-mm.o
 
diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
new file mode 100644
index 0000000..86a3692
--- /dev/null
+++ b/mm/balloon_compaction.c
@@ -0,0 +1,172 @@
+/*
+ * mm/balloon_compaction.c
+ *
+ * Common interface for making balloon pages movable to compaction.
+ *
+ * Copyright (C) 2012, Red Hat, Inc.  Rafael Aquini <aquini@redhat.com>
+ */
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/export.h>
+#include <linux/balloon_compaction.h>
+
+/*
+ * alloc_balloon_mapping - allocates a special ->mapping for ballooned pages.
+ * @balloon_device: pointer address that references the balloon device which
+ *                 owns pages bearing this ->mapping.
+ * @a_ops: balloon_mapping address_space_operations descriptor.
+ *
+ * Users must call it to properly allocate and initialize an instance of
+ * struct address_space which will be used as the special page->mapping for
+ * pages enlisted by balloon devices.
+ */
+struct address_space *alloc_balloon_mapping(void *balloon_device,
+				const struct address_space_operations *a_ops)
+{
+	struct address_space *mapping;
+
+	mapping = kmalloc(sizeof(*mapping), GFP_KERNEL);
+	if (!mapping)
+		return NULL;
+
+	/*
+	 * Give a clean 'zeroed' status to all elements of this special
+	 * balloon page->mapping struct address_space instance.
+	 */
+	address_space_init_once(mapping);
+
+	/*
+	 * Set mapping->flags appropriately, to allow balloon ->mapping
+	 * identification, as well as give a proper hint to the balloon
+	 * driver on what GFP allocation mask shall be used.
+	 */
+	mapping_set_balloon(mapping);
+	mapping_set_gfp_mask(mapping, balloon_mapping_gfp_mask());
+
+	/* balloon's page->mapping->a_ops callback descriptor */
+	mapping->a_ops = a_ops;
+
+	/*
+	 * balloon special page->mapping overloads ->assoc_mapping
+	 * to hold a reference back to the balloon device which 'owns'
+	 * a given page. This is the way we can cope with multiple
+	 * balloon devices without losing references to their
+	 * ballooned page sets.
+	 */
+	mapping->assoc_mapping = balloon_device;
+
+	return mapping;
+}
+EXPORT_SYMBOL_GPL(alloc_balloon_mapping);
+
+#ifdef CONFIG_BALLOON_COMPACTION
+
+static inline bool __is_movable_balloon_page(struct page *page)
+{
+	struct address_space *mapping;
+
+	mapping = rcu_dereference_protected(page->mapping, PageLocked(page));
+	if (mapping)
+		return mapping_balloon(mapping);
+
+	return false;
+}
+
+static inline void __isolate_balloon_page(struct page *page)
+{
+	page->mapping->a_ops->invalidatepage(page, 0);
+}
+
+static inline void __putback_balloon_page(struct page *page)
+{
+	page->mapping->a_ops->freepage(page);
+}
+
+static inline int __migrate_balloon_page(struct address_space *mapping,
+		struct page *newpage, struct page *page, enum migrate_mode mode)
+{
+	return page->mapping->a_ops->migratepage(mapping, newpage, page, mode);
+}
+
+/* __isolate_lru_page() counterpart for a ballooned page */
+bool isolate_balloon_page(struct page *page)
+{
+	if (likely(get_page_unless_zero(page))) {
+		/*
+		 * As balloon pages are not isolated from LRU lists, concurrent
+		 * compaction threads can race against page migration functions
+		 * move_to_new_page() & __unmap_and_move().
+		 * In order to avoid having an already isolated balloon page
+		 * being (wrongly) re-isolated while it is under migration,
+		 * lets be sure we have the page lock before proceeding with
+		 * the balloon page isolation steps.
+		 */
+		if (likely(trylock_page(page))) {
+			/*
+			 * A ballooned page, by default, has just one refcount.
+			 * Prevent concurrent compaction threads from isolating
+			 * an already isolated balloon page by refcount check.
+			 */
+			if (__is_movable_balloon_page(page) &&
+			    (page_count(page) == 2)) {
+				__isolate_balloon_page(page);
+				unlock_page(page);
+				return true;
+			} else if (unlikely(!__is_movable_balloon_page(page))) {
+				dump_page(page);
+				__WARN();
+			}
+			unlock_page(page);
+		}
+		/*
+		 * The page is either under migration or isolated already.
+		 * Drop the refcount taken for it.
+		 */
+		put_page(page);
+	}
+	return false;
+}
+
+/* putback_lru_page() counterpart for a ballooned page */
+void putback_balloon_page(struct page *page)
+{
+	/*
+	 * 'lock_page()' stabilizes the page and prevents races against
+	 * concurrent isolation threads attempting to re-isolate it.
+	 */
+	lock_page(page);
+
+	if (__is_movable_balloon_page(page)) {
+		__putback_balloon_page(page);
+		put_page(page);
+	} else {
+		dump_page(page);
+		__WARN();
+	}
+	unlock_page(page);
+}
+
+/* move_to_new_page() counterpart for a ballooned page */
+int migrate_balloon_page(struct page *newpage,
+			 struct page *page, enum migrate_mode mode)
+{
+	struct address_space *mapping;
+	int rc = -EAGAIN;
+
+	BUG_ON(!trylock_page(newpage));
+
+	if (WARN_ON(!__is_movable_balloon_page(page))) {
+		dump_page(page);
+		unlock_page(newpage);
+		return rc;
+	}
+
+	mapping = rcu_dereference_protected(page->mapping, PageLocked(page));
+	if (mapping)
+		rc = __migrate_balloon_page(mapping, newpage, page, mode);
+
+	unlock_page(newpage);
+	return rc;
+}
+
+#endif /* CONFIG_BALLOON_COMPACTION */
-- 
1.7.11.4



* [PATCH v9 2/5] mm: introduce compaction and migration for ballooned pages
  2012-08-25  5:24 [PATCH v9 0/5] make balloon pages movable by compaction Rafael Aquini
  2012-08-25  5:24 ` [PATCH v9 1/5] mm: introduce a common interface for balloon pages mobility Rafael Aquini
@ 2012-08-25  5:24 ` Rafael Aquini
  2012-08-25  5:24 ` [PATCH v9 3/5] virtio_balloon: introduce migration primitives to balloon pages Rafael Aquini
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Rafael Aquini @ 2012-08-25  5:24 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, virtualization, Rusty Russell, Michael S. Tsirkin,
	Rik van Riel, Mel Gorman, Andi Kleen, Andrew Morton,
	Konrad Rzeszutek Wilk, Minchan Kim, Peter Zijlstra,
	Paul E. McKenney, Rafael Aquini

Memory fragmentation introduced by ballooning might significantly reduce
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch introduces the helper functions, as well as the changes
necessary to teach the compaction and migration bits how to cope with
pages which are part of a guest memory balloon, in order to make them
movable by memory compaction procedures.

Signed-off-by: Rafael Aquini <aquini@redhat.com>
---
 mm/compaction.c | 47 ++++++++++++++++++++++++++++-------------------
 mm/migrate.c    | 36 ++++++++++++++++++++++++++++++++++--
 2 files changed, 62 insertions(+), 21 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 7fcd3a5..e50836b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -14,6 +14,7 @@
 #include <linux/backing-dev.h>
 #include <linux/sysctl.h>
 #include <linux/sysfs.h>
+#include <linux/balloon_compaction.h>
 #include "internal.h"
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
@@ -358,32 +359,40 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 			continue;
 		}
 
-		if (!PageLRU(page))
-			continue;
-
 		/*
-		 * PageLRU is set, and lru_lock excludes isolation,
-		 * splitting and collapsing (collapsing has already
-		 * happened if PageLRU is set).
+		 * It is possible to migrate LRU pages and balloon pages.
+		 * Skip any other type of page.
 		 */
-		if (PageTransHuge(page)) {
-			low_pfn += (1 << compound_order(page)) - 1;
-			continue;
-		}
+		if (PageLRU(page)) {
+			/*
+			 * PageLRU is set, and lru_lock excludes isolation,
+			 * splitting and collapsing (collapsing has already
+			 * happened if PageLRU is set).
+			 */
+			if (PageTransHuge(page)) {
+				low_pfn += (1 << compound_order(page)) - 1;
+				continue;
+			}
 
-		if (!cc->sync)
-			mode |= ISOLATE_ASYNC_MIGRATE;
+			if (!cc->sync)
+				mode |= ISOLATE_ASYNC_MIGRATE;
 
-		lruvec = mem_cgroup_page_lruvec(page, zone);
+			lruvec = mem_cgroup_page_lruvec(page, zone);
 
-		/* Try isolate the page */
-		if (__isolate_lru_page(page, mode) != 0)
-			continue;
+			/* Try isolate the page */
+			if (__isolate_lru_page(page, mode) != 0)
+				continue;
 
-		VM_BUG_ON(PageTransCompound(page));
+			VM_BUG_ON(PageTransCompound(page));
+
+			/* Successfully isolated */
+			del_page_from_lru_list(page, lruvec, page_lru(page));
+		} else if (unlikely(movable_balloon_page(page))) {
+			if (!isolate_balloon_page(page))
+				continue;
+		} else
+			continue;
 
-		/* Successfully isolated */
-		del_page_from_lru_list(page, lruvec, page_lru(page));
 		list_add(&page->lru, migratelist);
 		cc->nr_migratepages++;
 		nr_isolated++;
diff --git a/mm/migrate.c b/mm/migrate.c
index 77ed2d7..ec439f8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -35,6 +35,7 @@
 #include <linux/hugetlb.h>
 #include <linux/hugetlb_cgroup.h>
 #include <linux/gfp.h>
+#include <linux/balloon_compaction.h>
 
 #include <asm/tlbflush.h>
 
@@ -79,7 +80,10 @@ void putback_lru_pages(struct list_head *l)
 		list_del(&page->lru);
 		dec_zone_page_state(page, NR_ISOLATED_ANON +
 				page_is_file_cache(page));
-		putback_lru_page(page);
+		if (unlikely(movable_balloon_page(page)))
+			putback_balloon_page(page);
+		else
+			putback_lru_page(page);
 	}
 }
 
@@ -799,6 +803,18 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 		goto skip_unmap;
 	}
 
+	if (unlikely(movable_balloon_page(page))) {
+		/*
+		 * A ballooned page does not need any special attention from
+		 * physical to virtual reverse mapping procedures.
+		 * Skip any attempt to unmap PTEs or to remap swap cache,
+		 * in order to avoid burning cycles at rmap level, and perform
+		 * the page migration right away (protected by the page lock).
+		 */
+		rc = migrate_balloon_page(newpage, page, mode);
+		goto uncharge;
+	}
+
 	/* Establish migration ptes or remove ptes */
 	try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
 
@@ -814,7 +830,8 @@ skip_unmap:
 		put_anon_vma(anon_vma);
 
 uncharge:
-	mem_cgroup_end_migration(mem, page, newpage, rc == 0);
+	mem_cgroup_end_migration(mem, page, newpage,
+				 (rc == 0 || rc == BALLOON_MIGRATION_RETURN));
 unlock:
 	unlock_page(page);
 out:
@@ -846,6 +863,21 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
 			goto out;
 
 	rc = __unmap_and_move(page, newpage, force, offlining, mode);
+
+	if (unlikely(rc == BALLOON_MIGRATION_RETURN)) {
+		/*
+		 * A ballooned page has been migrated already.
+		 * Now it is time to remove the old page from the isolated
+		 * pageset list, hand it back to the buddy allocator, wrap up
+		 * the counters and return.
+		 */
+		dec_zone_page_state(page, NR_ISOLATED_ANON +
+				    page_is_file_cache(page));
+		list_del(&page->lru);
+		put_page(page);
+		__free_page(page);
+		return 0;
+	}
 out:
 	if (rc != -EAGAIN) {
 		/*
-- 
1.7.11.4



* [PATCH v9 3/5] virtio_balloon: introduce migration primitives to balloon pages
  2012-08-25  5:24 [PATCH v9 0/5] make balloon pages movable by compaction Rafael Aquini
  2012-08-25  5:24 ` [PATCH v9 1/5] mm: introduce a common interface for balloon pages mobility Rafael Aquini
  2012-08-25  5:24 ` [PATCH v9 2/5] mm: introduce compaction and migration for ballooned pages Rafael Aquini
@ 2012-08-25  5:24 ` Rafael Aquini
  2012-08-26  7:42   ` Michael S. Tsirkin
  2012-08-25  5:24 ` [PATCH v9 4/5] mm: introduce putback_movable_pages() Rafael Aquini
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Rafael Aquini @ 2012-08-25  5:24 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, virtualization, Rusty Russell, Michael S. Tsirkin,
	Rik van Riel, Mel Gorman, Andi Kleen, Andrew Morton,
	Konrad Rzeszutek Wilk, Minchan Kim, Peter Zijlstra,
	Paul E. McKenney, Rafael Aquini

Memory fragmentation introduced by ballooning might significantly reduce
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

Besides making balloon pages movable at allocation time and introducing
the necessary primitives to perform balloon page migration/compaction,
this patch changes the balloon's page bookkeeping counter into an atomic
counter and introduces the following locking scheme, in order to enhance
the synchronization methods for accessing elements of struct
virtio_balloon, thus providing protection against the concurrent accesses
introduced by parallel memory compaction threads.

 - balloon_lock (mutex) : synchronizes the access demand to elements of
			  struct virtio_balloon and its queue operations;
 - pages_lock (spinlock): special protection to balloon's page bookkeeping
			  elements (list and atomic counters) against the
			  potential memory compaction concurrency;

Signed-off-by: Rafael Aquini <aquini@redhat.com>
---
 drivers/virtio/virtio_balloon.c | 286 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 265 insertions(+), 21 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 0908e60..9b0bc46 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -27,6 +27,8 @@
 #include <linux/delay.h>
 #include <linux/slab.h>
 #include <linux/module.h>
+#include <linux/balloon_compaction.h>
+#include <linux/atomic.h>
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -34,6 +36,7 @@
  * page units.
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
+#define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
 
 struct virtio_balloon
 {
@@ -46,11 +49,24 @@ struct virtio_balloon
 	/* The thread servicing the balloon. */
 	struct task_struct *thread;
 
+	/* balloon special page->mapping */
+	struct address_space *mapping;
+
+	/* Synchronize access/update to this struct virtio_balloon elements */
+	struct mutex balloon_lock;
+
 	/* Waiting for host to ack the pages we released. */
 	wait_queue_head_t acked;
 
+	/* Number of balloon pages isolated from 'pages' list for compaction */
+	atomic_t num_isolated_pages;
+
 	/* Number of balloon pages we've told the Host we're not using. */
-	unsigned int num_pages;
+	atomic_t num_pages;
+
+	/* Protect the pages list and its bookkeeping counters */
+	spinlock_t pages_lock;
+
 	/*
 	 * The pages we've told the Host we're not using.
 	 * Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
@@ -60,7 +76,7 @@ struct virtio_balloon
 
 	/* The array of pfns we tell the Host about. */
 	unsigned int num_pfns;
-	u32 pfns[256];
+	u32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
 
 	/* Memory statistics */
 	int need_stats_update;
@@ -122,13 +138,17 @@ static void set_page_pfns(u32 pfns[], struct page *page)
 
 static void fill_balloon(struct virtio_balloon *vb, size_t num)
 {
+	/* Get the proper GFP alloc mask from vb->mapping flags */
+	gfp_t vb_gfp_mask = mapping_gfp_mask(vb->mapping);
+
 	/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
 
+	mutex_lock(&vb->balloon_lock);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
 	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-		struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
-					__GFP_NOMEMALLOC | __GFP_NOWARN);
+		struct page *page = alloc_page(vb_gfp_mask | __GFP_NORETRY |
+					       __GFP_NOWARN | __GFP_NOMEMALLOC);
 		if (!page) {
 			if (printk_ratelimit())
 				dev_printk(KERN_INFO, &vb->vdev->dev,
@@ -139,9 +159,15 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
 			break;
 		}
 		set_page_pfns(vb->pfns + vb->num_pfns, page);
-		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
 		totalram_pages--;
+
+		BUG_ON(!trylock_page(page));
+		spin_lock(&vb->pages_lock);
 		list_add(&page->lru, &vb->pages);
+		assign_balloon_mapping(page, vb->mapping);
+		atomic_add(VIRTIO_BALLOON_PAGES_PER_PAGE, &vb->num_pages);
+		spin_unlock(&vb->pages_lock);
+		unlock_page(page);
 	}
 
 	/* Didn't get any?  Oh well. */
@@ -149,6 +175,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
 		return;
 
 	tell_host(vb, vb->inflate_vq);
+	mutex_unlock(&vb->balloon_lock);
 }
 
 static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -162,19 +189,97 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
 	}
 }
 
+#ifdef CONFIG_BALLOON_COMPACTION
+/* helper for __wait_on_isolated_pages() to get the vb->pages list length */
+static inline int __pages_at_balloon_list(struct virtio_balloon *vb)
+{
+	return atomic_read(&vb->num_pages) -
+		atomic_read(&vb->num_isolated_pages);
+}
+
+/*
+ * __wait_on_isolated_pages - check if leak_balloon() must wait on isolated
+ *			      pages before proceeding with the page release.
+ * @vb         : pointer to the struct virtio_balloon describing this device.
+ * @leak_target: how many pages we are attempting to release this round.
+ *
+ * Shall only be called by leak_balloon(), with spin_lock(&vb->pages_lock) held.
+ */
+static inline void __wait_on_isolated_pages(struct virtio_balloon *vb,
+					    size_t leak_target)
+{
+	/*
+	 * There are no isolated pages for this balloon device, or
+	 * the leak target is smaller than # of pages on vb->pages list.
+	 * No need to wait, then.
+	 */
+	if (!atomic_read(&vb->num_isolated_pages) ||
+	    leak_target < __pages_at_balloon_list(vb))
+		return;
+	else {
+		/*
+		 * Isolated pages make our leak target bigger than the total
+		 * number of pages we can release this round. Wait for
+		 * migration to return enough pages to the balloon's list.
+		 */
+		spin_unlock(&vb->pages_lock);
+		wait_event(vb->config_change,
+			   (!atomic_read(&vb->num_isolated_pages) ||
+			    leak_target <= __pages_at_balloon_list(vb)));
+		spin_lock(&vb->pages_lock);
+	}
+}
+#else
+#define __wait_on_isolated_pages(a, b)	do { } while (0)
+#endif /* CONFIG_BALLOON_COMPACTION */
+
 static void leak_balloon(struct virtio_balloon *vb, size_t num)
 {
-	struct page *page;
+	int i;
+	/* The array of pfns we tell the Host about. */
+	u32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
+	unsigned int num_pfns = 0;
 
 	/* We can only do one array worth at a time. */
-	num = min(num, ARRAY_SIZE(vb->pfns));
+	size_t leak_target = num = min(num, ARRAY_SIZE(pfns));
 
-	for (vb->num_pfns = 0; vb->num_pfns < num;
-	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-		page = list_first_entry(&vb->pages, struct page, lru);
-		list_del(&page->lru);
-		set_page_pfns(vb->pfns + vb->num_pfns, page);
-		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
+	while (num_pfns < num) {
+		struct page *page = NULL;
+
+		spin_lock(&vb->pages_lock);
+		/*
+		 * leak_balloon() releases balloon pages in groups of up to
+		 * 'VIRTIO_BALLOON_ARRAY_PFNS_MAX' pages at each round.
+		 * When compaction isolates pages from the balloon page list,
+		 * we may find fewer pages on the balloon's list than our
+		 * desired 'leak_target'. If that happens, we wait for enough
+		 * pages to be re-inserted into the balloon's page list before
+		 * proceeding to release them.
+		 */
+		__wait_on_isolated_pages(vb, leak_target);
+
+		if (!list_empty(&vb->pages))
+			page = list_first_entry(&vb->pages, struct page, lru);
+		/*
+		 * Grab the page lock to avoid racing against threads isolating
+		 * pages from, or migrating pages back to, the vb->pages list.
+		 * (both tasks are done under page lock protection)
+		 *
+		 * Failing to grab the page lock here means this page is being
+		 * isolated already, or its migration has not finished yet.
+		 */
+		if (page && trylock_page(page)) {
+			clear_balloon_mapping(page);
+			list_del(&page->lru);
+			set_page_pfns(pfns + num_pfns, page);
+			atomic_sub(VIRTIO_BALLOON_PAGES_PER_PAGE,
+				   &vb->num_pages);
+			num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE;
+			unlock_page(page);
+			/* compensate leak_target for this released page */
+			leak_target--;
+		}
+		spin_unlock(&vb->pages_lock);
 	}
 
 	/*
@@ -182,8 +287,15 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
 	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
 	 * is true, we *have* to do it in this order
 	 */
+	mutex_lock(&vb->balloon_lock);
+
+	for (i = 0; i < num; i++)
+		vb->pfns[i] = pfns[i];
+
+	vb->num_pfns = num_pfns;
 	tell_host(vb, vb->deflate_vq);
-	release_pages_by_pfn(vb->pfns, vb->num_pfns);
+	release_pages_by_pfn(pfns, num_pfns);
+	mutex_unlock(&vb->balloon_lock);
 }
 
 static inline void update_stat(struct virtio_balloon *vb, int idx,
@@ -239,6 +351,7 @@ static void stats_handle_request(struct virtio_balloon *vb)
 	struct scatterlist sg;
 	unsigned int len;
 
+	mutex_lock(&vb->balloon_lock);
 	vb->need_stats_update = 0;
 	update_balloon_stats(vb);
 
@@ -249,6 +362,7 @@ static void stats_handle_request(struct virtio_balloon *vb)
 	if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
 		BUG();
 	virtqueue_kick(vq);
+	mutex_unlock(&vb->balloon_lock);
 }
 
 static void virtballoon_changed(struct virtio_device *vdev)
@@ -267,12 +381,12 @@ static inline s64 towards_target(struct virtio_balloon *vb)
 			      offsetof(struct virtio_balloon_config, num_pages),
 			      &v, sizeof(v));
 	target = le32_to_cpu(v);
-	return target - vb->num_pages;
+	return target - atomic_read(&vb->num_pages);
 }
 
 static void update_balloon_size(struct virtio_balloon *vb)
 {
-	__le32 actual = cpu_to_le32(vb->num_pages);
+	__le32 actual = cpu_to_le32(atomic_read(&vb->num_pages));
 
 	vb->vdev->config->set(vb->vdev,
 			      offsetof(struct virtio_balloon_config, actual),
@@ -339,9 +453,124 @@ static int init_vqs(struct virtio_balloon *vb)
 	return 0;
 }
 
+#ifdef CONFIG_BALLOON_COMPACTION
+/*
+ * virtballoon_isolatepage - perform the balloon page isolation on behalf of
+ *			     a compaction thread.
+ *			     (must be called under page lock)
+ * @page: the page to be isolated from the balloon's page list.
+ * @mode: not used for balloon page isolation.
+ *
+ * A memory compaction thread works by isolating pages from private lists,
+ * like the LRUs or the balloon's page list (here), into a private pageset
+ * that will be migrated subsequently. Once the pageset has been isolated,
+ * compaction relies on page migration procedures to do the heavy lifting.
+ *
+ * This function populates a balloon_mapping->a_ops callback method to help
+ * a compaction thread isolate a page from the balloon page list, thus
+ * allowing its subsequent migration.
+ */
+void virtballoon_isolatepage(struct page *page, unsigned long mode)
+{
+	struct virtio_balloon *vb = __page_balloon_device(page);
+
+	BUG_ON(!vb);
+
+	spin_lock(&vb->pages_lock);
+	list_del(&page->lru);
+	atomic_inc(&vb->num_isolated_pages);
+	spin_unlock(&vb->pages_lock);
+}
+
+/*
+ * virtballoon_migratepage - perform the balloon page migration on behalf of
+ *			     a compaction thread.
+ *			     (must be called under page lock)
+ * @mapping: the page->mapping which will be assigned to the new migrated page.
+ * @newpage: page that will replace the isolated page after migration finishes.
+ * @page   : the isolated (old) page that is about to be migrated to newpage.
+ * @mode   : compaction mode -- not used for balloon page migration.
+ *
+ * After a ballooned page gets isolated by compaction procedures, this is the
+ * function that performs the page migration on behalf of a running
+ * compaction thread. The page migration for virtio balloon is done in a
+ * simple swap fashion that follows these two macro steps:
+ *  1) insert newpage into vb->pages list and update the host about it;
+ *  2) update the host about the removed old page from vb->pages list;
+ *
+ * This function populates a balloon_mapping->a_ops callback method to allow
+ * a compaction thread to perform the balloon page migration task.
+ */
+int virtballoon_migratepage(struct address_space *mapping,
+		struct page *newpage, struct page *page, enum migrate_mode mode)
+{
+	struct virtio_balloon *vb = __page_balloon_device(page);
+
+	BUG_ON(!vb);
+
+	mutex_lock(&vb->balloon_lock);
+
+	/* balloon's page migration 1st step */
+	vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
+	spin_lock(&vb->pages_lock);
+	list_add(&newpage->lru, &vb->pages);
+	assign_balloon_mapping(newpage, mapping);
+	atomic_dec(&vb->num_isolated_pages);
+	spin_unlock(&vb->pages_lock);
+	set_page_pfns(vb->pfns, newpage);
+	tell_host(vb, vb->inflate_vq);
+
+	/* balloon's page migration 2nd step */
+	vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
+	clear_balloon_mapping(page);
+	set_page_pfns(vb->pfns, page);
+	tell_host(vb, vb->deflate_vq);
+
+	mutex_unlock(&vb->balloon_lock);
+	wake_up(&vb->config_change);
+
+	return BALLOON_MIGRATION_RETURN;
+}
+
+/*
+ * virtballoon_putbackpage - insert an isolated page back into the list it was
+ *			     once taken off by a compaction thread.
+ *			     (must be called under page lock)
+ * @page: page that will be re-inserted into balloon page list.
+ *
+ * If, for any reason, a compaction thread cannot finish all its work in a
+ * round, and some isolated pages still remain in the compaction thread's
+ * private pageset (waiting for migration), those pages will be re-inserted
+ * into their appropriate lists before the compaction thread exits.
+ *
+ * This function populates a balloon_mapping->a_ops callback method to help
+ * compaction re-insert an isolated, but not migrated, balloon page back
+ * into the appropriate list.
+ */
+void virtballoon_putbackpage(struct page *page)
+{
+	struct virtio_balloon *vb = __page_balloon_device(page);
+
+	BUG_ON(!vb);
+
+	spin_lock(&vb->pages_lock);
+	list_add(&page->lru, &vb->pages);
+	atomic_dec(&vb->num_isolated_pages);
+	spin_unlock(&vb->pages_lock);
+	wake_up(&vb->config_change);
+}
+#endif /* CONFIG_BALLOON_COMPACTION */
+
+/* define the balloon_mapping->a_ops callbacks to allow compaction/migration */
+static DEFINE_BALLOON_MAPPING_AOPS(virtio_balloon_aops,
+				   virtballoon_isolatepage,
+				   virtballoon_migratepage,
+				   virtballoon_putbackpage);
+
 static int virtballoon_probe(struct virtio_device *vdev)
 {
 	struct virtio_balloon *vb;
+	struct address_space *vb_mapping;
 	int err;
 
 	vdev->priv = vb = kmalloc(sizeof(*vb), GFP_KERNEL);
@@ -351,15 +580,26 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	}
 
 	INIT_LIST_HEAD(&vb->pages);
-	vb->num_pages = 0;
+	mutex_init(&vb->balloon_lock);
+	spin_lock_init(&vb->pages_lock);
+
+	atomic_set(&vb->num_pages, 0);
+	atomic_set(&vb->num_isolated_pages, 0);
 	init_waitqueue_head(&vb->config_change);
 	init_waitqueue_head(&vb->acked);
 	vb->vdev = vdev;
 	vb->need_stats_update = 0;
 
+	vb_mapping = alloc_balloon_mapping(vb, &virtio_balloon_aops);
+	if (!vb_mapping) {
+		err = -ENOMEM;
+		goto out_free_vb;
+	}
+	vb->mapping = vb_mapping;
+
 	err = init_vqs(vb);
 	if (err)
-		goto out_free_vb;
+		goto out_free_vb_mapping;
 
 	vb->thread = kthread_run(balloon, vb, "vballoon");
 	if (IS_ERR(vb->thread)) {
@@ -371,6 +611,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
 
 out_del_vqs:
 	vdev->config->del_vqs(vdev);
+out_free_vb_mapping:
+	kfree(vb_mapping);
 out_free_vb:
 	kfree(vb);
 out:
@@ -379,9 +621,11 @@ out:
 
 static void remove_common(struct virtio_balloon *vb)
 {
+	size_t num_pages;
 	/* There might be pages left in the balloon: free them. */
-	while (vb->num_pages)
-		leak_balloon(vb, vb->num_pages);
+	while ((num_pages = atomic_read(&vb->num_pages)) > 0)
+		leak_balloon(vb, num_pages);
+
 	update_balloon_size(vb);
 
 	/* Now we reset the device so we can clean up the queues. */
@@ -396,6 +640,7 @@ static void __devexit virtballoon_remove(struct virtio_device *vdev)
 
 	kthread_stop(vb->thread);
 	remove_common(vb);
+	kfree(vb->mapping);
 	kfree(vb);
 }
 
@@ -408,7 +653,6 @@ static int virtballoon_freeze(struct virtio_device *vdev)
 	 * The kthread is already frozen by the PM core before this
 	 * function is called.
 	 */
-
 	remove_common(vb);
 	return 0;
 }
-- 
1.7.11.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v9 4/5] mm: introduce putback_movable_pages()
  2012-08-25  5:24 [PATCH v9 0/5] make balloon pages movable by compaction Rafael Aquini
                   ` (2 preceding siblings ...)
  2012-08-25  5:24 ` [PATCH v9 3/5] virtio_balloon: introduce migration primitives to balloon pages Rafael Aquini
@ 2012-08-25  5:24 ` Rafael Aquini
  2012-08-25  5:25 ` [PATCH v9 5/5] mm: add vm event counters for balloon pages compaction Rafael Aquini
  2012-08-26  7:58 ` [PATCH v9 0/5] make balloon pages movable by compaction Michael S. Tsirkin
  5 siblings, 0 replies; 19+ messages in thread
From: Rafael Aquini @ 2012-08-25  5:24 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, virtualization, Rusty Russell, Michael S. Tsirkin,
	Rik van Riel, Mel Gorman, Andi Kleen, Andrew Morton,
	Konrad Rzeszutek Wilk, Minchan Kim, Peter Zijlstra,
	Paul E. McKenney, Rafael Aquini

The PATCH "mm: introduce compaction and migration for virtio ballooned pages"
hacks around putback_lru_pages() in order to allow ballooned pages to be
re-inserted into the balloon page list as if a ballooned page were an LRU page.

As ballooned pages are not legitimate LRU pages, this patch introduces
putback_movable_pages() to properly cope with cases where the isolated
pageset contains both ballooned pages and LRU pages, thus fixing the
aforementioned inelegant hack around putback_lru_pages().

Signed-off-by: Rafael Aquini <aquini@redhat.com>
---
 include/linux/migrate.h |  2 ++
 mm/compaction.c         |  4 ++--
 mm/migrate.c            | 20 ++++++++++++++++++++
 mm/page_alloc.c         |  2 +-
 4 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index ce7e667..ff103a1 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -10,6 +10,7 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
 #ifdef CONFIG_MIGRATION
 
 extern void putback_lru_pages(struct list_head *l);
+extern void putback_movable_pages(struct list_head *l);
 extern int migrate_page(struct address_space *,
 			struct page *, struct page *, enum migrate_mode);
 extern int migrate_pages(struct list_head *l, new_page_t x,
@@ -33,6 +34,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
 #else
 
 static inline void putback_lru_pages(struct list_head *l) {}
+static inline void putback_movable_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t x,
 		unsigned long private, bool offlining,
 		enum migrate_mode mode) { return -ENOSYS; }
diff --git a/mm/compaction.c b/mm/compaction.c
index e50836b..409b2f5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -817,9 +817,9 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		trace_mm_compaction_migratepages(nr_migrate - nr_remaining,
 						nr_remaining);
 
-		/* Release LRU pages not migrated */
+		/* Release isolated pages not migrated */
 		if (err) {
-			putback_lru_pages(&cc->migratepages);
+			putback_movable_pages(&cc->migratepages);
 			cc->nr_migratepages = 0;
 			if (err == -ENOMEM) {
 				ret = COMPACT_PARTIAL;
diff --git a/mm/migrate.c b/mm/migrate.c
index ec439f8..e47daf5 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -80,6 +80,26 @@ void putback_lru_pages(struct list_head *l)
 		list_del(&page->lru);
 		dec_zone_page_state(page, NR_ISOLATED_ANON +
 				page_is_file_cache(page));
+		putback_lru_page(page);
+	}
+}
+
+/*
+ * Put previously isolated pages back onto the appropriate lists
+ * from where they were once taken off for compaction/migration.
+ *
+ * This function shall be used instead of putback_lru_pages(),
+ * whenever the isolated pageset has been built by isolate_migratepages_range()
+ */
+void putback_movable_pages(struct list_head *l)
+{
+	struct page *page;
+	struct page *page2;
+
+	list_for_each_entry_safe(page, page2, l, lru) {
+		list_del(&page->lru);
+		dec_zone_page_state(page, NR_ISOLATED_ANON +
+				page_is_file_cache(page));
 		if (unlikely(movable_balloon_page(page)))
 			putback_balloon_page(page);
 		else
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c66fb87..a0c2cc5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5675,7 +5675,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 				    0, false, MIGRATE_SYNC);
 	}
 
-	putback_lru_pages(&cc.migratepages);
+	putback_movable_pages(&cc.migratepages);
 	return ret > 0 ? 0 : ret;
 }
 
-- 
1.7.11.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v9 5/5] mm: add vm event counters for balloon pages compaction
  2012-08-25  5:24 [PATCH v9 0/5] make balloon pages movable by compaction Rafael Aquini
                   ` (3 preceding siblings ...)
  2012-08-25  5:24 ` [PATCH v9 4/5] mm: introduce putback_movable_pages() Rafael Aquini
@ 2012-08-25  5:25 ` Rafael Aquini
  2012-08-26  7:58 ` [PATCH v9 0/5] make balloon pages movable by compaction Michael S. Tsirkin
  5 siblings, 0 replies; 19+ messages in thread
From: Rafael Aquini @ 2012-08-25  5:25 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, virtualization, Rusty Russell, Michael S. Tsirkin,
	Rik van Riel, Mel Gorman, Andi Kleen, Andrew Morton,
	Konrad Rzeszutek Wilk, Minchan Kim, Peter Zijlstra,
	Paul E. McKenney, Rafael Aquini

This patch introduces a new set of vm event counters to keep track of
ballooned pages compaction activity.

Signed-off-by: Rafael Aquini <aquini@redhat.com>
---
 drivers/virtio/virtio_balloon.c |  1 +
 include/linux/vm_event_item.h   |  8 +++++++-
 mm/balloon_compaction.c         |  2 ++
 mm/migrate.c                    |  1 +
 mm/vmstat.c                     | 10 +++++++++-
 5 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 9b0bc46..e1e8e30 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -528,6 +528,7 @@ int virtballoon_migratepage(struct address_space *mapping,
 
 	mutex_unlock(&vb->balloon_lock);
 	wake_up(&vb->config_change);
+	count_balloon_event(COMPACTBALLOONMIGRATED);
 
 	return BALLOON_MIGRATION_RETURN;
 }
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 57f7b10..13573fe 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -41,7 +41,13 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_COMPACTION
 		COMPACTBLOCKS, COMPACTPAGES, COMPACTPAGEFAILED,
 		COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
-#endif
+#ifdef CONFIG_BALLOON_COMPACTION
+		COMPACTBALLOONISOLATED, /* isolated from balloon pagelist */
+		COMPACTBALLOONMIGRATED, /* balloon page successfully migrated */
+		COMPACTBALLOONRELEASED, /* old-page released after migration */
+		COMPACTBALLOONRETURNED, /* putback to pagelist, not-migrated */
+#endif /* CONFIG_BALLOON_COMPACTION */
+#endif /* CONFIG_COMPACTION */
 #ifdef CONFIG_HUGETLB_PAGE
 		HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
 #endif
diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index 86a3692..00e7ea9 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -110,6 +110,7 @@ bool isolate_balloon_page(struct page *page)
 			if (__is_movable_balloon_page(page) &&
 			    (page_count(page) == 2)) {
 				__isolate_balloon_page(page);
+				count_balloon_event(COMPACTBALLOONISOLATED);
 				unlock_page(page);
 				return true;
 			} else if (unlikely(!__is_movable_balloon_page(page))) {
@@ -139,6 +140,7 @@ void putback_balloon_page(struct page *page)
 	if (__is_movable_balloon_page(page)) {
 		__putback_balloon_page(page);
 		put_page(page);
+		count_balloon_event(COMPACTBALLOONRETURNED);
 	} else {
 		dump_page(page);
 		__WARN();
diff --git a/mm/migrate.c b/mm/migrate.c
index e47daf5..124b16b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -896,6 +896,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
 		list_del(&page->lru);
 		put_page(page);
 		__free_page(page);
+		count_balloon_event(COMPACTBALLOONRELEASED);
 		return 0;
 	}
 out:
diff --git a/mm/vmstat.c b/mm/vmstat.c
index df7a674..5824ad2 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -768,7 +768,15 @@ const char * const vmstat_text[] = {
 	"compact_stall",
 	"compact_fail",
 	"compact_success",
-#endif
+
+#ifdef CONFIG_BALLOON_COMPACTION
+	"compact_balloon_isolated",
+	"compact_balloon_migrated",
+	"compact_balloon_released",
+	"compact_balloon_returned",
+#endif /* CONFIG_BALLOON_COMPACTION */
+
+#endif /* CONFIG_COMPACTION */
 
 #ifdef CONFIG_HUGETLB_PAGE
 	"htlb_buddy_alloc_success",
-- 
1.7.11.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v9 3/5] virtio_balloon: introduce migration primitives to balloon pages
  2012-08-25  5:24 ` [PATCH v9 3/5] virtio_balloon: introduce migration primitives to balloon pages Rafael Aquini
@ 2012-08-26  7:42   ` Michael S. Tsirkin
  2012-08-27 19:47     ` Rafael Aquini
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2012-08-26  7:42 UTC (permalink / raw)
  To: Rafael Aquini
  Cc: linux-mm, linux-kernel, virtualization, Rusty Russell,
	Rik van Riel, Mel Gorman, Andi Kleen, Andrew Morton,
	Konrad Rzeszutek Wilk, Minchan Kim, Peter Zijlstra,
	Paul E. McKenney

On Sat, Aug 25, 2012 at 02:24:58AM -0300, Rafael Aquini wrote:
> Memory fragmentation introduced by ballooning might reduce significantly
> the number of 2MB contiguous memory blocks that can be used within a guest,
> thus imposing performance penalties associated with the reduced number of
> transparent huge pages that could be used by the guest workload.
> 
> Besides making balloon pages movable at allocation time and introducing
> the necessary primitives to perform balloon page migration/compaction,
> the patch changes the balloon bookeeping pages counter into an atomic
> counter, as well as it introduces the following locking scheme, in order to
> enhance the syncronization methods for accessing elements of struct
> virtio_balloon, thus providing protection against the concurrent accesses
> introduced by parallel memory compaction threads.
> 
>  - balloon_lock (mutex) : synchronizes the access demand to elements of
> 			  struct virtio_balloon and its queue operations;
>  - pages_lock (spinlock): special protection to balloon's pages bookmarking
> 			  elements (list and atomic counters) against the
> 			  potential memory compaction concurrency;
> 
> Signed-off-by: Rafael Aquini <aquini@redhat.com>

OK, this looks better.
Some comments below.

> ---
>  drivers/virtio/virtio_balloon.c | 286 +++++++++++++++++++++++++++++++++++++---
>  1 file changed, 265 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 0908e60..9b0bc46 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -27,6 +27,8 @@
>  #include <linux/delay.h>
>  #include <linux/slab.h>
>  #include <linux/module.h>
> +#include <linux/balloon_compaction.h>
> +#include <linux/atomic.h>
>  
>  /*
>   * Balloon device works in 4K page units.  So each page is pointed to by
> @@ -34,6 +36,7 @@
>   * page units.
>   */
>  #define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
> +#define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
>  
>  struct virtio_balloon
>  {
> @@ -46,11 +49,24 @@ struct virtio_balloon
>  	/* The thread servicing the balloon. */
>  	struct task_struct *thread;
>  
> +	/* balloon special page->mapping */
> +	struct address_space *mapping;
> +
> +	/* Synchronize access/update to this struct virtio_balloon elements */
> +	struct mutex balloon_lock;
> +
>  	/* Waiting for host to ack the pages we released. */
>  	wait_queue_head_t acked;
>  
> +	/* Number of balloon pages isolated from 'pages' list for compaction */
> +	atomic_t num_isolated_pages;
> +
>  	/* Number of balloon pages we've told the Host we're not using. */
> -	unsigned int num_pages;
> +	atomic_t num_pages;
> +
> +	/* Protect pages list, and pages bookeeping counters */
> +	spinlock_t pages_lock;
> +
>  	/*
>  	 * The pages we've told the Host we're not using.
>  	 * Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
> @@ -60,7 +76,7 @@ struct virtio_balloon
>  
>  	/* The array of pfns we tell the Host about. */
>  	unsigned int num_pfns;
> -	u32 pfns[256];
> +	u32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
>  
>  	/* Memory statistics */
>  	int need_stats_update;
> @@ -122,13 +138,17 @@ static void set_page_pfns(u32 pfns[], struct page *page)
>  
>  static void fill_balloon(struct virtio_balloon *vb, size_t num)
>  {
> +	/* Get the proper GFP alloc mask from vb->mapping flags */
> +	gfp_t vb_gfp_mask = mapping_gfp_mask(vb->mapping);
> +
>  	/* We can only do one array worth at a time. */
>  	num = min(num, ARRAY_SIZE(vb->pfns));
>  
> +	mutex_lock(&vb->balloon_lock);
>  	for (vb->num_pfns = 0; vb->num_pfns < num;
>  	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> -		struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
> -					__GFP_NOMEMALLOC | __GFP_NOWARN);
> +		struct page *page = alloc_page(vb_gfp_mask | __GFP_NORETRY |
> +					       __GFP_NOWARN | __GFP_NOMEMALLOC);
>  		if (!page) {
>  			if (printk_ratelimit())
>  				dev_printk(KERN_INFO, &vb->vdev->dev,
> @@ -139,9 +159,15 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
>  			break;
>  		}
>  		set_page_pfns(vb->pfns + vb->num_pfns, page);
> -		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
>  		totalram_pages--;
> +
> +		BUG_ON(!trylock_page(page));
> +		spin_lock(&vb->pages_lock);
>  		list_add(&page->lru, &vb->pages);
> +		assign_balloon_mapping(page, vb->mapping);
> +		atomic_add(VIRTIO_BALLOON_PAGES_PER_PAGE, &vb->num_pages);
> +		spin_unlock(&vb->pages_lock);
> +		unlock_page(page);
>  	}
>  
>  	/* Didn't get any?  Oh well. */
> @@ -149,6 +175,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
>  		return;
>  
>  	tell_host(vb, vb->inflate_vq);
> +	mutex_unlock(&vb->balloon_lock);
>  }
>  
>  static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
> @@ -162,19 +189,97 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
>  	}
>  }
>  
> +#ifdef CONFIG_BALLOON_COMPACTION
> +/* helper to __wait_on_isolated_pages() getting vb->pages list lenght */
> +static inline int __pages_at_balloon_list(struct virtio_balloon *vb)
> +{
> +	return atomic_read(&vb->num_pages) -
> +		atomic_read(&vb->num_isolated_pages);
> +}
> +

Reading two atomics and doing math? Result can even be negative.
I did not look at use closely but it looks suspicious.
It's already the case everywhere except __wait_on_isolated_pages,
so just fix that, and then we can keep using int instead of atomics.


> +/*
> + * __wait_on_isolated_pages - check if leak_balloon() must wait on isolated
> + *			      pages before proceeding with the page release.
> + * @vb         : pointer to the struct virtio_balloon describing this device.
> + * @leak_target: how many pages we are attempting to release this round.
> + *
> + * Shall only be called by leak_balloon() and under spin_lock(&vb->pages_lock);
> + */
> +static inline void __wait_on_isolated_pages(struct virtio_balloon *vb,
> +					    size_t leak_target)
> +{
> +	/*
> +	 * There are no isolated pages for this balloon device, or
> +	 * the leak target is smaller than # of pages on vb->pages list.
> +	 * No need to wait, then.
> +	 */

This just repeats what's below. So it does not help
at all, better drop it. But maybe you could explain
why does it make sense?

> +	if (!atomic_read(&vb->num_isolated_pages) ||
> +	    leak_target < __pages_at_balloon_list(vb))
> +		return;
> +	else {
> +		/*
> +		 * isolated pages are making our leak target bigger than the
> +		 * total pages that we can release this round. Let's wait for
> +		 * migration returning enough pages back to balloon's list.
> +		 */
> +		spin_unlock(&vb->pages_lock);
> +		wait_event(vb->config_change,
> +			   (!atomic_read(&vb->num_isolated_pages) ||
> +			    leak_target <= __pages_at_balloon_list(vb)));

Why did we repeat the logic above? optimization to skip lock/unlock?

> +		spin_lock(&vb->pages_lock);
> +	}
> +}
> +#else
> +#define __wait_on_isolated_pages(a, b)	do { } while (0)
> +#endif /* CONFIG_BALLOON_COMPACTION */
> +
>  static void leak_balloon(struct virtio_balloon *vb, size_t num)
>  {
> -	struct page *page;
> +	int i;
> +	/* The array of pfns we tell the Host about. */
> +	u32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];

That's 1K on stack - and can become more if we increase
VIRTIO_BALLOON_ARRAY_PFNS_MAX.  Probably too much - this is the reason
we use vb->pfns.

> +	unsigned int num_pfns = 0;
>  
>  	/* We can only do one array worth at a time. */
> -	num = min(num, ARRAY_SIZE(vb->pfns));
> +	size_t leak_target = num = min(num, ARRAY_SIZE(pfns));
>  
> -	for (vb->num_pfns = 0; vb->num_pfns < num;
> -	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> -		page = list_first_entry(&vb->pages, struct page, lru);
> -		list_del(&page->lru);
> -		set_page_pfns(vb->pfns + vb->num_pfns, page);
> -		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
> +	while (num_pfns < num) {
> +		struct page *page = NULL;
> +
> +		spin_lock(&vb->pages_lock);
> +		/*
> +		 * leak_balloon() works releasing balloon pages by groups
> +		 * of 'VIRTIO_BALLOON_ARRAY_PFNS_MAX' size at each round.
> +		 * When compaction isolates pages from balloon page list,
> +		 * we might end up finding less pages on balloon's list than
> +		 * what is our desired 'leak_target'. If such occurrence
> +		 * happens, we shall wait for enough pages being re-inserted
> +		 * into balloon's page list before we proceed releasing them.
> +		 */
> +		__wait_on_isolated_pages(vb, leak_target);
> +
> +		if (!list_empty(&vb->pages))
> +			page = list_first_entry(&vb->pages, struct page, lru);
> +		/*
> +		 * Grab the page lock to avoid racing against threads isolating
> +		 * pages from, or migrating pages back to vb->pages list.
> +		 * (both tasks are done under page lock protection)
> +		 *
> +		 * Failing to grab the page lock here means this page is being
> +		 * isolated already, or its migration has not finished yet.
> +		 */
> +		if (page && trylock_page(page)) {
> +			clear_balloon_mapping(page);
> +			list_del(&page->lru);
> +			set_page_pfns(pfns + num_pfns, page);
> +			atomic_sub(VIRTIO_BALLOON_PAGES_PER_PAGE,
> +				   &vb->num_pages);
> +			num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE;
> +			unlock_page(page);
> +			/* compensate leak_target for this released page */
> +			leak_target--;
> +		}
> +		spin_unlock(&vb->pages_lock);
>  	}
>  
>  	/*
> @@ -182,8 +287,15 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
>  	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
>  	 * is true, we *have* to do it in this order
>  	 */
> +	mutex_lock(&vb->balloon_lock);
> +
> +	for (i = 0; i < num; i++)
> +		vb->pfns[i] = pfns[i];
> +
> +	vb->num_pfns = num_pfns;
>  	tell_host(vb, vb->deflate_vq);
> -	release_pages_by_pfn(vb->pfns, vb->num_pfns);
> +	release_pages_by_pfn(pfns, num_pfns);
> +	mutex_unlock(&vb->balloon_lock);
>  }
>  
>  static inline void update_stat(struct virtio_balloon *vb, int idx,
> @@ -239,6 +351,7 @@ static void stats_handle_request(struct virtio_balloon *vb)
>  	struct scatterlist sg;
>  	unsigned int len;
>  
> +	mutex_lock(&vb->balloon_lock);
>  	vb->need_stats_update = 0;
>  	update_balloon_stats(vb);
>  
> @@ -249,6 +362,7 @@ static void stats_handle_request(struct virtio_balloon *vb)
>  	if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
>  		BUG();
>  	virtqueue_kick(vq);
> +	mutex_unlock(&vb->balloon_lock);
>  }
>  
>  static void virtballoon_changed(struct virtio_device *vdev)
> @@ -267,12 +381,12 @@ static inline s64 towards_target(struct virtio_balloon *vb)
>  			      offsetof(struct virtio_balloon_config, num_pages),
>  			      &v, sizeof(v));
>  	target = le32_to_cpu(v);
> -	return target - vb->num_pages;
> +	return target - atomic_read(&vb->num_pages);
>  }
>  
>  static void update_balloon_size(struct virtio_balloon *vb)
>  {
> -	__le32 actual = cpu_to_le32(vb->num_pages);
> +	__le32 actual = cpu_to_le32(atomic_read(&vb->num_pages));
>  
>  	vb->vdev->config->set(vb->vdev,
>  			      offsetof(struct virtio_balloon_config, actual),
> @@ -339,9 +453,124 @@ static int init_vqs(struct virtio_balloon *vb)
>  	return 0;
>  }
>  
> +#ifdef CONFIG_BALLOON_COMPACTION
> +/*
> + * virtballoon_isolatepage - perform the balloon page isolation on behalf of
> + *			     a compation thread.
> + *			     (must be called under page lock)

Better just 'called under page lock' - the driver is not
supposed to call this directly.

> + * @page: the page to isolated from balloon's page list.
> + * @mode: not used for balloon page isolation.
> + *
> + * A memory compaction thread works isolating pages from private lists,

by isolating pages

> + * like LRUs or the balloon's page list (here), to a privative pageset that
> + * will be migrated subsequently. After the mentioned pageset gets isolated
> + * compaction relies on page migration procedures to do the heavy lifting.
> + *
> + * This function populates a balloon_mapping->a_ops callback method to help
> + * a compaction thread on isolating a page from the balloon page list, and
> + * thus allowing its posterior migration.

This function isolates a page from the balloon private page list.
Called through balloon_mapping->a_ops.

> + */
> +void virtballoon_isolatepage(struct page *page, unsigned long mode)
> +{
> +	struct virtio_balloon *vb = __page_balloon_device(page);
> +
> +	BUG_ON(!vb);
> +
> +	spin_lock(&vb->pages_lock);
> +	list_del(&page->lru);
> +	atomic_inc(&vb->num_isolated_pages);
> +	spin_unlock(&vb->pages_lock);
> +}
> +
> +/*
> + * virtballoon_migratepage - perform the balloon page migration on behalf of
> + *			     a compation thread.
> + *			     (must be called under page lock)
> + * @mapping: the page->mapping which will be assigned to the new migrated page.
> + * @newpage: page that will replace the isolated page after migration finishes.
> + * @page   : the isolated (old) page that is about to be migrated to newpage.
> + * @mode   : compaction mode -- not used for balloon page migration.
> + *
> + * After a ballooned page gets isolated by compaction procedures, this is the
> + * function that performs the page migration on behalf of a compaction running
> + * thread.

of a compaction thread

> The page migration for virtio balloon is done in a simple swap
> + * fashion which follows these two macro steps:
> + *  1) insert newpage into vb->pages list and update the host about it;
> + *  2) update the host about the removed old page from vb->pages list;

the old page removed from

> + *
> + * This function populates a balloon_mapping->a_ops callback method to allow
> + * a compaction thread to perform the balloon page migration task.

This function performs the balloon page migration task.
Called through balloon_mapping->a_ops.


> + */
> +int virtballoon_migratepage(struct address_space *mapping,
> +		struct page *newpage, struct page *page, enum migrate_mode mode)
> +{
> +	struct virtio_balloon *vb = __page_balloon_device(page);
> +
> +	BUG_ON(!vb);
> +
> +	mutex_lock(&vb->balloon_lock);
> +
> +	/* balloon's page migration 1st step */
> +	vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
> +	spin_lock(&vb->pages_lock);
> +	list_add(&newpage->lru, &vb->pages);
> +	assign_balloon_mapping(newpage, mapping);
> +	atomic_dec(&vb->num_isolated_pages);
> +	spin_unlock(&vb->pages_lock);
> +	set_page_pfns(vb->pfns, newpage);
> +	tell_host(vb, vb->inflate_vq);
> +
> +	/* balloon's page migration 2nd step */
> +	vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
> +	clear_balloon_mapping(page);
> +	set_page_pfns(vb->pfns, page);
> +	tell_host(vb, vb->deflate_vq);
> +
> +	mutex_unlock(&vb->balloon_lock);
> +	wake_up(&vb->config_change);
> +
> +	return BALLOON_MIGRATION_RETURN;
> +}
> +
> +/*
> + * virtballoon_putbackpage - insert an isolated page back into the list it was
> + *			     once taken off by a compaction thread.
> + *			     (must be called under page lock)
> + * @page: page that will be re-inserted into balloon page list.
> + *
> + * If by any mean,

If for some reason

> a compaction thread cannot finish all its job on its round,

in one round

> + * and some isolated pages are still remaining at compaction's thread privative
> + * pageset (waiting for migration), then those pages will get re-inserted into
> + * their appropriate lists

appropriate -> private balloon

> before the compaction thread exits.

will exit.

> + *
> + * This function populates a balloon_mapping->a_ops callback method to help
> + * compaction on inserting back into the appropriate list an isolated but
> + * not migrated balloon page.

This function inserts an isolated but not migrated balloon page
back into the private balloon list.
Called through balloon_mapping->a_ops.

> + */
> +void virtballoon_putbackpage(struct page *page)
> +{
> +	struct virtio_balloon *vb = __page_balloon_device(page);
> +
> +	BUG_ON(!vb);
> +
> +	spin_lock(&vb->pages_lock);
> +	list_add(&page->lru, &vb->pages);
> +	atomic_dec(&vb->num_isolated_pages);
> +	spin_unlock(&vb->pages_lock);
> +	wake_up(&vb->config_change);
> +}
> +#endif /* CONFIG_BALLOON_COMPACTION */
> +
> +/* define the balloon_mapping->a_ops callbacks to allow compaction/migration */
> +static DEFINE_BALLOON_MAPPING_AOPS(virtio_balloon_aops,
> +				   virtballoon_isolatepage,
> +				   virtballoon_migratepage,
> +				   virtballoon_putbackpage);
> +
>  static int virtballoon_probe(struct virtio_device *vdev)
>  {
>  	struct virtio_balloon *vb;
> +	struct address_space *vb_mapping;
>  	int err;
>  
>  	vdev->priv = vb = kmalloc(sizeof(*vb), GFP_KERNEL);
> @@ -351,15 +580,26 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  	}
>  
>  	INIT_LIST_HEAD(&vb->pages);
> -	vb->num_pages = 0;
> +	mutex_init(&vb->balloon_lock);
> +	spin_lock_init(&vb->pages_lock);
> +
> +	atomic_set(&vb->num_pages, 0);
> +	atomic_set(&vb->num_isolated_pages, 0);
>  	init_waitqueue_head(&vb->config_change);
>  	init_waitqueue_head(&vb->acked);
>  	vb->vdev = vdev;
>  	vb->need_stats_update = 0;
>  
> +	vb_mapping = alloc_balloon_mapping(vb, &virtio_balloon_aops);
> +	if (!vb_mapping) {
> +		err = -ENOMEM;
> +		goto out_free_vb;
> +	}
> +	vb->mapping = vb_mapping;
> +
>  	err = init_vqs(vb);
>  	if (err)
> -		goto out_free_vb;
> +		goto out_free_vb_mapping;
>  
>  	vb->thread = kthread_run(balloon, vb, "vballoon");
>  	if (IS_ERR(vb->thread)) {
> @@ -371,6 +611,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  
>  out_del_vqs:
>  	vdev->config->del_vqs(vdev);
> +out_free_vb_mapping:
> +	kfree(vb_mapping);

We have alloc_balloon_mapping; it would be cleaner to have
a matching free_balloon_mapping.

>  out_free_vb:
>  	kfree(vb);
>  out:
> @@ -379,9 +621,11 @@ out:
>  
>  static void remove_common(struct virtio_balloon *vb)
>  {
> +	size_t num_pages;
>  	/* There might be pages left in the balloon: free them. */
> -	while (vb->num_pages)
> -		leak_balloon(vb, vb->num_pages);
> +	while ((num_pages = atomic_read(&vb->num_pages)) > 0)
> +		leak_balloon(vb, num_pages);
> +
>  	update_balloon_size(vb);
>  
>  	/* Now we reset the device so we can clean up the queues. */
> @@ -396,6 +640,7 @@ static void __devexit virtballoon_remove(struct virtio_device *vdev)
>  
>  	kthread_stop(vb->thread);
>  	remove_common(vb);
> +	kfree(vb->mapping);
>  	kfree(vb);
>  }
>  
> @@ -408,7 +653,6 @@ static int virtballoon_freeze(struct virtio_device *vdev)
>  	 * The kthread is already frozen by the PM core before this
>  	 * function is called.
>  	 */
> -
>  	remove_common(vb);
>  	return 0;
>  }
> -- 
> 1.7.11.4

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v9 1/5] mm: introduce a common interface for balloon pages mobility
  2012-08-25  5:24 ` [PATCH v9 1/5] mm: introduce a common interface for balloon pages mobility Rafael Aquini
@ 2012-08-26  7:55   ` Michael S. Tsirkin
  2012-08-27 20:28     ` Rafael Aquini
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2012-08-26  7:55 UTC (permalink / raw)
  To: Rafael Aquini
  Cc: linux-mm, linux-kernel, virtualization, Rusty Russell,
	Rik van Riel, Mel Gorman, Andi Kleen, Andrew Morton,
	Konrad Rzeszutek Wilk, Minchan Kim, Peter Zijlstra,
	Paul E. McKenney

On Sat, Aug 25, 2012 at 02:24:56AM -0300, Rafael Aquini wrote:
> Memory fragmentation introduced by ballooning might reduce significantly
> the number of 2MB contiguous memory blocks that can be used within a guest,
> thus imposing performance penalties associated with the reduced number of
> transparent huge pages that could be used by the guest workload.
> 
> This patch introduces a common interface to help a balloon driver on
> making its page set movable to compaction, and thus allowing the system
> to better leverage the compation efforts on memory defragmentation.
> 
> Signed-off-by: Rafael Aquini <aquini@redhat.com>

Tons of RCU uses but no sync in sight. This looks suspicious.
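For reference, the usual pattern this comment is alluding to, as a hedged
pseudocode sketch (free_balloon_mapping() is a hypothetical helper, not
part of this patch):

```c
/*
 * Sketch only: pair the rcu_assign_pointer() publication of page->mapping
 * with a grace period before the mapping memory is freed or reused.
 */
static void free_balloon_mapping(struct address_space *mapping)
{
	/*
	 * Compaction threads may still be inside an RCU read-side critical
	 * section dereferencing page->mapping; wait them out before kfree().
	 */
	synchronize_rcu();
	kfree(mapping);
}
```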

> ---
>  include/linux/balloon_compaction.h | 137 +++++++++++++++++++++++++++++
>  include/linux/pagemap.h            |  18 ++++
>  mm/Kconfig                         |  15 ++++
>  mm/Makefile                        |   2 +-
>  mm/balloon_compaction.c            | 172 +++++++++++++++++++++++++++++++++++++
>  5 files changed, 343 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/balloon_compaction.h
>  create mode 100644 mm/balloon_compaction.c
> 
> diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
> new file mode 100644
> index 0000000..7afb0ae
> --- /dev/null
> +++ b/include/linux/balloon_compaction.h
> @@ -0,0 +1,137 @@
> +/*
> + * include/linux/balloon_compaction.h
> + *
> + * Common interface definitions for making balloon pages movable to compaction.
> + *
> + * Copyright (C) 2012, Red Hat, Inc.  Rafael Aquini <aquini@redhat.com>
> + */
> +#ifndef _LINUX_BALLOON_COMPACTION_H
> +#define _LINUX_BALLOON_COMPACTION_H
> +#ifdef __KERNEL__

We don't need ifdef __KERNEL__ - it's only useful
for headers exported to userspace. This one isn't.

> +
> +#include <linux/rcupdate.h>
> +#include <linux/pagemap.h>
> +#include <linux/gfp.h>
> +
> +#ifdef CONFIG_BALLOON_COMPACTION
> +#define count_balloon_event(e)	count_vm_event(e)
> +extern bool isolate_balloon_page(struct page *);
> +extern void putback_balloon_page(struct page *);
> +extern int migrate_balloon_page(struct page *newpage,
> +				struct page *page, enum migrate_mode mode);
> +
> +static inline gfp_t balloon_mapping_gfp_mask(void)
> +{
> +	return GFP_HIGHUSER_MOVABLE;
> +}
> +
> +/*
> + * movable_balloon_page - test page->mapping->flags to identify balloon pages
> + *			  that can be moved by compaction/migration.
> + *
> + * This function is used at core compaction's page isolation scheme and so it's
> + * exposed to several system pages which may, or may not, be part of a memory
> + * balloon, and thus we cannot afford to hold a page locked to perform tests.
> + *
> + * Therefore, as we might return false positives in the case a balloon page
> + * is just released under us, the page->mapping->flags need to be retested
> + * with the proper page lock held, on the functions that will cope with the
> + * balloon page later.
> + */
> +static inline bool movable_balloon_page(struct page *page)
> +{
> +	/*
> +	 * Before dereferencing and testing mapping->flags, lets make sure
> +	 * this is not a page that uses ->mapping in a different way
> +	 */
> +	if (!PageSlab(page) && !PageSwapCache(page) &&
> +	    !PageAnon(page) && !page_mapped(page)) {
> +		/*
> +		 * While doing compaction core work, we cannot afford to hold
> +		 * page lock as it might cause very undesirable side effects.
> +		 */
> +		struct address_space *mapping;
> +		mapping = rcu_dereference_raw(page->mapping);
> +		if (mapping)
> +			return mapping_balloon(mapping);
> +	}
> +	return false;
> +}
> +
> +/*
> + * __page_balloon_device - return the balloon device owing the page.

owning?

> + *
> + * This shall only be used at driver callbacks under proper page lock,
> + * to get access to the balloon device structure that owns @page.
> + */
> +static inline void *__page_balloon_device(struct page *page)
> +{
> +	struct address_space *mapping;
> +	mapping = rcu_dereference_protected(page->mapping, PageLocked(page));
> +	if (mapping)
> +		mapping = mapping->assoc_mapping;
> +	return (void *)mapping;

Never cast pointers to void *.

> +}
> +
> +/*
> + * DEFINE_BALLOON_MAPPING_AOPS - declare and instantiate a callback descriptor
> + *				 to be used as balloon page->mapping->a_ops.
> + *
> + * @label     : declaration identifier (var name)
> + * @isolatepg : callback symbol name for performing the page isolation step
> + * @migratepg : callback symbol name for performing the page migration step
> + * @putbackpg : callback symbol name for performing the page putback step
> + *
> + * address_space_operations utilized methods for ballooned pages:
> + *   .migratepage    - used to perform balloon's page migration (as is)
> + *   .invalidatepage - used to isolate a page from balloon's page list
> + *   .freepage       - used to reinsert an isolated page to balloon's page list
> + */
> +#define DEFINE_BALLOON_MAPPING_AOPS(label, isolatepg, migratepg, putbackpg) \
> +	const struct address_space_operations (label) = {		    \
> +		.migratepage    = (migratepg),				    \
> +		.invalidatepage = (isolatepg),				    \
> +		.freepage       = (putbackpg),				    \
> +	}
> +
> +#else
> +#define count_balloon_event(e)	do { } while (0)
> +static inline bool movable_balloon_page(struct page *page) { return false; }
> +static inline bool isolate_balloon_page(struct page *page) { return false; }
> +static inline void putback_balloon_page(struct page *page) { return; }
> +
> +static inline int migrate_balloon_page(struct page *newpage,
> +				struct page *page, enum migrate_mode mode)
> +{
> +	return 0;
> +}
> +
> +static inline gfp_t balloon_mapping_gfp_mask(void)
> +{
> +	return GFP_HIGHUSER;
> +}
> +
> +#define DEFINE_BALLOON_MAPPING_AOPS(label, migratepg, isolatepg, putbackpg) \
> +	const struct address_space_operations *(label) = NULL

Looks like the result can't be passed to alloc_balloon_mapping
anyway. Did you test-build with compaction off?

I would suggest adding free_balloon_mapping
and making both it and alloc_balloon_mapping
empty macros, or, in the case of alloc_balloon_mapping,
a macro returning ERR_PTR(-EOPNOTSUPP).

This can then become

+	const struct {} (label) = {}

?

> +
> +#endif /* CONFIG_BALLOON_COMPACTION */
> +
> +/* return code to identify when a ballooned page has been migrated */
> +#define BALLOON_MIGRATION_RETURN	0xba1100
> +
> +extern struct address_space *alloc_balloon_mapping(void *balloon_device,
> +				const struct address_space_operations *a_ops);
> +
> +static inline void assign_balloon_mapping(struct page *page,
> +					  struct address_space *mapping)
> +{
> +	rcu_assign_pointer(page->mapping, mapping);
> +}
> +
> +static inline void clear_balloon_mapping(struct page *page)
> +{
> +	rcu_assign_pointer(page->mapping, NULL);
> +}
> +
> +#endif /* __KERNEL__ */
> +#endif /* _LINUX_BALLOON_COMPACTION_H */
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index e42c762..6df0664 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -24,6 +24,7 @@ enum mapping_flags {
>  	AS_ENOSPC	= __GFP_BITS_SHIFT + 1,	/* ENOSPC on async write */
>  	AS_MM_ALL_LOCKS	= __GFP_BITS_SHIFT + 2,	/* under mm_take_all_locks() */
>  	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
> +	AS_BALLOON_MAP  = __GFP_BITS_SHIFT + 4, /* balloon page special map */
>  };
>  
>  static inline void mapping_set_error(struct address_space *mapping, int error)
> @@ -53,6 +54,23 @@ static inline int mapping_unevictable(struct address_space *mapping)
>  	return !!mapping;
>  }
>  
> +static inline void mapping_set_balloon(struct address_space *mapping)
> +{
> +	set_bit(AS_BALLOON_MAP, &mapping->flags);
> +}
> +
> +static inline void mapping_clear_balloon(struct address_space *mapping)
> +{
> +	clear_bit(AS_BALLOON_MAP, &mapping->flags);
> +}
> +
> +static inline int mapping_balloon(struct address_space *mapping)
> +{
> +	if (mapping)
> +		return test_bit(AS_BALLOON_MAP, &mapping->flags);
> +	return !!mapping;
> +}
> +
>  static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
>  {
>  	return (__force gfp_t)mapping->flags & __GFP_BITS_MASK;
> diff --git a/mm/Kconfig b/mm/Kconfig
> index d5c8019..0bd783b 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -188,6 +188,21 @@ config SPLIT_PTLOCK_CPUS
>  	default "4"
>  
>  #
> +# support for memory balloon compaction
> +config BALLOON_COMPACTION
> +	bool "Allow for balloon memory compaction/migration"
> +	select COMPACTION
> +	depends on VIRTIO_BALLOON
> +	help
> +	  Memory fragmentation introduced by ballooning might reduce
> +	  significantly the number of 2MB contiguous memory blocks that can be
> +	  used within a guest, thus imposing performance penalties associated
> +	  with the reduced number of transparent huge pages that could be used
> +	  by the guest workload. Allowing the compaction & migration for memory
> +	  pages enlisted as being part of memory balloon devices avoids the
> +	  scenario aforementioned and helps improving memory defragmentation.
> +
> +#
>  # support for memory compaction
>  config COMPACTION
>  	bool "Allow for memory compaction"
> diff --git a/mm/Makefile b/mm/Makefile
> index 92753e2..78d8caa 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -16,7 +16,7 @@ obj-y			:= filemap.o mempool.o oom_kill.o fadvise.o \
>  			   readahead.o swap.o truncate.o vmscan.o shmem.o \
>  			   prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
>  			   mm_init.o mmu_context.o percpu.o slab_common.o \
> -			   compaction.o $(mmu-y)
> +			   compaction.o balloon_compaction.o $(mmu-y)
>  
>  obj-y += init-mm.o
>  
> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> new file mode 100644
> index 0000000..86a3692
> --- /dev/null
> +++ b/mm/balloon_compaction.c
> @@ -0,0 +1,172 @@
> +/*
> + * mm/balloon_compaction.c
> + *
> + * Common interface for making balloon pages movable to compaction.
> + *
> + * Copyright (C) 2012, Red Hat, Inc.  Rafael Aquini <aquini@redhat.com>
> + */
> +#include <linux/mm.h>
> +#include <linux/slab.h>
> +#include <linux/export.h>
> +#include <linux/balloon_compaction.h>
> +
> +/*
> + * alloc_balloon_mapping - allocates a special ->mapping for ballooned pages.
> + * @balloon_device: pointer address that references the balloon device which
> + *                 owns pages bearing this ->mapping.
> + * @a_ops: balloon_mapping address_space_operations descriptor.
> + *
> + * Users must call it to properly allocate and initialize an instance of
> + * struct address_space which will be used as the special page->mapping for
> + * balloon devices enlisted page instances.
> + */
> +struct address_space *alloc_balloon_mapping(void *balloon_device,
> +				const struct address_space_operations *a_ops)
> +{
> +	struct address_space *mapping;
> +
> +	mapping = kmalloc(sizeof(*mapping), GFP_KERNEL);
> +	if (!mapping)
> +		return NULL;
> +
> +	/*
> +	 * Give a clean 'zeroed' status to all elements of this special
> +	 * balloon page->mapping struct address_space instance.
> +	 */
> +	address_space_init_once(mapping);
> +
> +	/*
> +	 * Set mapping->flags appropriately, to allow balloon ->mapping
> +	 * identification, as well as give a proper hint to the balloon
> +	 * driver on what GFP allocation mask shall be used.
> +	 */
> +	mapping_set_balloon(mapping);
> +	mapping_set_gfp_mask(mapping, balloon_mapping_gfp_mask());
> +
> +	/* balloon's page->mapping->a_ops callback descriptor */
> +	mapping->a_ops = a_ops;
> +
> +	/*
> +	 * balloon special page->mapping overloads ->assoc_mapping
> +	 * to held a reference back to the balloon device wich 'owns'
> +	 * a given page. This is the way we can cope with multiple
> +	 * balloon devices without losing reference of several
> +	 * ballooned pagesets.
> +	 */
> +	mapping->assoc_mapping = balloon_device;
> +
> +	return mapping;
> +}
> +EXPORT_SYMBOL_GPL(alloc_balloon_mapping);
> +
> +#ifdef CONFIG_BALLOON_COMPACTION
> +
> +static inline bool __is_movable_balloon_page(struct page *page)
> +{
> +	struct address_space *mapping;
> +
> +	mapping = rcu_dereference_protected(page->mapping, PageLocked(page));
> +	if (mapping)
> +		return mapping_balloon(mapping);
> +
> +	return false;
> +}
> +
> +static inline void __isolate_balloon_page(struct page *page)
> +{
> +	page->mapping->a_ops->invalidatepage(page, 0);
> +}
> +
> +static inline void __putback_balloon_page(struct page *page)
> +{
> +	page->mapping->a_ops->freepage(page);
> +}
> +
> +static inline int __migrate_balloon_page(struct address_space *mapping,
> +		struct page *newpage, struct page *page, enum migrate_mode mode)
> +{
> +	return page->mapping->a_ops->migratepage(mapping, newpage, page, mode);
> +}
> +
> +/* __isolate_lru_page() counterpart for a ballooned page */
> +bool isolate_balloon_page(struct page *page)
> +{
> +	if (likely(get_page_unless_zero(page))) {
> +		/*
> +		 * As balloon pages are not isolated from LRU lists, concurrent
> +		 * compaction threads can race against page migration functions
> +		 * move_to_new_page() & __unmap_and_move().
> +		 * In order to avoid having an already isolated balloon page
> +		 * being (wrongly) re-isolated while it is under migration,
> +		 * lets be sure we have the page lock before proceeding with
> +		 * the balloon page isolation steps.
> +		 */
> +		if (likely(trylock_page(page))) {
> +			/*
> +			 * A ballooned page, by default, has just one refcount.
> +			 * Prevent concurrent compaction threads from isolating
> +			 * an already isolated balloon page by refcount check.
> +			 */
> +			if (__is_movable_balloon_page(page) &&
> +			    (page_count(page) == 2)) {
> +				__isolate_balloon_page(page);
> +				unlock_page(page);
> +				return true;
> +			} else if (unlikely(!__is_movable_balloon_page(page))) {
> +				dump_page(page);
> +				__WARN();
> +			}
> +			unlock_page(page);
> +		}
> +		/*
> +		 * The page is either under migration, or it's isolated already
> +		 * Drop the refcount taken for it.
> +		 */
> +		put_page(page);
> +	}
> +	return false;
> +}
> +
> +/* putback_lru_page() counterpart for a ballooned page */
> +void putback_balloon_page(struct page *page)
> +{
> +	/*
> +	 * 'lock_page()' stabilizes the page and prevents races against
> +	 * concurrent isolation threads attempting to re-isolate it.
> +	 */
> +	lock_page(page);
> +
> +	if (__is_movable_balloon_page(page)) {
> +		__putback_balloon_page(page);
> +		put_page(page);
> +	} else {
> +		dump_page(page);
> +		__WARN();
> +	}
> +	unlock_page(page);
> +}
> +
> +/* move_to_new_page() counterpart for a ballooned page */
> +int migrate_balloon_page(struct page *newpage,
> +			 struct page *page, enum migrate_mode mode)
> +{
> +	struct address_space *mapping;
> +	int rc = -EAGAIN;
> +
> +	BUG_ON(!trylock_page(newpage));
> +
> +	if (WARN_ON(!__is_movable_balloon_page(page))) {
> +		dump_page(page);
> +		unlock_page(newpage);
> +		return rc;
> +	}
> +
> +	mapping = rcu_dereference_protected(page->mapping, PageLocked(page));
> +	if (mapping)
> +		rc = __migrate_balloon_page(mapping, newpage, page, mode);
> +
> +	unlock_page(newpage);
> +	return rc;
> +}
> +
> +#endif /* CONFIG_BALLOON_COMPACTION */
> -- 
> 1.7.11.4


* Re: [PATCH v9 0/5] make balloon pages movable by compaction
  2012-08-25  5:24 [PATCH v9 0/5] make balloon pages movable by compaction Rafael Aquini
                   ` (4 preceding siblings ...)
  2012-08-25  5:25 ` [PATCH v9 5/5] mm: add vm event counters for balloon pages compaction Rafael Aquini
@ 2012-08-26  7:58 ` Michael S. Tsirkin
  2012-08-26 14:40   ` Rik van Riel
  5 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2012-08-26  7:58 UTC (permalink / raw)
  To: Rafael Aquini
  Cc: linux-mm, linux-kernel, virtualization, Rusty Russell,
	Rik van Riel, Mel Gorman, Andi Kleen, Andrew Morton,
	Konrad Rzeszutek Wilk, Minchan Kim, Peter Zijlstra,
	Paul E. McKenney

On Sat, Aug 25, 2012 at 02:24:55AM -0300, Rafael Aquini wrote:
> Memory fragmentation introduced by ballooning might reduce significantly
> the number of 2MB contiguous memory blocks that can be used within a guest,
> thus imposing performance penalties associated with the reduced number of
> transparent huge pages that could be used by the guest workload.
> 
> This patch-set follows the main idea discussed at 2012 LSFMMS session:
> "Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/
> to introduce the required changes to the virtio_balloon driver, as well as
> the changes to the core compaction & migration bits, in order to make those
> subsystems aware of ballooned pages and allow memory balloon pages become
> movable within a guest, thus avoiding the aforementioned fragmentation issue

Meta-question: are there any numbers showing gain from this patchset?

The reason I ask, on migration we notify host about each page
individually.  If this is rare maybe the patchset does not help much.
If this is common we would be better off building up a list of multiple
pages and passing them in one go.
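The batching alternative raised here, as a rough pseudocode sketch against
the existing driver (set_page_pfns() and tell_host() are the driver's
existing helpers; the page list is illustrative):

```c
/*
 * Sketch only: accumulate pfns for a batch of migrated pages, then issue
 * a single tell_host() instead of one host notification per page.
 */
vb->num_pfns = 0;
list_for_each_entry(page, &pages_to_move, lru) {
	set_page_pfns(vb->pfns + vb->num_pfns, page);
	vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE;
}
tell_host(vb, vb->inflate_vq);	/* one virtqueue kick for the whole batch */
```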

> Rafael Aquini (5):
>   mm: introduce a common interface for balloon pages mobility
>   mm: introduce compaction and migration for ballooned pages
>   virtio_balloon: introduce migration primitives to balloon pages
>   mm: introduce putback_movable_pages()
>   mm: add vm event counters for balloon pages compaction
> 
>  drivers/virtio/virtio_balloon.c    | 287 ++++++++++++++++++++++++++++++++++---
>  include/linux/balloon_compaction.h | 137 ++++++++++++++++++
>  include/linux/migrate.h            |   2 +
>  include/linux/pagemap.h            |  18 +++
>  include/linux/vm_event_item.h      |   8 +-
>  mm/Kconfig                         |  15 ++
>  mm/Makefile                        |   2 +-
>  mm/balloon_compaction.c            | 174 ++++++++++++++++++++++
>  mm/compaction.c                    |  51 ++++---
>  mm/migrate.c                       |  57 +++++++-
>  mm/page_alloc.c                    |   2 +-
>  mm/vmstat.c                        |  10 +-
>  12 files changed, 715 insertions(+), 48 deletions(-)
>  create mode 100644 include/linux/balloon_compaction.h
>  create mode 100644 mm/balloon_compaction.c
> 
> 
> Change log:
> v9:
>  * Adjust rcu_dereference usage to leverage page lock protection  (Paul, Peter);
>  * Enhance doc on compaction interface introduced to balloon driver   (Michael);
>  * Fix issue with isolated pages breaking leak_balloon() logics       (Michael);
> v8:
>  * introduce a common MM interface for balloon driver page compaction (Michael);
>  * remove the global state preventing multiple balloon device support (Michael);
>  * introduce RCU protection/syncrhonization to balloon page->mapping  (Michael);
> v7:
>  * fix a potential page leak case at 'putback_balloon_page'               (Mel);
>  * adjust vm-events-counter patch and remove its drop-on-merge message    (Rik);
>  * add 'putback_movable_pages' to avoid hacks on 'putback_lru_pages'  (Minchan);
> v6:
>  * rename 'is_balloon_page()' to 'movable_balloon_page()' 		  (Rik);
> v5:
>  * address Andrew Morton's review comments on the patch series;
>  * address a couple extra nitpick suggestions on PATCH 01 	      (Minchan);
> v4: 
>  * address Rusty Russel's review comments on PATCH 02;
>  * re-base virtio_balloon patch on 9c378abc5c0c6fc8e3acf5968924d274503819b3;
> V3: 
>  * address reviewers nitpick suggestions on PATCH 01		 (Mel, Minchan);
> V2: 
>  * address Mel Gorman's review comments on PATCH 01;
> 
> 
> Preliminary test results:
> (2 VCPU 2048mB RAM KVM guest running 3.6.0_rc3+ -- after a reboot)
> 
> * 64mB balloon:
> [root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
> compact_blocks_moved 0
> compact_pages_moved 0
> compact_pagemigrate_failed 0
> compact_stall 0
> compact_fail 0
> compact_success 0
> compact_balloon_isolated 0
> compact_balloon_migrated 0
> compact_balloon_released 0
> compact_balloon_returned 0
> [root@localhost ~]# 
> [root@localhost ~]# for i in $(seq 1 6); do echo 1 > /proc/sys/vm/compact_memory & done &>/dev/null 
> [1]   Done                    echo 1 > /proc/sys/vm/compact_memory
> [2]   Done                    echo 1 > /proc/sys/vm/compact_memory
> [3]   Done                    echo 1 > /proc/sys/vm/compact_memory
> [4]   Done                    echo 1 > /proc/sys/vm/compact_memory
> [5]-  Done                    echo 1 > /proc/sys/vm/compact_memory
> [6]+  Done                    echo 1 > /proc/sys/vm/compact_memory
> [root@localhost ~]# 
> [root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
> compact_blocks_moved 3108
> compact_pages_moved 43169
> compact_pagemigrate_failed 95
> compact_stall 0
> compact_fail 0
> compact_success 0
> compact_balloon_isolated 16384
> compact_balloon_migrated 16384
> compact_balloon_released 16384
> compact_balloon_returned 0
> 
> 
> * 128 mB balloon:
> [root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
> compact_blocks_moved 0
> compact_pages_moved 0
> compact_pagemigrate_failed 0
> compact_stall 0
> compact_fail 0
> compact_success 0
> compact_balloon_isolated 0
> compact_balloon_migrated 0
> compact_balloon_released 0
> compact_balloon_returned 0
> [root@localhost ~]# 
> [root@localhost ~]# for i in $(seq 1 6); do echo 1 > /proc/sys/vm/compact_memory & done &>/dev/null  
> [1]   Done                    echo 1 > /proc/sys/vm/compact_memory
> [2]   Done                    echo 1 > /proc/sys/vm/compact_memory
> [3]   Done                    echo 1 > /proc/sys/vm/compact_memory
> [4]   Done                    echo 1 > /proc/sys/vm/compact_memory
> [5]-  Done                    echo 1 > /proc/sys/vm/compact_memory
> [6]+  Done                    echo 1 > /proc/sys/vm/compact_memory
> [root@localhost ~]# 
> [root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
> compact_blocks_moved 3062
> compact_pages_moved 49774
> compact_pagemigrate_failed 129
> compact_stall 0
> compact_fail 0
> compact_success 0
> compact_balloon_isolated 26076
> compact_balloon_migrated 25957
> compact_balloon_released 25957
> compact_balloon_returned 119
> 
> -- 
> 1.7.11.4


* Re: [PATCH v9 0/5] make balloon pages movable by compaction
  2012-08-26  7:58 ` [PATCH v9 0/5] make balloon pages movable by compaction Michael S. Tsirkin
@ 2012-08-26 14:40   ` Rik van Riel
  2012-08-26 15:44     ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Rik van Riel @ 2012-08-26 14:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Rafael Aquini, linux-mm, linux-kernel, virtualization,
	Rusty Russell, Mel Gorman, Andi Kleen, Andrew Morton,
	Konrad Rzeszutek Wilk, Minchan Kim, Peter Zijlstra,
	Paul E. McKenney

On 08/26/2012 03:58 AM, Michael S. Tsirkin wrote:
> On Sat, Aug 25, 2012 at 02:24:55AM -0300, Rafael Aquini wrote:
>> Memory fragmentation introduced by ballooning might reduce significantly
>> the number of 2MB contiguous memory blocks that can be used within a guest,
>> thus imposing performance penalties associated with the reduced number of
>> transparent huge pages that could be used by the guest workload.
>>
>> This patch-set follows the main idea discussed at 2012 LSFMMS session:
>> "Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/
>> to introduce the required changes to the virtio_balloon driver, as well as
>> the changes to the core compaction & migration bits, in order to make those
>> subsystems aware of ballooned pages and allow memory balloon pages become
>> movable within a guest, thus avoiding the aforementioned fragmentation issue
>
> Meta-question: are there any numbers showing gain from this patchset?
>
> The reason I ask, on migration we notify host about each page
> individually.  If this is rare maybe the patchset does not help much.
> If this is common we would be better off building up a list of multiple
> pages and passing them in one go.

The gain is in getting a better THP allocation rate inside the
guest, allowing applications to run faster.

The rarer it is for this code to run, the better - it means we
are getting the benefits without the overhead :)

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v9 0/5] make balloon pages movable by compaction
  2012-08-26 14:40   ` Rik van Riel
@ 2012-08-26 15:44     ` Michael S. Tsirkin
  2012-08-27 20:22       ` Rafael Aquini
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2012-08-26 15:44 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Rafael Aquini, linux-mm, linux-kernel, virtualization,
	Rusty Russell, Mel Gorman, Andi Kleen, Andrew Morton,
	Konrad Rzeszutek Wilk, Minchan Kim, Peter Zijlstra,
	Paul E. McKenney

On Sun, Aug 26, 2012 at 10:40:37AM -0400, Rik van Riel wrote:
> On 08/26/2012 03:58 AM, Michael S. Tsirkin wrote:
> >On Sat, Aug 25, 2012 at 02:24:55AM -0300, Rafael Aquini wrote:
> >>Memory fragmentation introduced by ballooning might reduce significantly
> >>the number of 2MB contiguous memory blocks that can be used within a guest,
> >>thus imposing performance penalties associated with the reduced number of
> >>transparent huge pages that could be used by the guest workload.
> >>
> >>This patch-set follows the main idea discussed at 2012 LSFMMS session:
> >>"Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/
> >>to introduce the required changes to the virtio_balloon driver, as well as
> >>the changes to the core compaction & migration bits, in order to make those
> >>subsystems aware of ballooned pages and allow memory balloon pages become
> >>movable within a guest, thus avoiding the aforementioned fragmentation issue
> >
> >Meta-question: are there any numbers showing gain from this patchset?
> >
> >The reason I ask, on migration we notify host about each page
> >individually.  If this is rare maybe the patchset does not help much.
> >If this is common we would be better off building up a list of multiple
> >pages and passing them in one go.
> 
> The gain is in getting a better THP allocation rate inside the
> guest, allowing applications to run faster.
> 
> The rarer it is for this code to run, the better - it means we
> are getting the benefits without the overhead :)

I am simply asking how this patchset was tested.
It would be nice to have this info in the commit log.
Since this is an optimization patch, it is strange
to see one with no numbers at all.
For example, you probably ran some workload and
played with the balloon, and then saw fewer huge pages
without the patch and more with?
Please put this info in the cover letter.

> -- 
> All rights reversed

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v9 3/5] virtio_balloon: introduce migration primitives to balloon pages
  2012-08-26  7:42   ` Michael S. Tsirkin
@ 2012-08-27 19:47     ` Rafael Aquini
  2012-08-28 15:54       ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Rafael Aquini @ 2012-08-27 19:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-mm, linux-kernel, virtualization, Rusty Russell,
	Rik van Riel, Mel Gorman, Andi Kleen, Andrew Morton,
	Konrad Rzeszutek Wilk, Minchan Kim, Peter Zijlstra,
	Paul E. McKenney

On Sun, Aug 26, 2012 at 10:42:44AM +0300, Michael S. Tsirkin wrote:
> 
> Reading two atomics and doing math? Result can even be negative.
> I did not look at use closely but it looks suspicious.
Doc on atomic_read says:
"
The read is atomic in that the return value is guaranteed to be one of the
values initialized or modified with the interface operations if a proper
implicit or explicit memory barrier is used after possible runtime
initialization by any other thread and the value is modified only with the
interface operations.
"

There's no runtime init by any thread other than the balloon's own at device register,
and the operations (inc, dec) are made only through the proper interface operations
while protected by the spinlock pages_lock. It does not look suspicious, IMHO.
I'm failing to see how it could become negative in that case, since you cannot
isolate more pages than were previously inflated onto the balloon's list.


> It's already the case everywhere except __wait_on_isolated_pages,
> so just fix that, and then we can keep using int instead of atomics.
> 
Sorry, I didn't quite get you here. Fix what?

 
> That's 1K on stack - and can become more if we increase
> VIRTIO_BALLOON_ARRAY_PFNS_MAX.  Probably too much - this is the reason
> we use vb->pfns.
>
If we want to use vb->pfns we'll have to make leak_balloon() mutually exclusive with
page migration (as it was before), but that will inevitably bring us back to
the discussion on breaking the loop when isolated pages make leak_balloon() find
fewer pages than it wants to release at each leak round.

 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v9 0/5] make balloon pages movable by compaction
  2012-08-26 15:44     ` Michael S. Tsirkin
@ 2012-08-27 20:22       ` Rafael Aquini
  0 siblings, 0 replies; 19+ messages in thread
From: Rafael Aquini @ 2012-08-27 20:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Rik van Riel, linux-mm, linux-kernel, virtualization,
	Rusty Russell, Mel Gorman, Andi Kleen, Andrew Morton,
	Konrad Rzeszutek Wilk, Minchan Kim, Peter Zijlstra,
	Paul E. McKenney

On Sun, Aug 26, 2012 at 06:44:23PM +0300, Michael S. Tsirkin wrote:
> 
> I am simply asking how this patchset was tested.
> It would be nice to have this info in the commit log.
> Since this is an optimization patch, it is strange
> to see one with no numbers at all.
> For example, you probably ran some workload and
> played with the balloon, and then saw fewer huge pages
> without the patch and more with?
> Please put this info in the cover letter.

Will do, for sure, as soon as we get closer to an agreement on how the code
has to behave and look. I'll use Mel's mmtests benchmark suite for that.

Cheers!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v9 1/5] mm: introduce a common interface for balloon pages mobility
  2012-08-26  7:55   ` Michael S. Tsirkin
@ 2012-08-27 20:28     ` Rafael Aquini
  2012-08-28 15:23       ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Rafael Aquini @ 2012-08-27 20:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-mm, linux-kernel, virtualization, Rusty Russell,
	Rik van Riel, Mel Gorman, Andi Kleen, Andrew Morton,
	Konrad Rzeszutek Wilk, Minchan Kim, Peter Zijlstra,
	Paul E. McKenney

On Sun, Aug 26, 2012 at 10:55:58AM +0300, Michael S. Tsirkin wrote:
> On Sat, Aug 25, 2012 at 02:24:56AM -0300, Rafael Aquini wrote:
> > Memory fragmentation introduced by ballooning might reduce significantly
> > the number of 2MB contiguous memory blocks that can be used within a guest,
> > thus imposing performance penalties associated with the reduced number of
> > transparent huge pages that could be used by the guest workload.
> > 
> > This patch introduces a common interface to help a balloon driver on
> > making its page set movable to compaction, and thus allowing the system
> > to better leverage the compation efforts on memory defragmentation.
> > 
> > Signed-off-by: Rafael Aquini <aquini@redhat.com>
> 
> Tons of rcu uses but not sync in sight. This looks suspicious.

There's no critical section marked with rcu_read_lock/rcu_read_unlock; that's
why there's no call for sync anywhere. As we are behaving mostly as updaters,
the whole RCU usage is awkward and is there basically to enforce the proper
ordering. To avoid abusing the RCU API with this awkward approach I'll drop it
for the next series submission (it will use barriers instead).


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v9 1/5] mm: introduce a common interface for balloon pages mobility
  2012-08-27 20:28     ` Rafael Aquini
@ 2012-08-28 15:23       ` Michael S. Tsirkin
  0 siblings, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2012-08-28 15:23 UTC (permalink / raw)
  To: Rafael Aquini
  Cc: linux-mm, linux-kernel, virtualization, Rusty Russell,
	Rik van Riel, Mel Gorman, Andi Kleen, Andrew Morton,
	Konrad Rzeszutek Wilk, Minchan Kim, Peter Zijlstra,
	Paul E. McKenney

On Mon, Aug 27, 2012 at 05:28:35PM -0300, Rafael Aquini wrote:
> On Sun, Aug 26, 2012 at 10:55:58AM +0300, Michael S. Tsirkin wrote:
> > On Sat, Aug 25, 2012 at 02:24:56AM -0300, Rafael Aquini wrote:
> > > Memory fragmentation introduced by ballooning might reduce significantly
> > > the number of 2MB contiguous memory blocks that can be used within a guest,
> > > thus imposing performance penalties associated with the reduced number of
> > > transparent huge pages that could be used by the guest workload.
> > > 
> > > This patch introduces a common interface to help a balloon driver on
> > > making its page set movable to compaction, and thus allowing the system
> > > to better leverage the compation efforts on memory defragmentation.
> > > 
> > > Signed-off-by: Rafael Aquini <aquini@redhat.com>
> > 
> > Tons of rcu uses but not sync in sight. This looks suspicious.
> 
> There's no critical section marked with rcu_read_lock/rcu_read_unlock; that's
> why there's no call for sync anywhere. As we are behaving mostly as updaters,
> the whole RCU usage is awkward and is there basically to enforce the proper
> ordering. To avoid abusing the RCU API with this awkward approach I'll drop it
> for the next series submission (it will use barriers instead).

If everything is under page lock, barriers are likely not required.
If not, they might not be sufficient.

-- 
MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v9 3/5] virtio_balloon: introduce migration primitives to balloon pages
  2012-08-27 19:47     ` Rafael Aquini
@ 2012-08-28 15:54       ` Michael S. Tsirkin
  2012-08-28 17:37         ` Rafael Aquini
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2012-08-28 15:54 UTC (permalink / raw)
  To: Rafael Aquini
  Cc: linux-mm, linux-kernel, virtualization, Rusty Russell,
	Rik van Riel, Mel Gorman, Andi Kleen, Andrew Morton,
	Konrad Rzeszutek Wilk, Minchan Kim, Peter Zijlstra,
	Paul E. McKenney

On Mon, Aug 27, 2012 at 04:47:13PM -0300, Rafael Aquini wrote:
> On Sun, Aug 26, 2012 at 10:42:44AM +0300, Michael S. Tsirkin wrote:
> > 
> > Reading two atomics and doing math? Result can even be negative.
> > I did not look at use closely but it looks suspicious.
> Doc on atomic_read says:
> "
> The read is atomic in that the return value is guaranteed to be one of the
> values initialized or modified with the interface operations if a proper
> implicit or explicit memory barrier is used after possible runtime
> initialization by any other thread and the value is modified only with the
> interface operations.
> "
> 
> There's no runtime init by any thread other than the balloon's own at device register,
> and the operations (inc, dec) are made only through the proper interface operations
> while protected by the spinlock pages_lock. It does not look suspicious, IMHO.

Any use of multiple atomics is suspicious.
Please just avoid it if you can. What's wrong with locking?

> I'm failing to see how it could become negative in that case, since you cannot
> isolate more pages than were previously inflated onto the balloon's list.

There is no order guarantee. So in
A - B you can read B long after both A and B has been incremented.
Maybe it is safe in this case but it needs careful documentation
to explain how ordering works. Much easier to keep it all simple.

> 
> > It's already the case everywhere except __wait_on_isolated_pages,
> > so just fix that, and then we can keep using int instead of atomics.
> > 
> > Sorry, I didn't quite get you here. Fix what?

It's in the text you removed above. Access values under lock.

>  
> > That's 1K on stack - and can become more if we increase
> > VIRTIO_BALLOON_ARRAY_PFNS_MAX.  Probably too much - this is the reason
> > we use vb->pfns.
> >
> If we want to use vb->pfns we'll have to make leak_balloon() mutually exclusive with
> page migration (as it was before), but that will inevitably bring us back to
> the discussion on breaking the loop when isolated pages make leak_balloon() find
> fewer pages than it wants to release at each leak round.
> 

I don't think this is an issue. The issue was busy waiting in that case.

-- 
MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v9 3/5] virtio_balloon: introduce migration primitives to balloon pages
  2012-08-28 15:54       ` Michael S. Tsirkin
@ 2012-08-28 17:37         ` Rafael Aquini
  2012-08-28 17:57           ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Rafael Aquini @ 2012-08-28 17:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-mm, linux-kernel, virtualization, Rusty Russell,
	Rik van Riel, Mel Gorman, Andi Kleen, Andrew Morton,
	Konrad Rzeszutek Wilk, Minchan Kim, Peter Zijlstra,
	Paul E. McKenney

On Tue, Aug 28, 2012 at 06:54:10PM +0300, Michael S. Tsirkin wrote:
> On Mon, Aug 27, 2012 at 04:47:13PM -0300, Rafael Aquini wrote:
> > On Sun, Aug 26, 2012 at 10:42:44AM +0300, Michael S. Tsirkin wrote:
> > > 
> > > Reading two atomics and doing math? Result can even be negative.
> > > I did not look at use closely but it looks suspicious.
> > Doc on atomic_read says:
> > "
> > The read is atomic in that the return value is guaranteed to be one of the
> > values initialized or modified with the interface operations if a proper
> > implicit or explicit memory barrier is used after possible runtime
> > initialization by any other thread and the value is modified only with the
> > interface operations.
> > "
> > 
> > There's no runtime init by any thread other than the balloon's own at device register,
> > and the operations (inc, dec) are made only through the proper interface operations
> > while protected by the spinlock pages_lock. It does not look suspicious, IMHO.
> 
> Any use of multiple atomics is suspicious.
> Please just avoid it if you can. What's wrong with locking?
> 
> > I'm failing to see how it could become negative in that case, since you cannot
> > isolate more pages than were previously inflated onto the balloon's list.
> 
> There is no order guarantee. So in
> A - B you can read B long after both A and B has been incremented.
> Maybe it is safe in this case but it needs careful documentation
> to explain how ordering works. Much easier to keep it all simple.
> 
> > 
> > > It's already the case everywhere except __wait_on_isolated_pages,
> > > so just fix that, and then we can keep using int instead of atomics.
> > > 
> > Sorry, I didn't quite get you here. Fix what?
> 
> It's in the text you removed above. Access values under lock.
>

So, you prefer this way:

/*
 * __wait_on_isolated_pages - check if leak_balloon() must wait on isolated
 *                            pages before proceeding with the page release.
 * @vb         : pointer to the struct virtio_balloon describing this device.
 * @leak_target: how many pages we are attempting to release this round.
 */
static inline void __wait_on_isolated_pages(struct virtio_balloon *vb,
                                            size_t leak_target)
{
        unsigned int num_pages, isolated_pages;
        spin_lock(&vb->pages_lock);
        num_pages = vb->num_pages;
        isolated_pages = vb->num_isolated_pages;
        spin_unlock(&vb->pages_lock);
        /*
         * If isolated pages make our leak target bigger than the total
         * number of pages we can release this round, wait for migration
         * to return enough pages back to the balloon's list.
         */
        wait_event(vb->config_change,
                   (!isolated_pages ||
                    leak_target <= (num_pages - isolated_pages)));
}

?

> >  
> > > That's 1K on stack - and can become more if we increase
> > > VIRTIO_BALLOON_ARRAY_PFNS_MAX.  Probably too much - this is the reason
> > > we use vb->pfns.
> > >
> > If we want to use vb->pfns we'll have to make leak_balloon() mutually exclusive with
> > page migration (as it was before), but that will inevitably bring us back to
> > the discussion on breaking the loop when isolated pages make leak_balloon() find
> > fewer pages than it wants to release at each leak round.
> > 
> 
> I don't think this is an issue. The issue was busy waiting in that case.
>
But, in fact, it is.
Since we couldn't drop the mutex that prevents migration from happening -- otherwise
the migration threads would mess with our vb->pfns array -- there would be no point
in waiting for isolated pages to be reinserted onto the balloon's list, because the
migration threads that would accomplish that task are also waiting on us to drop
the mutex.

You may argue that we could flag virtballoon_migratepage() to give up and return
before even trying to acquire the mutex if a leak is ongoing -- deferring the work
to virtballoon_putbackpage(). However, I'm inclined to think that in this case
the CPU time we spent isolating pages for compaction would simply be wasted and,
perhaps, no effective compaction would even be achieved.
That makes me think it would have been better to stick with the old logic of
breaking the loop, since leak_balloon(), originally, also remains busy waiting
while pursuing its target anyway.

That's the trade-off here, IMO. If one really wants to wait on potentially isolated
pages getting back to the list before proceeding, we'll have to burn a little
more stack space on local variables, unfortunately.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v9 3/5] virtio_balloon: introduce migration primitives to balloon pages
  2012-08-28 17:37         ` Rafael Aquini
@ 2012-08-28 17:57           ` Michael S. Tsirkin
  2012-08-28 18:05             ` Rafael Aquini
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2012-08-28 17:57 UTC (permalink / raw)
  To: Rafael Aquini
  Cc: linux-mm, linux-kernel, virtualization, Rusty Russell,
	Rik van Riel, Mel Gorman, Andi Kleen, Andrew Morton,
	Konrad Rzeszutek Wilk, Minchan Kim, Peter Zijlstra,
	Paul E. McKenney

On Tue, Aug 28, 2012 at 02:37:13PM -0300, Rafael Aquini wrote:
> On Tue, Aug 28, 2012 at 06:54:10PM +0300, Michael S. Tsirkin wrote:
> > On Mon, Aug 27, 2012 at 04:47:13PM -0300, Rafael Aquini wrote:
> > > On Sun, Aug 26, 2012 at 10:42:44AM +0300, Michael S. Tsirkin wrote:
> > > > 
> > > > Reading two atomics and doing math? Result can even be negative.
> > > > I did not look at use closely but it looks suspicious.
> > > Doc on atomic_read says:
> > > "
> > > The read is atomic in that the return value is guaranteed to be one of the
> > > values initialized or modified with the interface operations if a proper
> > > implicit or explicit memory barrier is used after possible runtime
> > > initialization by any other thread and the value is modified only with the
> > > interface operations.
> > > "
> > > 
> > > There's no runtime init by any thread other than the balloon's own at device register,
> > > and the operations (inc, dec) are made only through the proper interface operations
> > > while protected by the spinlock pages_lock. It does not look suspicious, IMHO.
> > 
> > Any use of multiple atomics is suspicious.
> > Please just avoid it if you can. What's wrong with locking?
> > 
> > > I'm failing to see how it could become negative in that case, since you cannot
> > > isolate more pages than were previously inflated onto the balloon's list.
> > 
> > There is no order guarantee. So in
> > A - B you can read B long after both A and B has been incremented.
> > Maybe it is safe in this case but it needs careful documentation
> > to explain how ordering works. Much easier to keep it all simple.
> > 
> > > 
> > > > It's already the case everywhere except __wait_on_isolated_pages,
> > > > so just fix that, and then we can keep using int instead of atomics.
> > > > 
> > > Sorry, I didn't quite get you here. Fix what?
> > 
> > It's in the text you removed above. Access values under lock.
> >
> 
> So, you prefer this way:
> 
> /*
>  * __wait_on_isolated_pages - check if leak_balloon() must wait on isolated
>  *                            pages before proceeding with the page release.
>  * @vb         : pointer to the struct virtio_balloon describing this device.
>  * @leak_target: how many pages we are attempting to release this round.
>  */
> static inline void __wait_on_isolated_pages(struct virtio_balloon *vb,
>                                             size_t leak_target)
> {
>         unsigned int num_pages, isolated_pages;
>         spin_lock(&vb->pages_lock);
>         num_pages = vb->num_pages;
>         isolated_pages = vb->num_isolated_pages;
>         spin_unlock(&vb->pages_lock);
>         /*
>          * If isolated pages make our leak target bigger than the total
>          * number of pages we can release this round, wait for migration
>          * to return enough pages back to the balloon's list.
>          */
>         wait_event(vb->config_change,
>                    (!isolated_pages ||
>                     leak_target <= (num_pages - isolated_pages)));

This logic looks strange too - it does not 100% match the comment.

> }
> 
> ?

Except that it does not work. You need to do the lock/unlock
dance and retest within wait_event.


> > >  
> > > > That's 1K on stack - and can become more if we increase
> > > > VIRTIO_BALLOON_ARRAY_PFNS_MAX.  Probably too much - this is the reason
> > > > we use vb->pfns.
> > > >
> > > If we want to use vb->pfns we'll have to make leak_balloon() mutually exclusive with
> > > page migration (as it was before), but that will inevitably bring us back to
> > > the discussion on breaking the loop when isolated pages make leak_balloon() find
> > > fewer pages than it wants to release at each leak round.
> > > 
> > 
> > I don't think this is an issue. The issue was busy waiting in that case.
> >
> But, in fact, it is.
> Since we couldn't drop the mutex that prevents migration from happening -- otherwise
> the migration threads would mess with our vb->pfns array -- there would be no point
> in waiting for isolated pages to be reinserted onto the balloon's list, because the
> migration threads that would accomplish that task are also waiting on us to drop
> the mutex.
> 
> You may argue that we could flag virtballoon_migratepage() to give up and return
> before even trying to acquire the mutex if a leak is ongoing -- deferring the work
> to virtballoon_putbackpage(). However, I'm inclined to think that in this case
> the CPU time we spent isolating pages for compaction would simply be wasted and,
> perhaps, no effective compaction would even be achieved.
> That makes me think it would have been better to stick with the old logic of
> breaking the loop, since leak_balloon(), originally, also remains busy waiting
> while pursuing its target anyway.
> 
> That's the trade-off here, IMO. If one really wants to wait on potentially isolated
> pages getting back to the list before proceeding, we'll have to burn a little
> more stack space on local variables, unfortunately.


Sorry I do not understand what you are saying here. So find
a different locking strategy.

For example something like:

         wait_event(vb->config_change,
		({ 
		   lock
		   if (target <= (num_pages - isolated_pages))
			   leak balloon
		   cond = (target <= (num_pages - isolated_pages));
		   unlock;
		   cond;
		})
		)

seems to have no issues?

-- 
MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v9 3/5] virtio_balloon: introduce migration primitives to balloon pages
  2012-08-28 17:57           ` Michael S. Tsirkin
@ 2012-08-28 18:05             ` Rafael Aquini
  0 siblings, 0 replies; 19+ messages in thread
From: Rafael Aquini @ 2012-08-28 18:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-mm, linux-kernel, virtualization, Rusty Russell,
	Rik van Riel, Mel Gorman, Andi Kleen, Andrew Morton,
	Konrad Rzeszutek Wilk, Minchan Kim, Peter Zijlstra,
	Paul E. McKenney

On Tue, Aug 28, 2012 at 08:57:16PM +0300, Michael S. Tsirkin wrote:
> Sorry I do not understand what you are saying here. So find
> a different locking strategy.
> 
> For example something like:
> 
>          wait_event(vb->config_change,
> 		({ 
> 		   lock
> 		   if (target <= (num_pages - isolated_pages))
> 			   leak balloon
> 		   cond = (target <= (num_pages - isolated_pages));
> 		   unlock;
> 		   cond;
> 		})
> 		)
> 
> seems to have no issues?

Ok, I see what you mean. I'll change it.


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2012-08-28 18:06 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-25  5:24 [PATCH v9 0/5] make balloon pages movable by compaction Rafael Aquini
2012-08-25  5:24 ` [PATCH v9 1/5] mm: introduce a common interface for balloon pages mobility Rafael Aquini
2012-08-26  7:55   ` Michael S. Tsirkin
2012-08-27 20:28     ` Rafael Aquini
2012-08-28 15:23       ` Michael S. Tsirkin
2012-08-25  5:24 ` [PATCH v9 2/5] mm: introduce compaction and migration for ballooned pages Rafael Aquini
2012-08-25  5:24 ` [PATCH v9 3/5] virtio_balloon: introduce migration primitives to balloon pages Rafael Aquini
2012-08-26  7:42   ` Michael S. Tsirkin
2012-08-27 19:47     ` Rafael Aquini
2012-08-28 15:54       ` Michael S. Tsirkin
2012-08-28 17:37         ` Rafael Aquini
2012-08-28 17:57           ` Michael S. Tsirkin
2012-08-28 18:05             ` Rafael Aquini
2012-08-25  5:24 ` [PATCH v9 4/5] mm: introduce putback_movable_pages() Rafael Aquini
2012-08-25  5:25 ` [PATCH v9 5/5] mm: add vm event counters for balloon pages compaction Rafael Aquini
2012-08-26  7:58 ` [PATCH v9 0/5] make balloon pages movable by compaction Michael S. Tsirkin
2012-08-26 14:40   ` Rik van Riel
2012-08-26 15:44     ` Michael S. Tsirkin
2012-08-27 20:22       ` Rafael Aquini
