All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] vmw_balloon: compaction and shrinker support
@ 2019-03-28  1:07 Nadav Amit
  2019-03-28  1:07 ` [PATCH v2 1/4] mm/balloon_compaction: list interfaces Nadav Amit
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Nadav Amit @ 2019-03-28  1:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Arnd Bergmann
  Cc: Michael S. Tsirkin, Jason Wang, virtualization, linux-mm,
	linux-kernel, VMware, Inc.,
	Julien Freche, Nadav Amit, Nadav Amit

VMware balloon enhancements: adding support for memory compaction,
memory shrinker (to prevent OOM) and splitting of refused pages to
prevent recurring inflations.

Patches 1-2: Support for compaction
Patch 3: Support for memory shrinker - disabled by default
Patch 4: Split refused pages to improve performance

v1->v2:
* Return number of pages in list enqueue/dequeue interfaces [Michael]
* Removed first two patches which were already merged

Nadav Amit (4):
  mm/balloon_compaction: list interfaces
  vmw_balloon: compaction support
  vmw_balloon: add memory shrinker
  vmw_balloon: split refused pages

 drivers/misc/Kconfig               |   1 +
 drivers/misc/vmw_balloon.c         | 489 ++++++++++++++++++++++++++---
 include/linux/balloon_compaction.h |   4 +
 mm/balloon_compaction.c            | 145 ++++++---
 4 files changed, 554 insertions(+), 85 deletions(-)

-- 
2.19.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 1/4] mm/balloon_compaction: list interfaces
  2019-03-28  1:07 [PATCH v2 0/4] vmw_balloon: compaction and shrinker support Nadav Amit
@ 2019-03-28  1:07 ` Nadav Amit
  2019-04-08 17:35     ` Nadav Amit
                     ` (2 more replies)
  2019-03-28  1:07 ` [PATCH v2 2/4] vmw_balloon: compaction support Nadav Amit
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 19+ messages in thread
From: Nadav Amit @ 2019-03-28  1:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Arnd Bergmann
  Cc: Michael S. Tsirkin, Jason Wang, virtualization, linux-mm,
	linux-kernel, VMware, Inc.,
	Julien Freche, Nadav Amit, Nadav Amit

Introduce interfaces for ballooning enqueueing and dequeueing of a list
of pages. These interfaces reduce the overhead of storing and restoring
IRQs by batching the operations. In addition they do not panic if the
list of pages is empty.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: linux-mm@kvack.org
Cc: virtualization@lists.linux-foundation.org
Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 include/linux/balloon_compaction.h |   4 +
 mm/balloon_compaction.c            | 145 +++++++++++++++++++++--------
 2 files changed, 111 insertions(+), 38 deletions(-)

diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
index f111c780ef1d..1da79edadb69 100644
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -64,6 +64,10 @@ extern struct page *balloon_page_alloc(void);
 extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
 				 struct page *page);
 extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
+extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
+				      struct list_head *pages);
+extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
+				     struct list_head *pages, int n_req_pages);
 
 static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
 {
diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index ef858d547e2d..88d5d9a01072 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -10,6 +10,106 @@
 #include <linux/export.h>
 #include <linux/balloon_compaction.h>
 
+static int balloon_page_enqueue_one(struct balloon_dev_info *b_dev_info,
+				     struct page *page)
+{
+	/*
+	 * Block others from accessing the 'page' when we get around to
+	 * establishing additional references. We should be the only one
+	 * holding a reference to the 'page' at this point.
+	 */
+	if (!trylock_page(page)) {
+		WARN_ONCE(1, "balloon inflation failed to enqueue page\n");
+		return -EFAULT;
+	}
+	list_del(&page->lru);
+	balloon_page_insert(b_dev_info, page);
+	unlock_page(page);
+	__count_vm_event(BALLOON_INFLATE);
+	return 0;
+}
+
+/**
+ * balloon_page_list_enqueue() - inserts a list of pages into the balloon page
+ *				 list.
+ * @b_dev_info: balloon device descriptor where we will insert a new page to
+ * @pages: pages to enqueue - allocated using balloon_page_alloc.
+ *
+ * Driver must call it to properly enqueue a balloon pages before definitively
+ * removing it from the guest system.
+ *
+ * Return: number of pages that were enqueued.
+ */
+size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
+			       struct list_head *pages)
+{
+	struct page *page, *tmp;
+	unsigned long flags;
+	size_t n_pages = 0;
+
+	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
+	list_for_each_entry_safe(page, tmp, pages, lru) {
+		balloon_page_enqueue_one(b_dev_info, page);
+		n_pages++;
+	}
+	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
+	return n_pages;
+}
+EXPORT_SYMBOL_GPL(balloon_page_list_enqueue);
+
+/**
+ * balloon_page_list_dequeue() - removes pages from balloon's page list and
+ *				 returns a list of the pages.
+ * @b_dev_info: balloon device decriptor where we will grab a page from.
+ * @pages: pointer to the list of pages that would be returned to the caller.
+ * @n_req_pages: number of requested pages.
+ *
+ * Driver must call it to properly de-allocate a previous enlisted balloon pages
+ * before definetively releasing it back to the guest system. This function
+ * tries to remove @n_req_pages from the ballooned pages and return it to the
+ * caller in the @pages list.
+ *
+ * Note that this function may fail to dequeue some pages temporarily empty due
+ * to compaction isolated pages.
+ *
+ * Return: number of pages that were added to the @pages list.
+ */
+size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
+				 struct list_head *pages, int n_req_pages)
+{
+	struct page *page, *tmp;
+	unsigned long flags;
+	size_t n_pages = 0;
+
+	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
+	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
+		/*
+		 * Block others from accessing the 'page' while we get around
+		 * establishing additional references and preparing the 'page'
+		 * to be released by the balloon driver.
+		 */
+		if (!trylock_page(page))
+			continue;
+
+		if (IS_ENABLED(CONFIG_BALLOON_COMPACTION) &&
+		    PageIsolated(page)) {
+			/* raced with isolation */
+			unlock_page(page);
+			continue;
+		}
+		balloon_page_delete(page);
+		__count_vm_event(BALLOON_DEFLATE);
+		unlock_page(page);
+		list_add(&page->lru, pages);
+		if (++n_pages >= n_req_pages)
+			break;
+	}
+	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
+
+	return n_pages;
+}
+EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
+
 /*
  * balloon_page_alloc - allocates a new page for insertion into the balloon
  *			  page list.
@@ -43,17 +143,9 @@ void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
 {
 	unsigned long flags;
 
-	/*
-	 * Block others from accessing the 'page' when we get around to
-	 * establishing additional references. We should be the only one
-	 * holding a reference to the 'page' at this point.
-	 */
-	BUG_ON(!trylock_page(page));
 	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
-	balloon_page_insert(b_dev_info, page);
-	__count_vm_event(BALLOON_INFLATE);
+	balloon_page_enqueue_one(b_dev_info, page);
 	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
-	unlock_page(page);
 }
 EXPORT_SYMBOL_GPL(balloon_page_enqueue);
 
@@ -70,36 +162,13 @@ EXPORT_SYMBOL_GPL(balloon_page_enqueue);
  */
 struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
 {
-	struct page *page, *tmp;
 	unsigned long flags;
-	bool dequeued_page;
+	LIST_HEAD(pages);
+	int n_pages;
 
-	dequeued_page = false;
-	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
-	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
-		/*
-		 * Block others from accessing the 'page' while we get around
-		 * establishing additional references and preparing the 'page'
-		 * to be released by the balloon driver.
-		 */
-		if (trylock_page(page)) {
-#ifdef CONFIG_BALLOON_COMPACTION
-			if (PageIsolated(page)) {
-				/* raced with isolation */
-				unlock_page(page);
-				continue;
-			}
-#endif
-			balloon_page_delete(page);
-			__count_vm_event(BALLOON_DEFLATE);
-			unlock_page(page);
-			dequeued_page = true;
-			break;
-		}
-	}
-	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
+	n_pages = balloon_page_list_dequeue(b_dev_info, &pages, 1);
 
-	if (!dequeued_page) {
+	if (n_pages != 1) {
 		/*
 		 * If we are unable to dequeue a balloon page because the page
 		 * list is empty and there is no isolated pages, then something
@@ -112,9 +181,9 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
 			     !b_dev_info->isolated_pages))
 			BUG();
 		spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
-		page = NULL;
+		return NULL;
 	}
-	return page;
+	return list_first_entry(&pages, struct page, lru);
 }
 EXPORT_SYMBOL_GPL(balloon_page_dequeue);
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 2/4] vmw_balloon: compaction support
  2019-03-28  1:07 [PATCH v2 0/4] vmw_balloon: compaction and shrinker support Nadav Amit
  2019-03-28  1:07 ` [PATCH v2 1/4] mm/balloon_compaction: list interfaces Nadav Amit
@ 2019-03-28  1:07 ` Nadav Amit
  2019-03-28  1:07 ` [PATCH v2 3/4] vmw_balloon: add memory shrinker Nadav Amit
  2019-03-28  1:07 ` [PATCH v2 4/4] vmw_balloon: split refused pages Nadav Amit
  3 siblings, 0 replies; 19+ messages in thread
From: Nadav Amit @ 2019-03-28  1:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Arnd Bergmann
  Cc: Michael S. Tsirkin, Jason Wang, virtualization, linux-mm,
	linux-kernel, VMware, Inc.,
	Julien Freche, Nadav Amit, Nadav Amit

Add support for compaction for VMware balloon. Since unlike the virtio
balloon, we also support huge-pages, which are not going through
compaction, we keep these pages in vmballoon and handle this list
separately. We use the same lock to protect both lists, as this lock is
not supposed to be contended.

Doing so also eliminates the need for the page_size lists. We update the
accounting as needed to reflect inflation, deflation and migration to be
reflected in vmstat.

Since VMware balloon now provides statistics for inflation, deflation
and migration in vmstat, select MEMORY_BALLOON in Kconfig.

Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 drivers/misc/Kconfig       |   1 +
 drivers/misc/vmw_balloon.c | 301 ++++++++++++++++++++++++++++++++-----
 2 files changed, 264 insertions(+), 38 deletions(-)

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 42ab8ec92a04..427cf10579b4 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -420,6 +420,7 @@ config SPEAR13XX_PCIE_GADGET
 config VMWARE_BALLOON
 	tristate "VMware Balloon Driver"
 	depends on VMWARE_VMCI && X86 && HYPERVISOR_GUEST
+	select MEMORY_BALLOON
 	help
 	  This is VMware physical memory management driver which acts
 	  like a "balloon" that can be inflated to reclaim physical pages
diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c
index ad807d5a3141..2136f6ad97d3 100644
--- a/drivers/misc/vmw_balloon.c
+++ b/drivers/misc/vmw_balloon.c
@@ -28,6 +28,8 @@
 #include <linux/rwsem.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
+#include <linux/mount.h>
+#include <linux/balloon_compaction.h>
 #include <linux/vmw_vmci_defs.h>
 #include <linux/vmw_vmci_api.h>
 #include <asm/hypervisor.h>
@@ -38,25 +40,11 @@ MODULE_ALIAS("dmi:*:svnVMware*:*");
 MODULE_ALIAS("vmware_vmmemctl");
 MODULE_LICENSE("GPL");
 
-/*
- * Use __GFP_HIGHMEM to allow pages from HIGHMEM zone. We don't allow wait
- * (__GFP_RECLAIM) for huge page allocations. Use __GFP_NOWARN, to suppress page
- * allocation failure warnings. Disallow access to emergency low-memory pools.
- */
-#define VMW_HUGE_PAGE_ALLOC_FLAGS	(__GFP_HIGHMEM|__GFP_NOWARN|	\
-					 __GFP_NOMEMALLOC)
-
-/*
- * Use __GFP_HIGHMEM to allow pages from HIGHMEM zone. We allow lightweight
- * reclamation (__GFP_NORETRY). Use __GFP_NOWARN, to suppress page allocation
- * failure warnings. Disallow access to emergency low-memory pools.
- */
-#define VMW_PAGE_ALLOC_FLAGS		(__GFP_HIGHMEM|__GFP_NOWARN|	\
-					 __GFP_NOMEMALLOC|__GFP_NORETRY)
-
-/* Maximum number of refused pages we accumulate during inflation cycle */
 #define VMW_BALLOON_MAX_REFUSED		16
 
+/* Magic number for the balloon mount-point */
+#define BALLOON_VMW_MAGIC		0x0ba11007
+
 /*
  * Hypervisor communication port definitions.
  */
@@ -247,11 +235,6 @@ struct vmballoon_ctl {
 	enum vmballoon_op op;
 };
 
-struct vmballoon_page_size {
-	/* list of reserved physical pages */
-	struct list_head pages;
-};
-
 /**
  * struct vmballoon_batch_entry - a batch entry for lock or unlock.
  *
@@ -266,8 +249,6 @@ struct vmballoon_batch_entry {
 } __packed;
 
 struct vmballoon {
-	struct vmballoon_page_size page_sizes[VMW_BALLOON_NUM_PAGE_SIZES];
-
 	/**
 	 * @max_page_size: maximum supported page size for ballooning.
 	 *
@@ -348,8 +329,20 @@ struct vmballoon {
 	struct dentry *dbg_entry;
 #endif
 
+	/**
+	 * @b_dev_info: balloon device information descriptor.
+	 */
+	struct balloon_dev_info b_dev_info;
+
 	struct delayed_work dwork;
 
+	/**
+	 * @huge_pages - list of the inflated 2MB pages.
+	 *
+	 * Protected by @b_dev_info.pages_lock .
+	 */
+	struct list_head huge_pages;
+
 	/**
 	 * @vmci_doorbell.
 	 *
@@ -643,10 +636,10 @@ static int vmballoon_alloc_page_list(struct vmballoon *b,
 
 	for (i = 0; i < req_n_pages; i++) {
 		if (ctl->page_size == VMW_BALLOON_2M_PAGE)
-			page = alloc_pages(VMW_HUGE_PAGE_ALLOC_FLAGS,
-					   VMW_BALLOON_2M_ORDER);
+			page = alloc_pages(__GFP_HIGHMEM|__GFP_NOWARN|
+					__GFP_NOMEMALLOC, VMW_BALLOON_2M_ORDER);
 		else
-			page = alloc_page(VMW_PAGE_ALLOC_FLAGS);
+			page = balloon_page_alloc();
 
 		/* Update statistics */
 		vmballoon_stats_page_inc(b, VMW_BALLOON_PAGE_STAT_ALLOC,
@@ -961,9 +954,22 @@ static void vmballoon_enqueue_page_list(struct vmballoon *b,
 					unsigned int *n_pages,
 					enum vmballoon_page_size_type page_size)
 {
-	struct vmballoon_page_size *page_size_info = &b->page_sizes[page_size];
+	unsigned long flags;
+
+	if (page_size == VMW_BALLOON_4K_PAGE) {
+		balloon_page_list_enqueue(&b->b_dev_info, pages);
+	} else {
+		/*
+		 * Keep the huge pages in a local list which is not available
+		 * for the balloon compaction mechanism.
+		 */
+		spin_lock_irqsave(&b->b_dev_info.pages_lock, flags);
+		list_splice_init(pages, &b->huge_pages);
+		__count_vm_events(BALLOON_INFLATE, *n_pages *
+				  vmballoon_page_in_frames(VMW_BALLOON_2M_PAGE));
+		spin_unlock_irqrestore(&b->b_dev_info.pages_lock, flags);
+	}
 
-	list_splice_init(pages, &page_size_info->pages);
 	*n_pages = 0;
 }
 
@@ -986,15 +992,28 @@ static void vmballoon_dequeue_page_list(struct vmballoon *b,
 					enum vmballoon_page_size_type page_size,
 					unsigned int n_req_pages)
 {
-	struct vmballoon_page_size *page_size_info = &b->page_sizes[page_size];
 	struct page *page, *tmp;
 	unsigned int i = 0;
+	unsigned long flags;
 
-	list_for_each_entry_safe(page, tmp, &page_size_info->pages, lru) {
+	/* In the case of 4k pages, use the compaction infrastructure */
+	if (page_size == VMW_BALLOON_4K_PAGE) {
+		*n_pages = balloon_page_list_dequeue(&b->b_dev_info, pages,
+						     n_req_pages);
+		return;
+	}
+
+	/* 2MB pages */
+	spin_lock_irqsave(&b->b_dev_info.pages_lock, flags);
+	list_for_each_entry_safe(page, tmp, &b->huge_pages, lru) {
 		list_move(&page->lru, pages);
 		if (++i == n_req_pages)
 			break;
 	}
+
+	__count_vm_events(BALLOON_DEFLATE,
+			  i * vmballoon_page_in_frames(VMW_BALLOON_2M_PAGE));
+	spin_unlock_irqrestore(&b->b_dev_info.pages_lock, flags);
 	*n_pages = i;
 }
 
@@ -1552,9 +1571,204 @@ static inline void vmballoon_debugfs_exit(struct vmballoon *b)
 
 #endif	/* CONFIG_DEBUG_FS */
 
+
+#ifdef CONFIG_BALLOON_COMPACTION
+
+static struct dentry *vmballoon_mount(struct file_system_type *fs_type,
+				      int flags, const char *dev_name,
+				      void *data)
+{
+	static const struct dentry_operations ops = {
+		.d_dname = simple_dname,
+	};
+
+	return mount_pseudo(fs_type, "balloon-vmware:", NULL, &ops,
+			    BALLOON_VMW_MAGIC);
+}
+
+static struct file_system_type vmballoon_fs = {
+	.name           = "balloon-vmware",
+	.mount          = vmballoon_mount,
+	.kill_sb        = kill_anon_super,
+};
+
+static struct vfsmount *vmballoon_mnt;
+
+/**
+ * vmballoon_migratepage() - migrates a balloon page.
+ * @b_dev_info: balloon device information descriptor.
+ * @newpage: the page to which @page should be migrated.
+ * @page: a ballooned page that should be migrated.
+ * @mode: migration mode, ignored.
+ *
+ * This function is really open-coded, but that is according to the interface
+ * that balloon_compaction provides.
+ *
+ * Return: zero on success, -EAGAIN when migration cannot be performed
+ *	   momentarily, and -EBUSY if migration failed and should be retried
+ *	   with that specific page.
+ */
+static int vmballoon_migratepage(struct balloon_dev_info *b_dev_info,
+				 struct page *newpage, struct page *page,
+				 enum migrate_mode mode)
+{
+	unsigned long status, flags;
+	struct vmballoon *b;
+	int ret;
+
+	b = container_of(b_dev_info, struct vmballoon, b_dev_info);
+
+	/*
+	 * If the semaphore is taken, there is ongoing configuration change
+	 * (i.e., balloon reset), so try again.
+	 */
+	if (!down_read_trylock(&b->conf_sem))
+		return -EAGAIN;
+
+	spin_lock(&b->comm_lock);
+	/*
+	 * We must start by deflating and not inflating, as otherwise the
+	 * hypervisor may tell us that it has enough memory and the new page is
+	 * not needed. Since the old page is isolated, we cannot use the list
+	 * interface to unlock it, as the LRU field is used for isolation.
+	 * Instead, we use the native interface directly.
+	 */
+	vmballoon_add_page(b, 0, page);
+	status = vmballoon_lock_op(b, 1, VMW_BALLOON_4K_PAGE,
+				   VMW_BALLOON_DEFLATE);
+
+	if (status == VMW_BALLOON_SUCCESS)
+		status = vmballoon_status_page(b, 0, &page);
+
+	/*
+	 * If a failure happened, let the migration mechanism know that it
+	 * should not retry.
+	 */
+	if (status != VMW_BALLOON_SUCCESS) {
+		spin_unlock(&b->comm_lock);
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	/*
+	 * The page is isolated, so it is safe to delete it without holding
+	 * @pages_lock . We keep holding @comm_lock since we will need it in a
+	 * second.
+	 */
+	balloon_page_delete(page);
+
+	put_page(page);
+
+	/* Inflate */
+	vmballoon_add_page(b, 0, newpage);
+	status = vmballoon_lock_op(b, 1, VMW_BALLOON_4K_PAGE,
+				   VMW_BALLOON_INFLATE);
+
+	if (status == VMW_BALLOON_SUCCESS)
+		status = vmballoon_status_page(b, 0, &newpage);
+
+	spin_unlock(&b->comm_lock);
+
+	if (status != VMW_BALLOON_SUCCESS) {
+		/*
+		 * A failure happened. While we can deflate the page we just
+		 * inflated, this deflation can also encounter an error. Instead
+		 * we will decrease the size of the balloon to reflect the
+		 * change and report failure.
+		 */
+		atomic64_dec(&b->size);
+		ret = -EBUSY;
+	} else {
+		/*
+		 * Success. Take a reference for the page, and we will add it to
+		 * the list after acquiring the lock.
+		 */
+		get_page(newpage);
+		ret = MIGRATEPAGE_SUCCESS;
+	}
+
+	/* Update the balloon list under the @pages_lock */
+	spin_lock_irqsave(&b->b_dev_info.pages_lock, flags);
+
+	/*
+	 * On inflation success, we already took a reference for the @newpage.
+	 * If we succeed just insert it to the list and update the statistics
+	 * under the lock.
+	 */
+	if (ret == MIGRATEPAGE_SUCCESS) {
+		balloon_page_insert(&b->b_dev_info, newpage);
+		__count_vm_event(BALLOON_MIGRATE);
+	}
+
+	/*
+	 * We deflated successfully, so regardless to the inflation success, we
+	 * need to reduce the number of isolated_pages.
+	 */
+	b->b_dev_info.isolated_pages--;
+	spin_unlock_irqrestore(&b->b_dev_info.pages_lock, flags);
+
+out_unlock:
+	up_read(&b->conf_sem);
+	return ret;
+}
+
+/**
+ * vmballoon_compaction_deinit() - removes compaction related data.
+ *
+ * @b: pointer to the balloon.
+ */
+static void vmballoon_compaction_deinit(struct vmballoon *b)
+{
+	if (!IS_ERR(b->b_dev_info.inode))
+		iput(b->b_dev_info.inode);
+
+	b->b_dev_info.inode = NULL;
+	kern_unmount(vmballoon_mnt);
+	vmballoon_mnt = NULL;
+}
+
+/**
+ * vmballoon_compaction_init() - initialized compaction for the balloon.
+ *
+ * @b: pointer to the balloon.
+ *
+ * If during the initialization a failure occurred, this function does not
+ * perform cleanup. The caller must call vmballoon_compaction_deinit() in this
+ * case.
+ *
+ * Return: zero on success or error code on failure.
+ */
+static __init int vmballoon_compaction_init(struct vmballoon *b)
+{
+	vmballoon_mnt = kern_mount(&vmballoon_fs);
+	if (IS_ERR(vmballoon_mnt))
+		return PTR_ERR(vmballoon_mnt);
+
+	b->b_dev_info.migratepage = vmballoon_migratepage;
+	b->b_dev_info.inode = alloc_anon_inode(vmballoon_mnt->mnt_sb);
+
+	if (IS_ERR(b->b_dev_info.inode))
+		return PTR_ERR(b->b_dev_info.inode);
+
+	b->b_dev_info.inode->i_mapping->a_ops = &balloon_aops;
+	return 0;
+}
+
+#else /* CONFIG_BALLOON_COMPACTION */
+
+static void vmballoon_compaction_deinit(struct vmballoon *b)
+{
+}
+
+static int vmballoon_compaction_init(struct vmballoon *b)
+{
+	return 0;
+}
+
+#endif /* CONFIG_BALLOON_COMPACTION */
+
 static int __init vmballoon_init(void)
 {
-	enum vmballoon_page_size_type page_size;
 	int error;
 
 	/*
@@ -1564,17 +1778,22 @@ static int __init vmballoon_init(void)
 	if (x86_hyper_type != X86_HYPER_VMWARE)
 		return -ENODEV;
 
-	for (page_size = VMW_BALLOON_4K_PAGE;
-	     page_size <= VMW_BALLOON_LAST_SIZE; page_size++)
-		INIT_LIST_HEAD(&balloon.page_sizes[page_size].pages);
-
-
 	INIT_DELAYED_WORK(&balloon.dwork, vmballoon_work);
 
 	error = vmballoon_debugfs_init(&balloon);
 	if (error)
-		return error;
+		goto fail;
 
+	/*
+	 * Initialization of compaction must be done after the call to
+	 * balloon_devinfo_init() .
+	 */
+	balloon_devinfo_init(&balloon.b_dev_info);
+	error = vmballoon_compaction_init(&balloon);
+	if (error)
+		goto fail;
+
+	INIT_LIST_HEAD(&balloon.huge_pages);
 	spin_lock_init(&balloon.comm_lock);
 	init_rwsem(&balloon.conf_sem);
 	balloon.vmci_doorbell = VMCI_INVALID_HANDLE;
@@ -1585,6 +1804,9 @@ static int __init vmballoon_init(void)
 	queue_delayed_work(system_freezable_wq, &balloon.dwork, 0);
 
 	return 0;
+fail:
+	vmballoon_compaction_deinit(&balloon);
+	return error;
 }
 
 /*
@@ -1609,5 +1831,8 @@ static void __exit vmballoon_exit(void)
 	 */
 	vmballoon_send_start(&balloon, 0);
 	vmballoon_pop(&balloon);
+
+	/* Only once we popped the balloon, compaction can be deinit */
+	vmballoon_compaction_deinit(&balloon);
 }
 module_exit(vmballoon_exit);
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 3/4] vmw_balloon: add memory shrinker
  2019-03-28  1:07 [PATCH v2 0/4] vmw_balloon: compaction and shrinker support Nadav Amit
  2019-03-28  1:07 ` [PATCH v2 1/4] mm/balloon_compaction: list interfaces Nadav Amit
  2019-03-28  1:07 ` [PATCH v2 2/4] vmw_balloon: compaction support Nadav Amit
@ 2019-03-28  1:07 ` Nadav Amit
  2019-03-28  1:07 ` [PATCH v2 4/4] vmw_balloon: split refused pages Nadav Amit
  3 siblings, 0 replies; 19+ messages in thread
From: Nadav Amit @ 2019-03-28  1:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Arnd Bergmann
  Cc: Michael S. Tsirkin, Jason Wang, virtualization, linux-mm,
	linux-kernel, VMware, Inc.,
	Julien Freche, Nadav Amit, Nadav Amit

Add a shrinker to the VMware balloon to prevent out-of-memory events.
We reuse the deflate logic for this matter. Deadlocks should not happen,
as no memory allocation is performed while the locks of the
communication (batch/page) and page-list are taken. In the unlikely
event in which the configuration semaphore is taken for write we bail
out and fail gracefully (causing processes to be killed).

Once the shrinker is called, inflation is postponed for few seconds.
The timeout is updated without any lock, but this should not cause any
races, as it is written and read atomically.

This feature is disabled by default, since it might cause performance
degradation.

Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 drivers/misc/vmw_balloon.c | 133 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 131 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c
index 2136f6ad97d3..59d3c0202dcc 100644
--- a/drivers/misc/vmw_balloon.c
+++ b/drivers/misc/vmw_balloon.c
@@ -40,6 +40,15 @@ MODULE_ALIAS("dmi:*:svnVMware*:*");
 MODULE_ALIAS("vmware_vmmemctl");
 MODULE_LICENSE("GPL");
 
+bool __read_mostly vmwballoon_shrinker_enable;
+module_param(vmwballoon_shrinker_enable, bool, 0444);
+MODULE_PARM_DESC(vmwballoon_shrinker_enable,
+	"Enable non-cooperative out-of-memory protection. Disabled by default as it may degrade performance.");
+
+/* Delay in seconds after shrink before inflation. */
+#define VMBALLOON_SHRINK_DELAY		(5)
+
+/* Maximum number of refused pages we accumulate during inflation cycle */
 #define VMW_BALLOON_MAX_REFUSED		16
 
 /* Magic number for the balloon mount-point */
@@ -217,12 +226,13 @@ enum vmballoon_stat_general {
 	VMW_BALLOON_STAT_TIMER,
 	VMW_BALLOON_STAT_DOORBELL,
 	VMW_BALLOON_STAT_RESET,
-	VMW_BALLOON_STAT_LAST = VMW_BALLOON_STAT_RESET
+	VMW_BALLOON_STAT_SHRINK,
+	VMW_BALLOON_STAT_SHRINK_FREE,
+	VMW_BALLOON_STAT_LAST = VMW_BALLOON_STAT_SHRINK_FREE
 };
 
 #define VMW_BALLOON_STAT_NUM		(VMW_BALLOON_STAT_LAST + 1)
 
-
 static DEFINE_STATIC_KEY_TRUE(vmw_balloon_batching);
 static DEFINE_STATIC_KEY_FALSE(balloon_stat_enabled);
 
@@ -321,6 +331,15 @@ struct vmballoon {
 	 */
 	struct page *page;
 
+	/**
+	 * @shrink_timeout: timeout until the next inflation.
+	 *
+	 * After an shrink event, indicates the time in jiffies after which
+	 * inflation is allowed again. Can be written concurrently with reads,
+	 * so must use READ_ONCE/WRITE_ONCE when accessing.
+	 */
+	unsigned long shrink_timeout;
+
 	/* statistics */
 	struct vmballoon_stats *stats;
 
@@ -361,6 +380,20 @@ struct vmballoon {
 	 * Lock ordering: @conf_sem -> @comm_lock .
 	 */
 	spinlock_t comm_lock;
+
+	/**
+	 * @shrinker: shrinker interface that is used to avoid over-inflation.
+	 */
+	struct shrinker shrinker;
+
+	/**
+	 * @shrinker_registered: whether the shrinker was registered.
+	 *
+	 * The shrinker interface does not handle gracefully the removal of
+	 * shrinker that was not registered before. This indication allows to
+	 * simplify the unregistration process.
+	 */
+	bool shrinker_registered;
 };
 
 static struct vmballoon balloon;
@@ -935,6 +968,10 @@ static int64_t vmballoon_change(struct vmballoon *b)
 	    size - target < vmballoon_page_in_frames(VMW_BALLOON_2M_PAGE))
 		return 0;
 
+	/* If an out-of-memory recently occurred, inflation is disallowed. */
+	if (target > size && time_before(jiffies, READ_ONCE(b->shrink_timeout)))
+		return 0;
+
 	return target - size;
 }
 
@@ -1430,6 +1467,90 @@ static void vmballoon_work(struct work_struct *work)
 
 }
 
+/**
+ * vmballoon_shrinker_scan() - deflate the balloon due to memory pressure.
+ * @shrinker: pointer to the balloon shrinker.
+ * @sc: page reclaim information.
+ *
+ * Returns: number of pages that were freed during deflation.
+ */
+static unsigned long vmballoon_shrinker_scan(struct shrinker *shrinker,
+					     struct shrink_control *sc)
+{
+	struct vmballoon *b = &balloon;
+	unsigned long deflated_frames;
+
+	pr_debug("%s - size: %llu", __func__, atomic64_read(&b->size));
+
+	vmballoon_stats_gen_inc(b, VMW_BALLOON_STAT_SHRINK);
+
+	/*
+	 * If the lock is also contended for read, we cannot easily reclaim and
+	 * we bail out.
+	 */
+	if (!down_read_trylock(&b->conf_sem))
+		return 0;
+
+	deflated_frames = vmballoon_deflate(b, sc->nr_to_scan, true);
+
+	vmballoon_stats_gen_add(b, VMW_BALLOON_STAT_SHRINK_FREE,
+				deflated_frames);
+
+	/*
+	 * Delay future inflation for some time to mitigate the situations in
+	 * which balloon continuously grows and shrinks. Use WRITE_ONCE() since
+	 * the access is asynchronous.
+	 */
+	WRITE_ONCE(b->shrink_timeout, jiffies + HZ * VMBALLOON_SHRINK_DELAY);
+
+	up_read(&b->conf_sem);
+
+	return deflated_frames;
+}
+
+/**
+ * vmballoon_shrinker_count() - return the number of ballooned pages.
+ * @shrinker: pointer to the balloon shrinker.
+ * @sc: page reclaim information.
+ *
+ * Returns: number of 4k pages that are allocated for the balloon and can
+ *	    therefore be reclaimed under pressure.
+ */
+static unsigned long vmballoon_shrinker_count(struct shrinker *shrinker,
+					      struct shrink_control *sc)
+{
+	struct vmballoon *b = &balloon;
+
+	return atomic64_read(&b->size);
+}
+
+static void vmballoon_unregister_shrinker(struct vmballoon *b)
+{
+	if (b->shrinker_registered)
+		unregister_shrinker(&b->shrinker);
+	b->shrinker_registered = false;
+}
+
+static int vmballoon_register_shrinker(struct vmballoon *b)
+{
+	int r;
+
+	/* Do nothing if the shrinker is not enabled */
+	if (!vmwballoon_shrinker_enable)
+		return 0;
+
+	b->shrinker.scan_objects = vmballoon_shrinker_scan;
+	b->shrinker.count_objects = vmballoon_shrinker_count;
+	b->shrinker.seeks = DEFAULT_SEEKS;
+
+	r = register_shrinker(&b->shrinker);
+
+	if (r == 0)
+		b->shrinker_registered = true;
+
+	return r;
+}
+
 /*
  * DEBUGFS Interface
  */
@@ -1447,6 +1568,8 @@ static const char * const vmballoon_stat_names[] = {
 	[VMW_BALLOON_STAT_TIMER]		= "timer",
 	[VMW_BALLOON_STAT_DOORBELL]		= "doorbell",
 	[VMW_BALLOON_STAT_RESET]		= "reset",
+	[VMW_BALLOON_STAT_SHRINK]		= "shrink",
+	[VMW_BALLOON_STAT_SHRINK_FREE]		= "shrinkFree"
 };
 
 static int vmballoon_enable_stats(struct vmballoon *b)
@@ -1780,6 +1903,10 @@ static int __init vmballoon_init(void)
 
 	INIT_DELAYED_WORK(&balloon.dwork, vmballoon_work);
 
+	error = vmballoon_register_shrinker(&balloon);
+	if (error)
+		goto fail;
+
 	error = vmballoon_debugfs_init(&balloon);
 	if (error)
 		goto fail;
@@ -1805,6 +1932,7 @@ static int __init vmballoon_init(void)
 
 	return 0;
 fail:
+	vmballoon_unregister_shrinker(&balloon);
 	vmballoon_compaction_deinit(&balloon);
 	return error;
 }
@@ -1819,6 +1947,7 @@ late_initcall(vmballoon_init);
 
 static void __exit vmballoon_exit(void)
 {
+	vmballoon_unregister_shrinker(&balloon);
 	vmballoon_vmci_cleanup(&balloon);
 	cancel_delayed_work_sync(&balloon.dwork);
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 4/4] vmw_balloon: split refused pages
  2019-03-28  1:07 [PATCH v2 0/4] vmw_balloon: compaction and shrinker support Nadav Amit
                   ` (2 preceding siblings ...)
  2019-03-28  1:07 ` [PATCH v2 3/4] vmw_balloon: add memory shrinker Nadav Amit
@ 2019-03-28  1:07 ` Nadav Amit
  3 siblings, 0 replies; 19+ messages in thread
From: Nadav Amit @ 2019-03-28  1:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Arnd Bergmann
  Cc: Michael S. Tsirkin, Jason Wang, virtualization, linux-mm,
	linux-kernel, VMware, Inc.,
	Julien Freche, Nadav Amit, Nadav Amit

The hypervisor might refuse to inflate pages. While the balloon driver
handles this scenario correctly, a refusal to inflate a 2MB pages might
cause the same page to be allocated again later just for its inflation
to be refused again. This wastes energy and time.

To avoid this situation, split the 2MB page to 4KB pages, and then try
to inflate each one individually. Most of the 4KB pages out of the 2MB
should be inflated successfully, and the balloon is likely to prevent
the scenario of repeated refused inflation.

Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 drivers/misc/vmw_balloon.c | 63 +++++++++++++++++++++++++++++++-------
 1 file changed, 52 insertions(+), 11 deletions(-)

diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c
index 59d3c0202dcc..65ce8b41cd66 100644
--- a/drivers/misc/vmw_balloon.c
+++ b/drivers/misc/vmw_balloon.c
@@ -239,6 +239,7 @@ static DEFINE_STATIC_KEY_FALSE(balloon_stat_enabled);
 struct vmballoon_ctl {
 	struct list_head pages;
 	struct list_head refused_pages;
+	struct list_head prealloc_pages;
 	unsigned int n_refused_pages;
 	unsigned int n_pages;
 	enum vmballoon_page_size_type page_size;
@@ -668,15 +669,25 @@ static int vmballoon_alloc_page_list(struct vmballoon *b,
 	unsigned int i;
 
 	for (i = 0; i < req_n_pages; i++) {
-		if (ctl->page_size == VMW_BALLOON_2M_PAGE)
-			page = alloc_pages(__GFP_HIGHMEM|__GFP_NOWARN|
+		/*
+		 * First check if we happen to have pages that were allocated
+		 * before. This happens when 2MB page rejected during inflation
+		 * by the hypervisor, and then split into 4KB pages.
+		 */
+		if (!list_empty(&ctl->prealloc_pages)) {
+			page = list_first_entry(&ctl->prealloc_pages,
+						struct page, lru);
+			list_del(&page->lru);
+		} else {
+			if (ctl->page_size == VMW_BALLOON_2M_PAGE)
+				page = alloc_pages(__GFP_HIGHMEM|__GFP_NOWARN|
 					__GFP_NOMEMALLOC, VMW_BALLOON_2M_ORDER);
-		else
-			page = balloon_page_alloc();
+			else
+				page = balloon_page_alloc();
 
-		/* Update statistics */
-		vmballoon_stats_page_inc(b, VMW_BALLOON_PAGE_STAT_ALLOC,
-					 ctl->page_size);
+			vmballoon_stats_page_inc(b, VMW_BALLOON_PAGE_STAT_ALLOC,
+						 ctl->page_size);
+		}
 
 		if (page) {
 			vmballoon_mark_page_offline(page, ctl->page_size);
@@ -922,7 +933,8 @@ static void vmballoon_release_page_list(struct list_head *page_list,
 		__free_pages(page, vmballoon_page_order(page_size));
 	}
 
-	*n_pages = 0;
+	if (n_pages)
+		*n_pages = 0;
 }
 
 
@@ -1054,6 +1066,32 @@ static void vmballoon_dequeue_page_list(struct vmballoon *b,
 	*n_pages = i;
 }
 
+/**
+ * vmballoon_split_refused_pages() - Split the 2MB refused pages to 4k.
+ *
+ * If inflation of 2MB pages was denied by the hypervisor, it is likely to be
+ * due to one or few 4KB pages. These 2MB pages may keep being allocated and
+ * then being refused. To prevent this case, this function splits the refused
+ * pages into 4KB pages and adds them into @prealloc_pages list.
+ *
+ * @ctl: pointer for the %struct vmballoon_ctl, which defines the operation.
+ */
+static void vmballoon_split_refused_pages(struct vmballoon_ctl *ctl)
+{
+	struct page *page, *tmp;
+	unsigned int i, order;
+
+	order = vmballoon_page_order(ctl->page_size);
+
+	list_for_each_entry_safe(page, tmp, &ctl->refused_pages, lru) {
+		list_del(&page->lru);
+		split_page(page, order);
+		for (i = 0; i < (1 << order); i++)
+			list_add(&page[i].lru, &ctl->prealloc_pages);
+	}
+	ctl->n_refused_pages = 0;
+}
+
 /**
  * vmballoon_inflate() - Inflate the balloon towards its target size.
  *
@@ -1065,6 +1103,7 @@ static void vmballoon_inflate(struct vmballoon *b)
 	struct vmballoon_ctl ctl = {
 		.pages = LIST_HEAD_INIT(ctl.pages),
 		.refused_pages = LIST_HEAD_INIT(ctl.refused_pages),
+		.prealloc_pages = LIST_HEAD_INIT(ctl.prealloc_pages),
 		.page_size = b->max_page_size,
 		.op = VMW_BALLOON_INFLATE
 	};
@@ -1112,10 +1151,10 @@ static void vmballoon_inflate(struct vmballoon *b)
 				break;
 
 			/*
-			 * Ignore errors from locking as we now switch to 4k
-			 * pages and we might get different errors.
+			 * Split the refused pages to 4k. This will also empty
+			 * the refused pages list.
 			 */
-			vmballoon_release_refused_pages(b, &ctl);
+			vmballoon_split_refused_pages(&ctl);
 			ctl.page_size--;
 		}
 
@@ -1129,6 +1168,8 @@ static void vmballoon_inflate(struct vmballoon *b)
 	 */
 	if (ctl.n_refused_pages != 0)
 		vmballoon_release_refused_pages(b, &ctl);
+
+	vmballoon_release_page_list(&ctl.prealloc_pages, NULL, ctl.page_size);
 }
 
 /**
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/4] mm/balloon_compaction: list interfaces
  2019-03-28  1:07 ` [PATCH v2 1/4] mm/balloon_compaction: list interfaces Nadav Amit
@ 2019-04-08 17:35     ` Nadav Amit
  2019-04-19 22:07   ` Michael S. Tsirkin
  2019-04-19 22:07   ` Michael S. Tsirkin
  2 siblings, 0 replies; 19+ messages in thread
From: Nadav Amit @ 2019-04-08 17:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, virtualization, Linux-MM, Linux List Kernel Mailing,
	Pv-drivers, Julien Freche, Greg Kroah-Hartman, Arnd Bergmann

> On Mar 27, 2019, at 6:07 PM, Nadav Amit <namit@vmware.com> wrote:
> 
> Introduce interfaces for ballooning enqueueing and dequeueing of a list
> of pages. These interfaces reduce the overhead of storing and restoring
> IRQs by batching the operations. In addition they do not panic if the
> list of pages is empty.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>

Michael, may I ping for your ack?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/4] mm/balloon_compaction: list interfaces
@ 2019-04-08 17:35     ` Nadav Amit
  0 siblings, 0 replies; 19+ messages in thread
From: Nadav Amit @ 2019-04-08 17:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, virtualization, Linux-MM, Linux List Kernel Mailing,
	Pv-drivers, Julien Freche, Greg Kroah-Hartman, Arnd Bergmann

> On Mar 27, 2019, at 6:07 PM, Nadav Amit <namit@vmware.com> wrote:
> 
> Introduce interfaces for ballooning enqueueing and dequeueing of a list
> of pages. These interfaces reduce the overhead of storing and restoring
> IRQs by batching the operations. In addition they do not panic if the
> list of pages is empty.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>

Michael, may I ping for your ack?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/4] mm/balloon_compaction: list interfaces
  2019-04-08 17:35     ` Nadav Amit
@ 2019-04-19 21:41       ` Nadav Amit
  -1 siblings, 0 replies; 19+ messages in thread
From: Nadav Amit @ 2019-04-19 21:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, virtualization, Linux-MM, Linux List Kernel Mailing,
	Pv-drivers, Julien Freche, Greg Kroah-Hartman, Arnd Bergmann,
	Nadav Amit

> On Apr 8, 2019, at 10:35 AM, Nadav Amit <namit@vmware.com> wrote:
> 
>> On Mar 27, 2019, at 6:07 PM, Nadav Amit <namit@vmware.com> wrote:
>> 
>> Introduce interfaces for ballooning enqueueing and dequeueing of a list
>> of pages. These interfaces reduce the overhead of storing and restoring
>> IRQs by batching the operations. In addition they do not panic if the
>> list of pages is empty.
>> 
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> 
> Michael, may I ping for your ack?

Ping again?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/4] mm/balloon_compaction: list interfaces
@ 2019-04-19 21:41       ` Nadav Amit
  0 siblings, 0 replies; 19+ messages in thread
From: Nadav Amit @ 2019-04-19 21:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, virtualization, Linux-MM, Linux List Kernel Mailing,
	Pv-drivers, Julien Freche, Greg Kroah-Hartman, Arnd Bergmann,
	Nadav Amit

> On Apr 8, 2019, at 10:35 AM, Nadav Amit <namit@vmware.com> wrote:
> 
>> On Mar 27, 2019, at 6:07 PM, Nadav Amit <namit@vmware.com> wrote:
>> 
>> Introduce interfaces for ballooning enqueueing and dequeueing of a list
>> of pages. These interfaces reduce the overhead of storing and restoring
>> IRQs by batching the operations. In addition they do not panic if the
>> list of pages is empty.
>> 
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> 
> Michael, may I ping for your ack?

Ping again?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/4] mm/balloon_compaction: list interfaces
  2019-04-08 17:35     ` Nadav Amit
  (?)
@ 2019-04-19 21:41     ` Nadav Amit
  -1 siblings, 0 replies; 19+ messages in thread
From: Nadav Amit @ 2019-04-19 21:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Arnd Bergmann, Pv-drivers, Greg Kroah-Hartman,
	Linux List Kernel Mailing, virtualization, Linux-MM,
	Julien Freche, Nadav Amit

> On Apr 8, 2019, at 10:35 AM, Nadav Amit <namit@vmware.com> wrote:
> 
>> On Mar 27, 2019, at 6:07 PM, Nadav Amit <namit@vmware.com> wrote:
>> 
>> Introduce interfaces for ballooning enqueueing and dequeueing of a list
>> of pages. These interfaces reduce the overhead of storing and restoring
>> IRQs by batching the operations. In addition they do not panic if the
>> list of pages is empty.
>> 
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> 
> Michael, may I ping for your ack?

Ping again?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/4] mm/balloon_compaction: list interfaces
  2019-03-28  1:07 ` [PATCH v2 1/4] mm/balloon_compaction: list interfaces Nadav Amit
  2019-04-08 17:35     ` Nadav Amit
@ 2019-04-19 22:07   ` Michael S. Tsirkin
  2019-04-19 22:34       ` Nadav Amit
  2019-04-19 22:07   ` Michael S. Tsirkin
  2 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2019-04-19 22:07 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Greg Kroah-Hartman, Arnd Bergmann, Jason Wang, virtualization,
	linux-mm, linux-kernel, VMware, Inc.,
	Julien Freche, Nadav Amit

On Thu, Mar 28, 2019 at 01:07:15AM +0000, Nadav Amit wrote:
> Introduce interfaces for ballooning enqueueing and dequeueing of a list
> of pages. These interfaces reduce the overhead of storing and restoring
> IRQs by batching the operations. In addition they do not panic if the
> list of pages is empty.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: linux-mm@kvack.org
> Cc: virtualization@lists.linux-foundation.org
> Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
> Signed-off-by: Nadav Amit <namit@vmware.com>
> ---
>  include/linux/balloon_compaction.h |   4 +
>  mm/balloon_compaction.c            | 145 +++++++++++++++++++++--------
>  2 files changed, 111 insertions(+), 38 deletions(-)
> 
> diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
> index f111c780ef1d..1da79edadb69 100644
> --- a/include/linux/balloon_compaction.h
> +++ b/include/linux/balloon_compaction.h
> @@ -64,6 +64,10 @@ extern struct page *balloon_page_alloc(void);
>  extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
>  				 struct page *page);
>  extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
> +extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
> +				      struct list_head *pages);
> +extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
> +				     struct list_head *pages, int n_req_pages);
>  

Why size_t I wonder? It can never be > n_req_pages which is int.
Callers also seem to assume int.

>  static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
>  {


> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> index ef858d547e2d..88d5d9a01072 100644
> --- a/mm/balloon_compaction.c
> +++ b/mm/balloon_compaction.c
> @@ -10,6 +10,106 @@
>  #include <linux/export.h>
>  #include <linux/balloon_compaction.h>
>  
> +static int balloon_page_enqueue_one(struct balloon_dev_info *b_dev_info,
> +				     struct page *page)
> +{
> +	/*
> +	 * Block others from accessing the 'page' when we get around to
> +	 * establishing additional references. We should be the only one
> +	 * holding a reference to the 'page' at this point.
> +	 */
> +	if (!trylock_page(page)) {
> +		WARN_ONCE(1, "balloon inflation failed to enqueue page\n");
> +		return -EFAULT;

Looks like all callers bug on a failure. So let's just do it here,
and then make this void?

> +	}
> +	list_del(&page->lru);
> +	balloon_page_insert(b_dev_info, page);
> +	unlock_page(page);
> +	__count_vm_event(BALLOON_INFLATE);
> +	return 0;
> +}
> +
> +/**
> + * balloon_page_list_enqueue() - inserts a list of pages into the balloon page
> + *				 list.
> + * @b_dev_info: balloon device descriptor where we will insert a new page to
> + * @pages: pages to enqueue - allocated using balloon_page_alloc.
> + *
> + * Driver must call it to properly enqueue a balloon pages before definitively
> + * removing it from the guest system.

A bunch of grammar error here. Pls fix for clarify.
Also - document that nothing must lock the pages? More assumptions?
What is "it" in this context? All pages? And what does removing from
guest mean? Really adding to the balloon?

> + *
> + * Return: number of pages that were enqueued.
> + */
> +size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
> +			       struct list_head *pages)
> +{
> +	struct page *page, *tmp;
> +	unsigned long flags;
> +	size_t n_pages = 0;
> +
> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> +	list_for_each_entry_safe(page, tmp, pages, lru) {
> +		balloon_page_enqueue_one(b_dev_info, page);

Do we want to do something about an error here?

> +		n_pages++;
> +	}
> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> +	return n_pages;
> +}
> +EXPORT_SYMBOL_GPL(balloon_page_list_enqueue);
> +
> +/**
> + * balloon_page_list_dequeue() - removes pages from balloon's page list and
> + *				 returns a list of the pages.
> + * @b_dev_info: balloon device decriptor where we will grab a page from.
> + * @pages: pointer to the list of pages that would be returned to the caller.
> + * @n_req_pages: number of requested pages.
> + *
> + * Driver must call it to properly de-allocate a previous enlisted balloon pages
> + * before definetively releasing it back to the guest system. This function
> + * tries to remove @n_req_pages from the ballooned pages and return it to the
> + * caller in the @pages list.
> + *
> + * Note that this function may fail to dequeue some pages temporarily empty due
> + * to compaction isolated pages.
> + *
> + * Return: number of pages that were added to the @pages list.
> + */
> +size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
> +				 struct list_head *pages, int n_req_pages)
> +{
> +	struct page *page, *tmp;
> +	unsigned long flags;
> +	size_t n_pages = 0;
> +
> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> +	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> +		/*
> +		 * Block others from accessing the 'page' while we get around
> +		 * establishing additional references and preparing the 'page'
> +		 * to be released by the balloon driver.
> +		 */
> +		if (!trylock_page(page))
> +			continue;
> +
> +		if (IS_ENABLED(CONFIG_BALLOON_COMPACTION) &&
> +		    PageIsolated(page)) {
> +			/* raced with isolation */
> +			unlock_page(page);
> +			continue;
> +		}
> +		balloon_page_delete(page);
> +		__count_vm_event(BALLOON_DEFLATE);
> +		unlock_page(page);
> +		list_add(&page->lru, pages);
> +		if (++n_pages >= n_req_pages)
> +			break;
> +	}
> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> +
> +	return n_pages;
> +}
> +EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
> +
>  /*
>   * balloon_page_alloc - allocates a new page for insertion into the balloon
>   *			  page list.
> @@ -43,17 +143,9 @@ void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
>  {
>  	unsigned long flags;
>  
> -	/*
> -	 * Block others from accessing the 'page' when we get around to
> -	 * establishing additional references. We should be the only one
> -	 * holding a reference to the 'page' at this point.
> -	 */
> -	BUG_ON(!trylock_page(page));
>  	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> -	balloon_page_insert(b_dev_info, page);
> -	__count_vm_event(BALLOON_INFLATE);
> +	balloon_page_enqueue_one(b_dev_info, page);

We used to bug on failure to lock page, now we
silently ignore this error. Why?


>  	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> -	unlock_page(page);
>  }
>  EXPORT_SYMBOL_GPL(balloon_page_enqueue);
>  
> @@ -70,36 +162,13 @@ EXPORT_SYMBOL_GPL(balloon_page_enqueue);
>   */
>  struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  {
> -	struct page *page, *tmp;
>  	unsigned long flags;
> -	bool dequeued_page;
> +	LIST_HEAD(pages);
> +	int n_pages;
>  
> -	dequeued_page = false;
> -	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> -	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> -		/*
> -		 * Block others from accessing the 'page' while we get around
> -		 * establishing additional references and preparing the 'page'
> -		 * to be released by the balloon driver.
> -		 */
> -		if (trylock_page(page)) {
> -#ifdef CONFIG_BALLOON_COMPACTION
> -			if (PageIsolated(page)) {
> -				/* raced with isolation */
> -				unlock_page(page);
> -				continue;
> -			}
> -#endif
> -			balloon_page_delete(page);
> -			__count_vm_event(BALLOON_DEFLATE);
> -			unlock_page(page);
> -			dequeued_page = true;
> -			break;
> -		}
> -	}
> -	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> +	n_pages = balloon_page_list_dequeue(b_dev_info, &pages, 1);
>  
> -	if (!dequeued_page) {
> +	if (n_pages != 1) {
>  		/*
>  		 * If we are unable to dequeue a balloon page because the page
>  		 * list is empty and there is no isolated pages, then something
> @@ -112,9 +181,9 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  			     !b_dev_info->isolated_pages))
>  			BUG();
>  		spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> -		page = NULL;
> +		return NULL;
>  	}
> -	return page;
> +	return list_first_entry(&pages, struct page, lru);
>  }
>  EXPORT_SYMBOL_GPL(balloon_page_dequeue);
>  
> -- 
> 2.19.1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/4] mm/balloon_compaction: list interfaces
  2019-03-28  1:07 ` [PATCH v2 1/4] mm/balloon_compaction: list interfaces Nadav Amit
  2019-04-08 17:35     ` Nadav Amit
  2019-04-19 22:07   ` Michael S. Tsirkin
@ 2019-04-19 22:07   ` Michael S. Tsirkin
  2 siblings, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2019-04-19 22:07 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Arnd Bergmann, VMware, Inc.,
	Greg Kroah-Hartman, linux-kernel, virtualization, linux-mm,
	Julien Freche, Nadav Amit

On Thu, Mar 28, 2019 at 01:07:15AM +0000, Nadav Amit wrote:
> Introduce interfaces for ballooning enqueueing and dequeueing of a list
> of pages. These interfaces reduce the overhead of storing and restoring
> IRQs by batching the operations. In addition they do not panic if the
> list of pages is empty.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: linux-mm@kvack.org
> Cc: virtualization@lists.linux-foundation.org
> Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
> Signed-off-by: Nadav Amit <namit@vmware.com>
> ---
>  include/linux/balloon_compaction.h |   4 +
>  mm/balloon_compaction.c            | 145 +++++++++++++++++++++--------
>  2 files changed, 111 insertions(+), 38 deletions(-)
> 
> diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
> index f111c780ef1d..1da79edadb69 100644
> --- a/include/linux/balloon_compaction.h
> +++ b/include/linux/balloon_compaction.h
> @@ -64,6 +64,10 @@ extern struct page *balloon_page_alloc(void);
>  extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
>  				 struct page *page);
>  extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
> +extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
> +				      struct list_head *pages);
> +extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
> +				     struct list_head *pages, int n_req_pages);
>  

Why size_t I wonder? It can never be > n_req_pages which is int.
Callers also seem to assume int.

>  static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
>  {


> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> index ef858d547e2d..88d5d9a01072 100644
> --- a/mm/balloon_compaction.c
> +++ b/mm/balloon_compaction.c
> @@ -10,6 +10,106 @@
>  #include <linux/export.h>
>  #include <linux/balloon_compaction.h>
>  
> +static int balloon_page_enqueue_one(struct balloon_dev_info *b_dev_info,
> +				     struct page *page)
> +{
> +	/*
> +	 * Block others from accessing the 'page' when we get around to
> +	 * establishing additional references. We should be the only one
> +	 * holding a reference to the 'page' at this point.
> +	 */
> +	if (!trylock_page(page)) {
> +		WARN_ONCE(1, "balloon inflation failed to enqueue page\n");
> +		return -EFAULT;

Looks like all callers bug on a failure. So let's just do it here,
and then make this void?

> +	}
> +	list_del(&page->lru);
> +	balloon_page_insert(b_dev_info, page);
> +	unlock_page(page);
> +	__count_vm_event(BALLOON_INFLATE);
> +	return 0;
> +}
> +
> +/**
> + * balloon_page_list_enqueue() - inserts a list of pages into the balloon page
> + *				 list.
> + * @b_dev_info: balloon device descriptor where we will insert a new page to
> + * @pages: pages to enqueue - allocated using balloon_page_alloc.
> + *
> + * Driver must call it to properly enqueue a balloon pages before definitively
> + * removing it from the guest system.

A bunch of grammar error here. Pls fix for clarify.
Also - document that nothing must lock the pages? More assumptions?
What is "it" in this context? All pages? And what does removing from
guest mean? Really adding to the balloon?

> + *
> + * Return: number of pages that were enqueued.
> + */
> +size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
> +			       struct list_head *pages)
> +{
> +	struct page *page, *tmp;
> +	unsigned long flags;
> +	size_t n_pages = 0;
> +
> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> +	list_for_each_entry_safe(page, tmp, pages, lru) {
> +		balloon_page_enqueue_one(b_dev_info, page);

Do we want to do something about an error here?

> +		n_pages++;
> +	}
> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> +	return n_pages;
> +}
> +EXPORT_SYMBOL_GPL(balloon_page_list_enqueue);
> +
> +/**
> + * balloon_page_list_dequeue() - removes pages from balloon's page list and
> + *				 returns a list of the pages.
> + * @b_dev_info: balloon device decriptor where we will grab a page from.
> + * @pages: pointer to the list of pages that would be returned to the caller.
> + * @n_req_pages: number of requested pages.
> + *
> + * Driver must call it to properly de-allocate a previous enlisted balloon pages
> + * before definetively releasing it back to the guest system. This function
> + * tries to remove @n_req_pages from the ballooned pages and return it to the
> + * caller in the @pages list.
> + *
> + * Note that this function may fail to dequeue some pages temporarily empty due
> + * to compaction isolated pages.
> + *
> + * Return: number of pages that were added to the @pages list.
> + */
> +size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
> +				 struct list_head *pages, int n_req_pages)
> +{
> +	struct page *page, *tmp;
> +	unsigned long flags;
> +	size_t n_pages = 0;
> +
> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> +	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> +		/*
> +		 * Block others from accessing the 'page' while we get around
> +		 * establishing additional references and preparing the 'page'
> +		 * to be released by the balloon driver.
> +		 */
> +		if (!trylock_page(page))
> +			continue;
> +
> +		if (IS_ENABLED(CONFIG_BALLOON_COMPACTION) &&
> +		    PageIsolated(page)) {
> +			/* raced with isolation */
> +			unlock_page(page);
> +			continue;
> +		}
> +		balloon_page_delete(page);
> +		__count_vm_event(BALLOON_DEFLATE);
> +		unlock_page(page);
> +		list_add(&page->lru, pages);
> +		if (++n_pages >= n_req_pages)
> +			break;
> +	}
> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> +
> +	return n_pages;
> +}
> +EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
> +
>  /*
>   * balloon_page_alloc - allocates a new page for insertion into the balloon
>   *			  page list.
> @@ -43,17 +143,9 @@ void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
>  {
>  	unsigned long flags;
>  
> -	/*
> -	 * Block others from accessing the 'page' when we get around to
> -	 * establishing additional references. We should be the only one
> -	 * holding a reference to the 'page' at this point.
> -	 */
> -	BUG_ON(!trylock_page(page));
>  	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> -	balloon_page_insert(b_dev_info, page);
> -	__count_vm_event(BALLOON_INFLATE);
> +	balloon_page_enqueue_one(b_dev_info, page);

We used to bug on failure to lock page, now we
silently ignore this error. Why?


>  	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> -	unlock_page(page);
>  }
>  EXPORT_SYMBOL_GPL(balloon_page_enqueue);
>  
> @@ -70,36 +162,13 @@ EXPORT_SYMBOL_GPL(balloon_page_enqueue);
>   */
>  struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  {
> -	struct page *page, *tmp;
>  	unsigned long flags;
> -	bool dequeued_page;
> +	LIST_HEAD(pages);
> +	int n_pages;
>  
> -	dequeued_page = false;
> -	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> -	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> -		/*
> -		 * Block others from accessing the 'page' while we get around
> -		 * establishing additional references and preparing the 'page'
> -		 * to be released by the balloon driver.
> -		 */
> -		if (trylock_page(page)) {
> -#ifdef CONFIG_BALLOON_COMPACTION
> -			if (PageIsolated(page)) {
> -				/* raced with isolation */
> -				unlock_page(page);
> -				continue;
> -			}
> -#endif
> -			balloon_page_delete(page);
> -			__count_vm_event(BALLOON_DEFLATE);
> -			unlock_page(page);
> -			dequeued_page = true;
> -			break;
> -		}
> -	}
> -	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> +	n_pages = balloon_page_list_dequeue(b_dev_info, &pages, 1);
>  
> -	if (!dequeued_page) {
> +	if (n_pages != 1) {
>  		/*
>  		 * If we are unable to dequeue a balloon page because the page
>  		 * list is empty and there is no isolated pages, then something
> @@ -112,9 +181,9 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  			     !b_dev_info->isolated_pages))
>  			BUG();
>  		spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> -		page = NULL;
> +		return NULL;
>  	}
> -	return page;
> +	return list_first_entry(&pages, struct page, lru);
>  }
>  EXPORT_SYMBOL_GPL(balloon_page_dequeue);
>  
> -- 
> 2.19.1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/4] mm/balloon_compaction: list interfaces
  2019-04-19 22:07   ` Michael S. Tsirkin
@ 2019-04-19 22:34       ` Nadav Amit
  0 siblings, 0 replies; 19+ messages in thread
From: Nadav Amit @ 2019-04-19 22:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Greg Kroah-Hartman, Arnd Bergmann, Jason Wang, virtualization,
	linux-mm, linux-kernel, Pv-drivers, Julien Freche

> On Apr 19, 2019, at 3:07 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Thu, Mar 28, 2019 at 01:07:15AM +0000, Nadav Amit wrote:
>> Introduce interfaces for ballooning enqueueing and dequeueing of a list
>> of pages. These interfaces reduce the overhead of storing and restoring
>> IRQs by batching the operations. In addition they do not panic if the
>> list of pages is empty.
>> 
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: linux-mm@kvack.org
>> Cc: virtualization@lists.linux-foundation.org
>> Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
>> Signed-off-by: Nadav Amit <namit@vmware.com>
>> ---
>> include/linux/balloon_compaction.h |   4 +
>> mm/balloon_compaction.c            | 145 +++++++++++++++++++++--------
>> 2 files changed, 111 insertions(+), 38 deletions(-)
>> 
>> diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
>> index f111c780ef1d..1da79edadb69 100644
>> --- a/include/linux/balloon_compaction.h
>> +++ b/include/linux/balloon_compaction.h
>> @@ -64,6 +64,10 @@ extern struct page *balloon_page_alloc(void);
>> extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
>> 				 struct page *page);
>> extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
>> +extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
>> +				      struct list_head *pages);
>> +extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
>> +				     struct list_head *pages, int n_req_pages);
> 
> Why size_t I wonder? It can never be > n_req_pages which is int.
> Callers also seem to assume int.

Only because on the previous iteration
( https://lkml.org/lkml/2019/2/6/912 ) you said:

> Are we sure this int never overflows? Why not just use u64
> or size_t straight away?

I am ok either way, but please be consistent.

> 
>> static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
>> {
> 
> 
>> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
>> index ef858d547e2d..88d5d9a01072 100644
>> --- a/mm/balloon_compaction.c
>> +++ b/mm/balloon_compaction.c
>> @@ -10,6 +10,106 @@
>> #include <linux/export.h>
>> #include <linux/balloon_compaction.h>
>> 
>> +static int balloon_page_enqueue_one(struct balloon_dev_info *b_dev_info,
>> +				     struct page *page)
>> +{
>> +	/*
>> +	 * Block others from accessing the 'page' when we get around to
>> +	 * establishing additional references. We should be the only one
>> +	 * holding a reference to the 'page' at this point.
>> +	 */
>> +	if (!trylock_page(page)) {
>> +		WARN_ONCE(1, "balloon inflation failed to enqueue page\n");
>> +		return -EFAULT;
> 
> Looks like all callers bug on a failure. So let's just do it here,
> and then make this void?

As you noted below, actually balloon_page_list_enqueue() does not do
anything when an error occurs. I really prefer to avoid adding BUG_ON() - 
I always get pushed back on such things. Yes, this might lead to memory
leak, but there is no reason to crash the system.

>> +	}
>> +	list_del(&page->lru);
>> +	balloon_page_insert(b_dev_info, page);
>> +	unlock_page(page);
>> +	__count_vm_event(BALLOON_INFLATE);
>> +	return 0;
>> +}
>> +
>> +/**
>> + * balloon_page_list_enqueue() - inserts a list of pages into the balloon page
>> + *				 list.
>> + * @b_dev_info: balloon device descriptor where we will insert a new page to
>> + * @pages: pages to enqueue - allocated using balloon_page_alloc.
>> + *
>> + * Driver must call it to properly enqueue a balloon pages before definitively
>> + * removing it from the guest system.
> 
> A bunch of grammar error here. Pls fix for clarify.
> Also - document that nothing must lock the pages? More assumptions?
> What is "it" in this context? All pages? And what does removing from
> guest mean? Really adding to the balloon?

I pretty much copy-pasted this description from balloon_page_enqueue(). I
see that you edited this message in the past at least couple of times (e.g.,
c7cdff0e86471 “virtio_balloon: fix deadlock on OOM”) and left it as is.

So maybe all of the comments in this file need a rework, but I don’t think
this patch-set needs to do it.

>> + *
>> + * Return: number of pages that were enqueued.
>> + */
>> +size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
>> +			       struct list_head *pages)
>> +{
>> +	struct page *page, *tmp;
>> +	unsigned long flags;
>> +	size_t n_pages = 0;
>> +
>> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>> +	list_for_each_entry_safe(page, tmp, pages, lru) {
>> +		balloon_page_enqueue_one(b_dev_info, page);
> 
> Do we want to do something about an error here?

Hmm… This is really something that should never happen, but I still prefer
to avoid BUG_ON(), as I said before. I will just not count the page.

> 
>> +		n_pages++;
>> +	}
>> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
>> +	return n_pages;
>> +}
>> +EXPORT_SYMBOL_GPL(balloon_page_list_enqueue);
>> +
>> +/**
>> + * balloon_page_list_dequeue() - removes pages from balloon's page list and
>> + *				 returns a list of the pages.
>> + * @b_dev_info: balloon device decriptor where we will grab a page from.
>> + * @pages: pointer to the list of pages that would be returned to the caller.
>> + * @n_req_pages: number of requested pages.
>> + *
>> + * Driver must call it to properly de-allocate a previous enlisted balloon pages
>> + * before definetively releasing it back to the guest system. This function
>> + * tries to remove @n_req_pages from the ballooned pages and return it to the
>> + * caller in the @pages list.
>> + *
>> + * Note that this function may fail to dequeue some pages temporarily empty due
>> + * to compaction isolated pages.
>> + *
>> + * Return: number of pages that were added to the @pages list.
>> + */
>> +size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
>> +				 struct list_head *pages, int n_req_pages)
>> +{
>> +	struct page *page, *tmp;
>> +	unsigned long flags;
>> +	size_t n_pages = 0;
>> +
>> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>> +	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
>> +		/*
>> +		 * Block others from accessing the 'page' while we get around
>> +		 * establishing additional references and preparing the 'page'
>> +		 * to be released by the balloon driver.
>> +		 */
>> +		if (!trylock_page(page))
>> +			continue;
>> +
>> +		if (IS_ENABLED(CONFIG_BALLOON_COMPACTION) &&
>> +		    PageIsolated(page)) {
>> +			/* raced with isolation */
>> +			unlock_page(page);
>> +			continue;
>> +		}
>> +		balloon_page_delete(page);
>> +		__count_vm_event(BALLOON_DEFLATE);
>> +		unlock_page(page);
>> +		list_add(&page->lru, pages);
>> +		if (++n_pages >= n_req_pages)
>> +			break;
>> +	}
>> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
>> +
>> +	return n_pages;
>> +}
>> +EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
>> +
>> /*
>>  * balloon_page_alloc - allocates a new page for insertion into the balloon
>>  *			  page list.
>> @@ -43,17 +143,9 @@ void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
>> {
>> 	unsigned long flags;
>> 
>> -	/*
>> -	 * Block others from accessing the 'page' when we get around to
>> -	 * establishing additional references. We should be the only one
>> -	 * holding a reference to the 'page' at this point.
>> -	 */
>> -	BUG_ON(!trylock_page(page));
>> 	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>> -	balloon_page_insert(b_dev_info, page);
>> -	__count_vm_event(BALLOON_INFLATE);
>> +	balloon_page_enqueue_one(b_dev_info, page);
> 
> We used to bug on failure to lock page, now we
> silently ignore this error. Why?

That’s a mistake. I’ll add a BUG_ON() if balloon_page_enqueue_one() fails.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/4] mm/balloon_compaction: list interfaces
@ 2019-04-19 22:34       ` Nadav Amit
  0 siblings, 0 replies; 19+ messages in thread
From: Nadav Amit @ 2019-04-19 22:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Greg Kroah-Hartman, Arnd Bergmann, Jason Wang, virtualization,
	linux-mm, linux-kernel, Pv-drivers, Julien Freche

> On Apr 19, 2019, at 3:07 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Thu, Mar 28, 2019 at 01:07:15AM +0000, Nadav Amit wrote:
>> Introduce interfaces for ballooning enqueueing and dequeueing of a list
>> of pages. These interfaces reduce the overhead of storing and restoring
>> IRQs by batching the operations. In addition they do not panic if the
>> list of pages is empty.
>> 
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: linux-mm@kvack.org
>> Cc: virtualization@lists.linux-foundation.org
>> Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
>> Signed-off-by: Nadav Amit <namit@vmware.com>
>> ---
>> include/linux/balloon_compaction.h |   4 +
>> mm/balloon_compaction.c            | 145 +++++++++++++++++++++--------
>> 2 files changed, 111 insertions(+), 38 deletions(-)
>> 
>> diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
>> index f111c780ef1d..1da79edadb69 100644
>> --- a/include/linux/balloon_compaction.h
>> +++ b/include/linux/balloon_compaction.h
>> @@ -64,6 +64,10 @@ extern struct page *balloon_page_alloc(void);
>> extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
>> 				 struct page *page);
>> extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
>> +extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
>> +				      struct list_head *pages);
>> +extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
>> +				     struct list_head *pages, int n_req_pages);
> 
> Why size_t I wonder? It can never be > n_req_pages which is int.
> Callers also seem to assume int.

Only because on the previous iteration
( https://lkml.org/lkml/2019/2/6/912 ) you said:

> Are we sure this int never overflows? Why not just use u64
> or size_t straight away?

I am ok either way, but please be consistent.

> 
>> static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
>> {
> 
> 
>> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
>> index ef858d547e2d..88d5d9a01072 100644
>> --- a/mm/balloon_compaction.c
>> +++ b/mm/balloon_compaction.c
>> @@ -10,6 +10,106 @@
>> #include <linux/export.h>
>> #include <linux/balloon_compaction.h>
>> 
>> +static int balloon_page_enqueue_one(struct balloon_dev_info *b_dev_info,
>> +				     struct page *page)
>> +{
>> +	/*
>> +	 * Block others from accessing the 'page' when we get around to
>> +	 * establishing additional references. We should be the only one
>> +	 * holding a reference to the 'page' at this point.
>> +	 */
>> +	if (!trylock_page(page)) {
>> +		WARN_ONCE(1, "balloon inflation failed to enqueue page\n");
>> +		return -EFAULT;
> 
> Looks like all callers bug on a failure. So let's just do it here,
> and then make this void?

As you noted below, actually balloon_page_list_enqueue() does not do
anything when an error occurs. I really prefer to avoid adding BUG_ON() - 
I always get pushed back on such things. Yes, this might lead to memory
leak, but there is no reason to crash the system.

>> +	}
>> +	list_del(&page->lru);
>> +	balloon_page_insert(b_dev_info, page);
>> +	unlock_page(page);
>> +	__count_vm_event(BALLOON_INFLATE);
>> +	return 0;
>> +}
>> +
>> +/**
>> + * balloon_page_list_enqueue() - inserts a list of pages into the balloon page
>> + *				 list.
>> + * @b_dev_info: balloon device descriptor where we will insert a new page to
>> + * @pages: pages to enqueue - allocated using balloon_page_alloc.
>> + *
>> + * Driver must call it to properly enqueue a balloon pages before definitively
>> + * removing it from the guest system.
> 
> A bunch of grammar error here. Pls fix for clarify.
> Also - document that nothing must lock the pages? More assumptions?
> What is "it" in this context? All pages? And what does removing from
> guest mean? Really adding to the balloon?

I pretty much copy-pasted this description from balloon_page_enqueue(). I
see that you edited this message in the past at least couple of times (e.g.,
c7cdff0e86471 “virtio_balloon: fix deadlock on OOM”) and left it as is.

So maybe all of the comments in this file need a rework, but I don’t think
this patch-set needs to do it.

>> + *
>> + * Return: number of pages that were enqueued.
>> + */
>> +size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
>> +			       struct list_head *pages)
>> +{
>> +	struct page *page, *tmp;
>> +	unsigned long flags;
>> +	size_t n_pages = 0;
>> +
>> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>> +	list_for_each_entry_safe(page, tmp, pages, lru) {
>> +		balloon_page_enqueue_one(b_dev_info, page);
> 
> Do we want to do something about an error here?

Hmm… This is really something that should never happen, but I still prefer
to avoid BUG_ON(), as I said before. I will just not count the page.

> 
>> +		n_pages++;
>> +	}
>> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
>> +	return n_pages;
>> +}
>> +EXPORT_SYMBOL_GPL(balloon_page_list_enqueue);
>> +
>> +/**
>> + * balloon_page_list_dequeue() - removes pages from balloon's page list and
>> + *				 returns a list of the pages.
>> + * @b_dev_info: balloon device decriptor where we will grab a page from.
>> + * @pages: pointer to the list of pages that would be returned to the caller.
>> + * @n_req_pages: number of requested pages.
>> + *
>> + * Driver must call it to properly de-allocate a previous enlisted balloon pages
>> + * before definetively releasing it back to the guest system. This function
>> + * tries to remove @n_req_pages from the ballooned pages and return it to the
>> + * caller in the @pages list.
>> + *
>> + * Note that this function may fail to dequeue some pages temporarily empty due
>> + * to compaction isolated pages.
>> + *
>> + * Return: number of pages that were added to the @pages list.
>> + */
>> +size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
>> +				 struct list_head *pages, int n_req_pages)
>> +{
>> +	struct page *page, *tmp;
>> +	unsigned long flags;
>> +	size_t n_pages = 0;
>> +
>> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>> +	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
>> +		/*
>> +		 * Block others from accessing the 'page' while we get around
>> +		 * establishing additional references and preparing the 'page'
>> +		 * to be released by the balloon driver.
>> +		 */
>> +		if (!trylock_page(page))
>> +			continue;
>> +
>> +		if (IS_ENABLED(CONFIG_BALLOON_COMPACTION) &&
>> +		    PageIsolated(page)) {
>> +			/* raced with isolation */
>> +			unlock_page(page);
>> +			continue;
>> +		}
>> +		balloon_page_delete(page);
>> +		__count_vm_event(BALLOON_DEFLATE);
>> +		unlock_page(page);
>> +		list_add(&page->lru, pages);
>> +		if (++n_pages >= n_req_pages)
>> +			break;
>> +	}
>> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
>> +
>> +	return n_pages;
>> +}
>> +EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
>> +
>> /*
>>  * balloon_page_alloc - allocates a new page for insertion into the balloon
>>  *			  page list.
>> @@ -43,17 +143,9 @@ void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
>> {
>> 	unsigned long flags;
>> 
>> -	/*
>> -	 * Block others from accessing the 'page' when we get around to
>> -	 * establishing additional references. We should be the only one
>> -	 * holding a reference to the 'page' at this point.
>> -	 */
>> -	BUG_ON(!trylock_page(page));
>> 	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>> -	balloon_page_insert(b_dev_info, page);
>> -	__count_vm_event(BALLOON_INFLATE);
>> +	balloon_page_enqueue_one(b_dev_info, page);
> 
> We used to bug on failure to lock page, now we
> silently ignore this error. Why?

That’s a mistake. I’ll add a BUG_ON() if balloon_page_enqueue_one() fails.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/4] mm/balloon_compaction: list interfaces
  2019-04-19 22:34       ` Nadav Amit
@ 2019-04-19 22:47         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2019-04-19 22:47 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Greg Kroah-Hartman, Arnd Bergmann, Jason Wang, virtualization,
	linux-mm, linux-kernel, Pv-drivers, Julien Freche

On Fri, Apr 19, 2019 at 10:34:04PM +0000, Nadav Amit wrote:
> > On Apr 19, 2019, at 3:07 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > 
> > On Thu, Mar 28, 2019 at 01:07:15AM +0000, Nadav Amit wrote:
> >> Introduce interfaces for ballooning enqueueing and dequeueing of a list
> >> of pages. These interfaces reduce the overhead of storing and restoring
> >> IRQs by batching the operations. In addition they do not panic if the
> >> list of pages is empty.
> >> 
> >> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> >> Cc: Jason Wang <jasowang@redhat.com>
> >> Cc: linux-mm@kvack.org
> >> Cc: virtualization@lists.linux-foundation.org
> >> Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
> >> Signed-off-by: Nadav Amit <namit@vmware.com>
> >> ---
> >> include/linux/balloon_compaction.h |   4 +
> >> mm/balloon_compaction.c            | 145 +++++++++++++++++++++--------
> >> 2 files changed, 111 insertions(+), 38 deletions(-)
> >> 
> >> diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
> >> index f111c780ef1d..1da79edadb69 100644
> >> --- a/include/linux/balloon_compaction.h
> >> +++ b/include/linux/balloon_compaction.h
> >> @@ -64,6 +64,10 @@ extern struct page *balloon_page_alloc(void);
> >> extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
> >> 				 struct page *page);
> >> extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
> >> +extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
> >> +				      struct list_head *pages);
> >> +extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
> >> +				     struct list_head *pages, int n_req_pages);
> > 
> > Why size_t I wonder? It can never be > n_req_pages which is int.
> > Callers also seem to assume int.
> 
> Only because on the previous iteration
> ( https://lkml.org/lkml/2019/2/6/912 ) you said:
> 
> > Are we sure this int never overflows? Why not just use u64
> > or size_t straight away?

And the answer is because n_req_pages is an int too?

> 
> I am ok either way, but please be consistent.

I guess n_req_pages should be size_t too then?

> > 
> >> static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
> >> {
> > 
> > 
> >> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> >> index ef858d547e2d..88d5d9a01072 100644
> >> --- a/mm/balloon_compaction.c
> >> +++ b/mm/balloon_compaction.c
> >> @@ -10,6 +10,106 @@
> >> #include <linux/export.h>
> >> #include <linux/balloon_compaction.h>
> >> 
> >> +static int balloon_page_enqueue_one(struct balloon_dev_info *b_dev_info,
> >> +				     struct page *page)
> >> +{
> >> +	/*
> >> +	 * Block others from accessing the 'page' when we get around to
> >> +	 * establishing additional references. We should be the only one
> >> +	 * holding a reference to the 'page' at this point.
> >> +	 */
> >> +	if (!trylock_page(page)) {
> >> +		WARN_ONCE(1, "balloon inflation failed to enqueue page\n");
> >> +		return -EFAULT;
> > 
> > Looks like all callers bug on a failure. So let's just do it here,
> > and then make this void?
> 
> As you noted below, actually balloon_page_list_enqueue() does not do
> anything when an error occurs. I really prefer to avoid adding BUG_ON() - 
> I always get pushed back on such things. Yes, this might lead to memory
> leak, but there is no reason to crash the system.

Need to audit callers to make sure they don't misbehave in worse ways.

I think in this case this indicates that someone is using the page so if
one keeps going and adds it into balloon this will lead to corruption down the road.

If you can change the caller code such that it's just a leak,
then a warning is more appropriate. Or even do not warn at all.


> >> +	}
> >> +	list_del(&page->lru);
> >> +	balloon_page_insert(b_dev_info, page);
> >> +	unlock_page(page);
> >> +	__count_vm_event(BALLOON_INFLATE);
> >> +	return 0;
> >> +}
> >> +
> >> +/**
> >> + * balloon_page_list_enqueue() - inserts a list of pages into the balloon page
> >> + *				 list.
> >> + * @b_dev_info: balloon device descriptor where we will insert a new page to
> >> + * @pages: pages to enqueue - allocated using balloon_page_alloc.
> >> + *
> >> + * Driver must call it to properly enqueue a balloon pages before definitively
> >> + * removing it from the guest system.
> > 
> > A bunch of grammar error here. Pls fix for clarify.
> > Also - document that nothing must lock the pages? More assumptions?
> > What is "it" in this context? All pages? And what does removing from
> > guest mean? Really adding to the balloon?
> 
> I pretty much copy-pasted this description from balloon_page_enqueue(). I
> see that you edited this message in the past at least couple of times (e.g.,
> c7cdff0e86471 “virtio_balloon: fix deadlock on OOM”) and left it as is.
> 
> So maybe all of the comments in this file need a rework, but I don’t think
> this patch-set needs to do it.

I see.
That one dealt with one page so "it" was the page. This one deals with
many pages so you can't just copy it over without changes.
Makes it look like "it" refers to driver or guest.

> >> + *
> >> + * Return: number of pages that were enqueued.
> >> + */
> >> +size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
> >> +			       struct list_head *pages)
> >> +{
> >> +	struct page *page, *tmp;
> >> +	unsigned long flags;
> >> +	size_t n_pages = 0;
> >> +
> >> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> >> +	list_for_each_entry_safe(page, tmp, pages, lru) {
> >> +		balloon_page_enqueue_one(b_dev_info, page);
> > 
> > Do we want to do something about an error here?
> 
> Hmm… This is really something that should never happen, but I still prefer
> to avoid BUG_ON(), as I said before. I will just not count the page.

Callers can BUG then if they want. That is fine but you then
need to change the callers to do it.

> > 
> >> +		n_pages++;
> >> +	}
> >> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> >> +	return n_pages;
> >> +}
> >> +EXPORT_SYMBOL_GPL(balloon_page_list_enqueue);
> >> +
> >> +/**
> >> + * balloon_page_list_dequeue() - removes pages from balloon's page list and
> >> + *				 returns a list of the pages.
> >> + * @b_dev_info: balloon device decriptor where we will grab a page from.
> >> + * @pages: pointer to the list of pages that would be returned to the caller.
> >> + * @n_req_pages: number of requested pages.
> >> + *
> >> + * Driver must call it to properly de-allocate a previous enlisted balloon pages
> >> + * before definetively releasing it back to the guest system. This function
> >> + * tries to remove @n_req_pages from the ballooned pages and return it to the
> >> + * caller in the @pages list.
> >> + *
> >> + * Note that this function may fail to dequeue some pages temporarily empty due
> >> + * to compaction isolated pages.
> >> + *
> >> + * Return: number of pages that were added to the @pages list.
> >> + */
> >> +size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
> >> +				 struct list_head *pages, int n_req_pages)
> >> +{
> >> +	struct page *page, *tmp;
> >> +	unsigned long flags;
> >> +	size_t n_pages = 0;
> >> +
> >> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> >> +	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> >> +		/*
> >> +		 * Block others from accessing the 'page' while we get around
> >> +		 * establishing additional references and preparing the 'page'
> >> +		 * to be released by the balloon driver.
> >> +		 */
> >> +		if (!trylock_page(page))
> >> +			continue;
> >> +
> >> +		if (IS_ENABLED(CONFIG_BALLOON_COMPACTION) &&
> >> +		    PageIsolated(page)) {
> >> +			/* raced with isolation */
> >> +			unlock_page(page);
> >> +			continue;
> >> +		}
> >> +		balloon_page_delete(page);
> >> +		__count_vm_event(BALLOON_DEFLATE);
> >> +		unlock_page(page);
> >> +		list_add(&page->lru, pages);
> >> +		if (++n_pages >= n_req_pages)
> >> +			break;
> >> +	}
> >> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> >> +
> >> +	return n_pages;
> >> +}
> >> +EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
> >> +
> >> /*
> >>  * balloon_page_alloc - allocates a new page for insertion into the balloon
> >>  *			  page list.
> >> @@ -43,17 +143,9 @@ void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
> >> {
> >> 	unsigned long flags;
> >> 
> >> -	/*
> >> -	 * Block others from accessing the 'page' when we get around to
> >> -	 * establishing additional references. We should be the only one
> >> -	 * holding a reference to the 'page' at this point.
> >> -	 */
> >> -	BUG_ON(!trylock_page(page));
> >> 	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> >> -	balloon_page_insert(b_dev_info, page);
> >> -	__count_vm_event(BALLOON_INFLATE);
> >> +	balloon_page_enqueue_one(b_dev_info, page);
> > 
> > We used to bug on failure to lock page, now we
> > silently ignore this error. Why?
> 
> That’s a mistake. I’ll add a BUG_ON() if balloon_page_enqueue_one() fails.
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/4] mm/balloon_compaction: list interfaces
@ 2019-04-19 22:47         ` Michael S. Tsirkin
  0 siblings, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2019-04-19 22:47 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Greg Kroah-Hartman, Arnd Bergmann, Jason Wang, virtualization,
	linux-mm, linux-kernel, Pv-drivers, Julien Freche

On Fri, Apr 19, 2019 at 10:34:04PM +0000, Nadav Amit wrote:
> > On Apr 19, 2019, at 3:07 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > 
> > On Thu, Mar 28, 2019 at 01:07:15AM +0000, Nadav Amit wrote:
> >> Introduce interfaces for ballooning enqueueing and dequeueing of a list
> >> of pages. These interfaces reduce the overhead of storing and restoring
> >> IRQs by batching the operations. In addition they do not panic if the
> >> list of pages is empty.
> >> 
> >> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> >> Cc: Jason Wang <jasowang@redhat.com>
> >> Cc: linux-mm@kvack.org
> >> Cc: virtualization@lists.linux-foundation.org
> >> Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
> >> Signed-off-by: Nadav Amit <namit@vmware.com>
> >> ---
> >> include/linux/balloon_compaction.h |   4 +
> >> mm/balloon_compaction.c            | 145 +++++++++++++++++++++--------
> >> 2 files changed, 111 insertions(+), 38 deletions(-)
> >> 
> >> diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
> >> index f111c780ef1d..1da79edadb69 100644
> >> --- a/include/linux/balloon_compaction.h
> >> +++ b/include/linux/balloon_compaction.h
> >> @@ -64,6 +64,10 @@ extern struct page *balloon_page_alloc(void);
> >> extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
> >> 				 struct page *page);
> >> extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
> >> +extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
> >> +				      struct list_head *pages);
> >> +extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
> >> +				     struct list_head *pages, int n_req_pages);
> > 
> > Why size_t I wonder? It can never be > n_req_pages which is int.
> > Callers also seem to assume int.
> 
> Only because on the previous iteration
> ( https://lkml.org/lkml/2019/2/6/912 ) you said:
> 
> > Are we sure this int never overflows? Why not just use u64
> > or size_t straight away?

And the answer is because n_req_pages is an int too?

> 
> I am ok either way, but please be consistent.

I guess n_req_pages should be size_t too then?

> > 
> >> static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
> >> {
> > 
> > 
> >> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> >> index ef858d547e2d..88d5d9a01072 100644
> >> --- a/mm/balloon_compaction.c
> >> +++ b/mm/balloon_compaction.c
> >> @@ -10,6 +10,106 @@
> >> #include <linux/export.h>
> >> #include <linux/balloon_compaction.h>
> >> 
> >> +static int balloon_page_enqueue_one(struct balloon_dev_info *b_dev_info,
> >> +				     struct page *page)
> >> +{
> >> +	/*
> >> +	 * Block others from accessing the 'page' when we get around to
> >> +	 * establishing additional references. We should be the only one
> >> +	 * holding a reference to the 'page' at this point.
> >> +	 */
> >> +	if (!trylock_page(page)) {
> >> +		WARN_ONCE(1, "balloon inflation failed to enqueue page\n");
> >> +		return -EFAULT;
> > 
> > Looks like all callers bug on a failure. So let's just do it here,
> > and then make this void?
> 
> As you noted below, actually balloon_page_list_enqueue() does not do
> anything when an error occurs. I really prefer to avoid adding BUG_ON() - 
> I always get pushed back on such things. Yes, this might lead to memory
> leak, but there is no reason to crash the system.

Need to audit callers to make sure they don't misbehave in worse ways.

I think in this case this indicates that someone is using the page so if
one keeps going and adds it into balloon this will lead to corruption down the road.

If you can change the caller code such that it's just a leak,
then a warning is more appropriate. Or even do not warn at all.


> >> +	}
> >> +	list_del(&page->lru);
> >> +	balloon_page_insert(b_dev_info, page);
> >> +	unlock_page(page);
> >> +	__count_vm_event(BALLOON_INFLATE);
> >> +	return 0;
> >> +}
> >> +
> >> +/**
> >> + * balloon_page_list_enqueue() - inserts a list of pages into the balloon page
> >> + *				 list.
> >> + * @b_dev_info: balloon device descriptor where we will insert a new page to
> >> + * @pages: pages to enqueue - allocated using balloon_page_alloc.
> >> + *
> >> + * Driver must call it to properly enqueue a balloon pages before definitively
> >> + * removing it from the guest system.
> > 
> > A bunch of grammar error here. Pls fix for clarify.
> > Also - document that nothing must lock the pages? More assumptions?
> > What is "it" in this context? All pages? And what does removing from
> > guest mean? Really adding to the balloon?
> 
> I pretty much copy-pasted this description from balloon_page_enqueue(). I
> see that you edited this message in the past at least couple of times (e.g.,
> c7cdff0e86471 “virtio_balloon: fix deadlock on OOM”) and left it as is.
> 
> So maybe all of the comments in this file need a rework, but I don’t think
> this patch-set needs to do it.

I see.
That one dealt with one page so "it" was the page. This one deals with
many pages so you can't just copy it over without changes.
Makes it look like "it" refers to driver or guest.

> >> + *
> >> + * Return: number of pages that were enqueued.
> >> + */
> >> +size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
> >> +			       struct list_head *pages)
> >> +{
> >> +	struct page *page, *tmp;
> >> +	unsigned long flags;
> >> +	size_t n_pages = 0;
> >> +
> >> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> >> +	list_for_each_entry_safe(page, tmp, pages, lru) {
> >> +		balloon_page_enqueue_one(b_dev_info, page);
> > 
> > Do we want to do something about an error here?
> 
> Hmm… This is really something that should never happen, but I still prefer
> to avoid BUG_ON(), as I said before. I will just not count the page.

Callers can BUG then if they want. That is fine but you then
need to change the callers to do it.

> > 
> >> +		n_pages++;
> >> +	}
> >> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> >> +	return n_pages;
> >> +}
> >> +EXPORT_SYMBOL_GPL(balloon_page_list_enqueue);
> >> +
> >> +/**
> >> + * balloon_page_list_dequeue() - removes pages from balloon's page list and
> >> + *				 returns a list of the pages.
> >> + * @b_dev_info: balloon device decriptor where we will grab a page from.
> >> + * @pages: pointer to the list of pages that would be returned to the caller.
> >> + * @n_req_pages: number of requested pages.
> >> + *
> >> + * Driver must call it to properly de-allocate a previous enlisted balloon pages
> >> + * before definetively releasing it back to the guest system. This function
> >> + * tries to remove @n_req_pages from the ballooned pages and return it to the
> >> + * caller in the @pages list.
> >> + *
> >> + * Note that this function may fail to dequeue some pages temporarily empty due
> >> + * to compaction isolated pages.
> >> + *
> >> + * Return: number of pages that were added to the @pages list.
> >> + */
> >> +size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
> >> +				 struct list_head *pages, int n_req_pages)
> >> +{
> >> +	struct page *page, *tmp;
> >> +	unsigned long flags;
> >> +	size_t n_pages = 0;
> >> +
> >> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> >> +	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> >> +		/*
> >> +		 * Block others from accessing the 'page' while we get around
> >> +		 * establishing additional references and preparing the 'page'
> >> +		 * to be released by the balloon driver.
> >> +		 */
> >> +		if (!trylock_page(page))
> >> +			continue;
> >> +
> >> +		if (IS_ENABLED(CONFIG_BALLOON_COMPACTION) &&
> >> +		    PageIsolated(page)) {
> >> +			/* raced with isolation */
> >> +			unlock_page(page);
> >> +			continue;
> >> +		}
> >> +		balloon_page_delete(page);
> >> +		__count_vm_event(BALLOON_DEFLATE);
> >> +		unlock_page(page);
> >> +		list_add(&page->lru, pages);
> >> +		if (++n_pages >= n_req_pages)
> >> +			break;
> >> +	}
> >> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> >> +
> >> +	return n_pages;
> >> +}
> >> +EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
> >> +
> >> /*
> >>  * balloon_page_alloc - allocates a new page for insertion into the balloon
> >>  *			  page list.
> >> @@ -43,17 +143,9 @@ void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
> >> {
> >> 	unsigned long flags;
> >> 
> >> -	/*
> >> -	 * Block others from accessing the 'page' when we get around to
> >> -	 * establishing additional references. We should be the only one
> >> -	 * holding a reference to the 'page' at this point.
> >> -	 */
> >> -	BUG_ON(!trylock_page(page));
> >> 	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> >> -	balloon_page_insert(b_dev_info, page);
> >> -	__count_vm_event(BALLOON_INFLATE);
> >> +	balloon_page_enqueue_one(b_dev_info, page);
> > 
> > We used to bug on failure to lock page, now we
> > silently ignore this error. Why?
> 
> That’s a mistake. I’ll add a BUG_ON() if balloon_page_enqueue_one() fails.
> 
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/4] mm/balloon_compaction: list interfaces
  2019-04-19 22:34       ` Nadav Amit
  (?)
@ 2019-04-19 22:47       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2019-04-19 22:47 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Arnd Bergmann, Pv-drivers, Greg Kroah-Hartman, linux-kernel,
	virtualization, linux-mm, Julien Freche

On Fri, Apr 19, 2019 at 10:34:04PM +0000, Nadav Amit wrote:
> > On Apr 19, 2019, at 3:07 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > 
> > On Thu, Mar 28, 2019 at 01:07:15AM +0000, Nadav Amit wrote:
> >> Introduce interfaces for ballooning enqueueing and dequeueing of a list
> >> of pages. These interfaces reduce the overhead of storing and restoring
> >> IRQs by batching the operations. In addition they do not panic if the
> >> list of pages is empty.
> >> 
> >> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> >> Cc: Jason Wang <jasowang@redhat.com>
> >> Cc: linux-mm@kvack.org
> >> Cc: virtualization@lists.linux-foundation.org
> >> Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
> >> Signed-off-by: Nadav Amit <namit@vmware.com>
> >> ---
> >> include/linux/balloon_compaction.h |   4 +
> >> mm/balloon_compaction.c            | 145 +++++++++++++++++++++--------
> >> 2 files changed, 111 insertions(+), 38 deletions(-)
> >> 
> >> diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
> >> index f111c780ef1d..1da79edadb69 100644
> >> --- a/include/linux/balloon_compaction.h
> >> +++ b/include/linux/balloon_compaction.h
> >> @@ -64,6 +64,10 @@ extern struct page *balloon_page_alloc(void);
> >> extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
> >> 				 struct page *page);
> >> extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
> >> +extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
> >> +				      struct list_head *pages);
> >> +extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
> >> +				     struct list_head *pages, int n_req_pages);
> > 
> > Why size_t I wonder? It can never be > n_req_pages which is int.
> > Callers also seem to assume int.
> 
> Only because on the previous iteration
> ( https://lkml.org/lkml/2019/2/6/912 ) you said:
> 
> > Are we sure this int never overflows? Why not just use u64
> > or size_t straight away?

And the answer is because n_req_pages is an int too?

> 
> I am ok either way, but please be consistent.

I guess n_req_pages should be size_t too then?

> > 
> >> static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
> >> {
> > 
> > 
> >> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> >> index ef858d547e2d..88d5d9a01072 100644
> >> --- a/mm/balloon_compaction.c
> >> +++ b/mm/balloon_compaction.c
> >> @@ -10,6 +10,106 @@
> >> #include <linux/export.h>
> >> #include <linux/balloon_compaction.h>
> >> 
> >> +static int balloon_page_enqueue_one(struct balloon_dev_info *b_dev_info,
> >> +				     struct page *page)
> >> +{
> >> +	/*
> >> +	 * Block others from accessing the 'page' when we get around to
> >> +	 * establishing additional references. We should be the only one
> >> +	 * holding a reference to the 'page' at this point.
> >> +	 */
> >> +	if (!trylock_page(page)) {
> >> +		WARN_ONCE(1, "balloon inflation failed to enqueue page\n");
> >> +		return -EFAULT;
> > 
> > Looks like all callers bug on a failure. So let's just do it here,
> > and then make this void?
> 
> As you noted below, actually balloon_page_list_enqueue() does not do
> anything when an error occurs. I really prefer to avoid adding BUG_ON() - 
> I always get pushed back on such things. Yes, this might lead to memory
> leak, but there is no reason to crash the system.

Need to audit callers to make sure they don't misbehave in worse ways.

I think in this case this indicates that someone is using the page so if
one keeps going and adds it into balloon this will lead to corruption down the road.

If you can change the caller code such that it's just a leak,
then a warning is more appropriate. Or even do not warn at all.


> >> +	}
> >> +	list_del(&page->lru);
> >> +	balloon_page_insert(b_dev_info, page);
> >> +	unlock_page(page);
> >> +	__count_vm_event(BALLOON_INFLATE);
> >> +	return 0;
> >> +}
> >> +
> >> +/**
> >> + * balloon_page_list_enqueue() - inserts a list of pages into the balloon page
> >> + *				 list.
> >> + * @b_dev_info: balloon device descriptor where we will insert a new page to
> >> + * @pages: pages to enqueue - allocated using balloon_page_alloc.
> >> + *
> >> + * Driver must call it to properly enqueue a balloon pages before definitively
> >> + * removing it from the guest system.
> > 
> > A bunch of grammar error here. Pls fix for clarify.
> > Also - document that nothing must lock the pages? More assumptions?
> > What is "it" in this context? All pages? And what does removing from
> > guest mean? Really adding to the balloon?
> 
> I pretty much copy-pasted this description from balloon_page_enqueue(). I
> see that you edited this message in the past at least couple of times (e.g.,
> c7cdff0e86471 “virtio_balloon: fix deadlock on OOM”) and left it as is.
> 
> So maybe all of the comments in this file need a rework, but I don’t think
> this patch-set needs to do it.

I see.
That one dealt with one page so "it" was the page. This one deals with
many pages so you can't just copy it over without changes.
Makes it look like "it" refers to driver or guest.

> >> + *
> >> + * Return: number of pages that were enqueued.
> >> + */
> >> +size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
> >> +			       struct list_head *pages)
> >> +{
> >> +	struct page *page, *tmp;
> >> +	unsigned long flags;
> >> +	size_t n_pages = 0;
> >> +
> >> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> >> +	list_for_each_entry_safe(page, tmp, pages, lru) {
> >> +		balloon_page_enqueue_one(b_dev_info, page);
> > 
> > Do we want to do something about an error here?
> 
> Hmm… This is really something that should never happen, but I still prefer
> to avoid BUG_ON(), as I said before. I will just not count the page.

Callers can BUG then if they want. That is fine but you then
need to change the callers to do it.

> > 
> >> +		n_pages++;
> >> +	}
> >> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> >> +	return n_pages;
> >> +}
> >> +EXPORT_SYMBOL_GPL(balloon_page_list_enqueue);
> >> +
> >> +/**
> >> + * balloon_page_list_dequeue() - removes pages from balloon's page list and
> >> + *				 returns a list of the pages.
> >> + * @b_dev_info: balloon device decriptor where we will grab a page from.
> >> + * @pages: pointer to the list of pages that would be returned to the caller.
> >> + * @n_req_pages: number of requested pages.
> >> + *
> >> + * Driver must call it to properly de-allocate a previous enlisted balloon pages
> >> + * before definetively releasing it back to the guest system. This function
> >> + * tries to remove @n_req_pages from the ballooned pages and return it to the
> >> + * caller in the @pages list.
> >> + *
> >> + * Note that this function may fail to dequeue some pages temporarily empty due
> >> + * to compaction isolated pages.
> >> + *
> >> + * Return: number of pages that were added to the @pages list.
> >> + */
> >> +size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
> >> +				 struct list_head *pages, int n_req_pages)
> >> +{
> >> +	struct page *page, *tmp;
> >> +	unsigned long flags;
> >> +	size_t n_pages = 0;
> >> +
> >> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> >> +	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> >> +		/*
> >> +		 * Block others from accessing the 'page' while we get around
> >> +		 * establishing additional references and preparing the 'page'
> >> +		 * to be released by the balloon driver.
> >> +		 */
> >> +		if (!trylock_page(page))
> >> +			continue;
> >> +
> >> +		if (IS_ENABLED(CONFIG_BALLOON_COMPACTION) &&
> >> +		    PageIsolated(page)) {
> >> +			/* raced with isolation */
> >> +			unlock_page(page);
> >> +			continue;
> >> +		}
> >> +		balloon_page_delete(page);
> >> +		__count_vm_event(BALLOON_DEFLATE);
> >> +		unlock_page(page);
> >> +		list_add(&page->lru, pages);
> >> +		if (++n_pages >= n_req_pages)
> >> +			break;
> >> +	}
> >> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> >> +
> >> +	return n_pages;
> >> +}
> >> +EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
> >> +
> >> /*
> >>  * balloon_page_alloc - allocates a new page for insertion into the balloon
> >>  *			  page list.
> >> @@ -43,17 +143,9 @@ void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
> >> {
> >> 	unsigned long flags;
> >> 
> >> -	/*
> >> -	 * Block others from accessing the 'page' when we get around to
> >> -	 * establishing additional references. We should be the only one
> >> -	 * holding a reference to the 'page' at this point.
> >> -	 */
> >> -	BUG_ON(!trylock_page(page));
> >> 	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> >> -	balloon_page_insert(b_dev_info, page);
> >> -	__count_vm_event(BALLOON_INFLATE);
> >> +	balloon_page_enqueue_one(b_dev_info, page);
> > 
> > We used to bug on failure to lock page, now we
> > silently ignore this error. Why?
> 
> That’s a mistake. I’ll add a BUG_ON() if balloon_page_enqueue_one() fails.
> 
> 
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/4] mm/balloon_compaction: list interfaces
  2019-04-19 22:47         ` Michael S. Tsirkin
@ 2019-04-19 22:58           ` Nadav Amit
  -1 siblings, 0 replies; 19+ messages in thread
From: Nadav Amit @ 2019-04-19 22:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Greg Kroah-Hartman, Arnd Bergmann, Jason Wang, virtualization,
	linux-mm, linux-kernel, Pv-drivers, Julien Freche

> On Apr 19, 2019, at 3:47 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Fri, Apr 19, 2019 at 10:34:04PM +0000, Nadav Amit wrote:
>>> On Apr 19, 2019, at 3:07 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> 
>>> On Thu, Mar 28, 2019 at 01:07:15AM +0000, Nadav Amit wrote:
>>>> Introduce interfaces for ballooning enqueueing and dequeueing of a list
>>>> of pages. These interfaces reduce the overhead of storing and restoring
>>>> IRQs by batching the operations. In addition they do not panic if the
>>>> list of pages is empty.
>>>> 
>>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>>> Cc: Jason Wang <jasowang@redhat.com>
>>>> Cc: linux-mm@kvack.org
>>>> Cc: virtualization@lists.linux-foundation.org
>>>> Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
>>>> Signed-off-by: Nadav Amit <namit@vmware.com>
>>>> ---
>>>> include/linux/balloon_compaction.h |   4 +
>>>> mm/balloon_compaction.c            | 145 +++++++++++++++++++++--------
>>>> 2 files changed, 111 insertions(+), 38 deletions(-)
>>>> 
>>>> diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
>>>> index f111c780ef1d..1da79edadb69 100644
>>>> --- a/include/linux/balloon_compaction.h
>>>> +++ b/include/linux/balloon_compaction.h
>>>> @@ -64,6 +64,10 @@ extern struct page *balloon_page_alloc(void);
>>>> extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
>>>> 				 struct page *page);
>>>> extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
>>>> +extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
>>>> +				      struct list_head *pages);
>>>> +extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
>>>> +				     struct list_head *pages, int n_req_pages);
>>> 
>>> Why size_t I wonder? It can never be > n_req_pages which is int.
>>> Callers also seem to assume int.
>> 
>> Only because on the previous iteration
>> ( https://lkml.org/lkml/2019/2/6/912 ) you said:
>> 
>>> Are we sure this int never overflows? Why not just use u64
>>> or size_t straight away?
> 
> And the answer is because n_req_pages is an int too?
> 
>> I am ok either way, but please be consistent.
> 
> I guess n_req_pages should be size_t too then?

Yes. I will change it.

> 
>>>> static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
>>>> {
>>> 
>>> 
>>>> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
>>>> index ef858d547e2d..88d5d9a01072 100644
>>>> --- a/mm/balloon_compaction.c
>>>> +++ b/mm/balloon_compaction.c
>>>> @@ -10,6 +10,106 @@
>>>> #include <linux/export.h>
>>>> #include <linux/balloon_compaction.h>
>>>> 
>>>> +static int balloon_page_enqueue_one(struct balloon_dev_info *b_dev_info,
>>>> +				     struct page *page)
>>>> +{
>>>> +	/*
>>>> +	 * Block others from accessing the 'page' when we get around to
>>>> +	 * establishing additional references. We should be the only one
>>>> +	 * holding a reference to the 'page' at this point.
>>>> +	 */
>>>> +	if (!trylock_page(page)) {
>>>> +		WARN_ONCE(1, "balloon inflation failed to enqueue page\n");
>>>> +		return -EFAULT;
>>> 
>>> Looks like all callers bug on a failure. So let's just do it here,
>>> and then make this void?
>> 
>> As you noted below, actually balloon_page_list_enqueue() does not do
>> anything when an error occurs. I really prefer to avoid adding BUG_ON() - 
>> I always get pushed back on such things. Yes, this might lead to memory
>> leak, but there is no reason to crash the system.
> 
> Need to audit callers to make sure they don't misbehave in worse ways.
> 
> I think in this case this indicates that someone is using the page so if
> one keeps going and adds it into balloon this will lead to corruption down the road.
> 
> If you can change the caller code such that it's just a leak,
> then a warning is more appropriate. Or even do not warn at all.

Yes, you are right (and I was wrong) - this is indeed much more than a
memory leak. I’ll see if it is easy to handle this case (I am not sure), but
I think the warning should stay.

>>>> +	}
>>>> +	list_del(&page->lru);
>>>> +	balloon_page_insert(b_dev_info, page);
>>>> +	unlock_page(page);
>>>> +	__count_vm_event(BALLOON_INFLATE);
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +/**
>>>> + * balloon_page_list_enqueue() - inserts a list of pages into the balloon page
>>>> + *				 list.
>>>> + * @b_dev_info: balloon device descriptor where we will insert a new page to
>>>> + * @pages: pages to enqueue - allocated using balloon_page_alloc.
>>>> + *
>>>> + * Driver must call it to properly enqueue a balloon pages before definitively
>>>> + * removing it from the guest system.
>>> 
>>> A bunch of grammar error here. Pls fix for clarify.
>>> Also - document that nothing must lock the pages? More assumptions?
>>> What is "it" in this context? All pages? And what does removing from
>>> guest mean? Really adding to the balloon?
>> 
>> I pretty much copy-pasted this description from balloon_page_enqueue(). I
>> see that you edited this message in the past at least couple of times (e.g.,
>> c7cdff0e86471 “virtio_balloon: fix deadlock on OOM”) and left it as is.
>> 
>> So maybe all of the comments in this file need a rework, but I don’t think
>> this patch-set needs to do it.
> 
> I see.
> That one dealt with one page so "it" was the page. This one deals with
> many pages so you can't just copy it over without changes.
> Makes it look like "it" refers to driver or guest.

I will fix “it”. ;-)

> 
>>>> + *
>>>> + * Return: number of pages that were enqueued.
>>>> + */
>>>> +size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
>>>> +			       struct list_head *pages)
>>>> +{
>>>> +	struct page *page, *tmp;
>>>> +	unsigned long flags;
>>>> +	size_t n_pages = 0;
>>>> +
>>>> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>>>> +	list_for_each_entry_safe(page, tmp, pages, lru) {
>>>> +		balloon_page_enqueue_one(b_dev_info, page);
>>> 
>>> Do we want to do something about an error here?
>> 
>> Hmm… This is really something that should never happen, but I still prefer
>> to avoid BUG_ON(), as I said before. I will just not count the page.
> 
> Callers can BUG then if they want. That is fine but you then
> need to change the callers to do it.

Ok, I’ll pay more attention this time. Thanks!


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/4] mm/balloon_compaction: list interfaces
@ 2019-04-19 22:58           ` Nadav Amit
  0 siblings, 0 replies; 19+ messages in thread
From: Nadav Amit @ 2019-04-19 22:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Greg Kroah-Hartman, Arnd Bergmann, Jason Wang, virtualization,
	linux-mm, linux-kernel, Pv-drivers, Julien Freche

> On Apr 19, 2019, at 3:47 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Fri, Apr 19, 2019 at 10:34:04PM +0000, Nadav Amit wrote:
>>> On Apr 19, 2019, at 3:07 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> 
>>> On Thu, Mar 28, 2019 at 01:07:15AM +0000, Nadav Amit wrote:
>>>> Introduce interfaces for ballooning enqueueing and dequeueing of a list
>>>> of pages. These interfaces reduce the overhead of storing and restoring
>>>> IRQs by batching the operations. In addition they do not panic if the
>>>> list of pages is empty.
>>>> 
>>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>>> Cc: Jason Wang <jasowang@redhat.com>
>>>> Cc: linux-mm@kvack.org
>>>> Cc: virtualization@lists.linux-foundation.org
>>>> Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
>>>> Signed-off-by: Nadav Amit <namit@vmware.com>
>>>> ---
>>>> include/linux/balloon_compaction.h |   4 +
>>>> mm/balloon_compaction.c            | 145 +++++++++++++++++++++--------
>>>> 2 files changed, 111 insertions(+), 38 deletions(-)
>>>> 
>>>> diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
>>>> index f111c780ef1d..1da79edadb69 100644
>>>> --- a/include/linux/balloon_compaction.h
>>>> +++ b/include/linux/balloon_compaction.h
>>>> @@ -64,6 +64,10 @@ extern struct page *balloon_page_alloc(void);
>>>> extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
>>>> 				 struct page *page);
>>>> extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
>>>> +extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
>>>> +				      struct list_head *pages);
>>>> +extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
>>>> +				     struct list_head *pages, int n_req_pages);
>>> 
>>> Why size_t I wonder? It can never be > n_req_pages which is int.
>>> Callers also seem to assume int.
>> 
>> Only because on the previous iteration
>> ( https://lkml.org/lkml/2019/2/6/912 ) you said:
>> 
>>> Are we sure this int never overflows? Why not just use u64
>>> or size_t straight away?
> 
> And the answer is because n_req_pages is an int too?
> 
>> I am ok either way, but please be consistent.
> 
> I guess n_req_pages should be size_t too then?

Yes. I will change it.

> 
>>>> static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
>>>> {
>>> 
>>> 
>>>> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
>>>> index ef858d547e2d..88d5d9a01072 100644
>>>> --- a/mm/balloon_compaction.c
>>>> +++ b/mm/balloon_compaction.c
>>>> @@ -10,6 +10,106 @@
>>>> #include <linux/export.h>
>>>> #include <linux/balloon_compaction.h>
>>>> 
>>>> +static int balloon_page_enqueue_one(struct balloon_dev_info *b_dev_info,
>>>> +				     struct page *page)
>>>> +{
>>>> +	/*
>>>> +	 * Block others from accessing the 'page' when we get around to
>>>> +	 * establishing additional references. We should be the only one
>>>> +	 * holding a reference to the 'page' at this point.
>>>> +	 */
>>>> +	if (!trylock_page(page)) {
>>>> +		WARN_ONCE(1, "balloon inflation failed to enqueue page\n");
>>>> +		return -EFAULT;
>>> 
>>> Looks like all callers bug on a failure. So let's just do it here,
>>> and then make this void?
>> 
>> As you noted below, actually balloon_page_list_enqueue() does not do
>> anything when an error occurs. I really prefer to avoid adding BUG_ON() - 
>> I always get pushed back on such things. Yes, this might lead to memory
>> leak, but there is no reason to crash the system.
> 
> Need to audit callers to make sure they don't misbehave in worse ways.
> 
> I think in this case this indicates that someone is using the page so if
> one keeps going and adds it into balloon this will lead to corruption down the road.
> 
> If you can change the caller code such that it's just a leak,
> then a warning is more appropriate. Or even do not warn at all.

Yes, you are right (and I was wrong) - this is indeed much more than a
memory leak. I’ll see if it is easy to handle this case (I am not sure), but
I think the warning should stay.

>>>> +	}
>>>> +	list_del(&page->lru);
>>>> +	balloon_page_insert(b_dev_info, page);
>>>> +	unlock_page(page);
>>>> +	__count_vm_event(BALLOON_INFLATE);
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +/**
>>>> + * balloon_page_list_enqueue() - inserts a list of pages into the balloon page
>>>> + *				 list.
>>>> + * @b_dev_info: balloon device descriptor where we will insert a new page to
>>>> + * @pages: pages to enqueue - allocated using balloon_page_alloc.
>>>> + *
>>>> + * Driver must call it to properly enqueue a balloon pages before definitively
>>>> + * removing it from the guest system.
>>> 
>>> A bunch of grammar error here. Pls fix for clarify.
>>> Also - document that nothing must lock the pages? More assumptions?
>>> What is "it" in this context? All pages? And what does removing from
>>> guest mean? Really adding to the balloon?
>> 
>> I pretty much copy-pasted this description from balloon_page_enqueue(). I
>> see that you edited this message in the past at least couple of times (e.g.,
>> c7cdff0e86471 “virtio_balloon: fix deadlock on OOM”) and left it as is.
>> 
>> So maybe all of the comments in this file need a rework, but I don’t think
>> this patch-set needs to do it.
> 
> I see.
> That one dealt with one page so "it" was the page. This one deals with
> many pages so you can't just copy it over without changes.
> Makes it look like "it" refers to driver or guest.

I will fix “it”. ;-)

> 
>>>> + *
>>>> + * Return: number of pages that were enqueued.
>>>> + */
>>>> +size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
>>>> +			       struct list_head *pages)
>>>> +{
>>>> +	struct page *page, *tmp;
>>>> +	unsigned long flags;
>>>> +	size_t n_pages = 0;
>>>> +
>>>> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>>>> +	list_for_each_entry_safe(page, tmp, pages, lru) {
>>>> +		balloon_page_enqueue_one(b_dev_info, page);
>>> 
>>> Do we want to do something about an error here?
>> 
>> Hmm… This is really something that should never happen, but I still prefer
>> to avoid BUG_ON(), as I said before. I will just not count the page.
> 
> Callers can BUG then if they want. That is fine but you then
> need to change the callers to do it.

Ok, I’ll pay more attention this time. Thanks!


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2019-04-19 22:59 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-28  1:07 [PATCH v2 0/4] vmw_balloon: compaction and shrinker support Nadav Amit
2019-03-28  1:07 ` [PATCH v2 1/4] mm/balloon_compaction: list interfaces Nadav Amit
2019-04-08 17:35   ` Nadav Amit
2019-04-08 17:35     ` Nadav Amit
2019-04-19 21:41     ` Nadav Amit
2019-04-19 21:41     ` Nadav Amit
2019-04-19 21:41       ` Nadav Amit
2019-04-19 22:07   ` Michael S. Tsirkin
2019-04-19 22:34     ` Nadav Amit
2019-04-19 22:34       ` Nadav Amit
2019-04-19 22:47       ` Michael S. Tsirkin
2019-04-19 22:47       ` Michael S. Tsirkin
2019-04-19 22:47         ` Michael S. Tsirkin
2019-04-19 22:58         ` Nadav Amit
2019-04-19 22:58           ` Nadav Amit
2019-04-19 22:07   ` Michael S. Tsirkin
2019-03-28  1:07 ` [PATCH v2 2/4] vmw_balloon: compaction support Nadav Amit
2019-03-28  1:07 ` [PATCH v2 3/4] vmw_balloon: add memory shrinker Nadav Amit
2019-03-28  1:07 ` [PATCH v2 4/4] vmw_balloon: split refused pages Nadav Amit

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.