All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH V2 00/10] x86/Hyper-V: Add Dynamic memory hot-remove function
@ 2020-01-07 13:09 lantianyu1986
  2020-01-07 13:09 ` [RFC PATCH V2 1/10] mm/resource: Move child to new resource when release mem region lantianyu1986
                   ` (9 more replies)
  0 siblings, 10 replies; 24+ messages in thread
From: lantianyu1986 @ 2020-01-07 13:09 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, sashal, akpm, dan.j.williams, jgg,
	dave.hansen, richardw.yang, namit, Tianyu.Lan, david,
	christophe.leroy, michael.h.kelley
  Cc: linux-hyperv, linux-kernel, linux-mm, vkuznets, eric.devolder

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V provides dynamic memory hot add/remove function.
Memory hot-add has already enabled in Hyper-V balloon driver.
Now add memory hot-remove function.

This patchset is based on the David Hildenbrand's "virtio-mem:
paravirtualized memory" RFC V4 version and use new interface
offline_and_remove_memory().
https://lkml.org/lkml/2019/12/12/681

Change since V1:
	- Split patch into small patch for review.
	- Convert ha_lock from spin lock to mutex.
	- Add a common work to handle all mem hot plug and
	balloon msg
	- Use offline_and_remove_memory() to offline and remove
	memory.

Tianyu Lan (10):
  mm/resource: Move child to new resource when release mem region.
  mm: expose is_mem_section_removable() symbol
  x86/Hyper-V/Balloon: Replace hot-add and balloon up works with a
    common work
  x86/Hyper-V/Balloon: Convert spin lock ha_lock to mutex
  x86/Hyper-V/Balloon: Avoid releasing ha_lock when traverse
    ha_region_list
  x86/Hyper-V/Balloon: Enable mem hot-remove capability
  x86/Hyper-V/Balloon: Handle mem hot-remove request
  x86/Hyper-V/Balloon: Handle request with non-aligned page number
  x86/Hyper-V/Balloon: Hot add mem in the gaps of hot add region
  x86/Hyper-V: Workaround Hyper-V unballoon msg bug

 drivers/hv/hv_balloon.c | 754 ++++++++++++++++++++++++++++++++++++++++--------
 kernel/resource.c       |  38 ++-
 mm/memory_hotplug.c     |   1 +
 3 files changed, 673 insertions(+), 120 deletions(-)

-- 
2.14.5


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH V2 1/10] mm/resource: Move child to new resource when release mem region.
  2020-01-07 13:09 [RFC PATCH V2 00/10] x86/Hyper-V: Add Dynamic memory hot-remove function lantianyu1986
@ 2020-01-07 13:09 ` lantianyu1986
  2020-01-20 18:34   ` Michael Kelley
  2020-01-20 19:20   ` Michael Kelley
  2020-01-07 13:09 ` [RFC PATCH V2 2/10] mm: expose is_mem_section_removable() symbol lantianyu1986
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 24+ messages in thread
From: lantianyu1986 @ 2020-01-07 13:09 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, sashal, akpm, dan.j.williams, jgg,
	dave.hansen, richardw.yang, namit, Tianyu.Lan, david,
	christophe.leroy, michael.h.kelley
  Cc: linux-hyperv, linux-kernel, linux-mm, vkuznets, eric.devolder

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

When release mem region, old mem region may be splited to
two regions. Current allocate new struct resource for high
end mem region but not move child resources whose ranges are
in the high end range to new resource. When adjust old mem
region's range, adjust_resource() detects child region's range
is out of new range and return error. Move child resources to
high end resource before adjusting old mem range.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 kernel/resource.c | 38 ++++++++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 76036a41143b..1c7362825134 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -181,6 +181,38 @@ static struct resource *alloc_resource(gfp_t flags)
 	return res;
 }
 
+static void move_child_to_newresource(struct resource *old,
+				      struct resource *new)
+{
+	struct resource *tmp, **p, **np;
+
+	if (!old->child)
+		return;
+
+	p = &old->child;
+	np = &new->child;
+
+	for (;;) {
+		tmp = *p;
+		if (!tmp)
+			break;
+
+		if (tmp->start >= new->start && tmp->end <= new->end) {
+			tmp->parent = new;
+			*np = tmp;
+			np = &tmp->sibling;
+			*p = tmp->sibling;
+
+			if (!tmp->sibling)
+				*np = NULL;
+			continue;
+		}
+
+		p = &tmp->sibling;
+	}
+}
+
+
 /* Return the conflict entry if you can't request it */
 static struct resource * __request_resource(struct resource *root, struct resource *new)
 {
@@ -1246,9 +1278,6 @@ EXPORT_SYMBOL(__release_region);
  * Note:
  * - Additional release conditions, such as overlapping region, can be
  *   supported after they are confirmed as valid cases.
- * - When a busy memory resource gets split into two entries, the code
- *   assumes that all children remain in the lower address entry for
- *   simplicity.  Enhance this logic when necessary.
  */
 int release_mem_region_adjustable(struct resource *parent,
 				  resource_size_t start, resource_size_t size)
@@ -1331,11 +1360,12 @@ int release_mem_region_adjustable(struct resource *parent,
 			new_res->sibling = res->sibling;
 			new_res->child = NULL;
 
+			move_child_to_newresource(res, new_res);
+			res->sibling = new_res;
 			ret = __adjust_resource(res, res->start,
 						start - res->start);
 			if (ret)
 				break;
-			res->sibling = new_res;
 			new_res = NULL;
 		}
 
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH V2 2/10] mm: expose is_mem_section_removable() symbol
  2020-01-07 13:09 [RFC PATCH V2 00/10] x86/Hyper-V: Add Dynamic memory hot-remove function lantianyu1986
  2020-01-07 13:09 ` [RFC PATCH V2 1/10] mm/resource: Move child to new resource when release mem region lantianyu1986
@ 2020-01-07 13:09 ` lantianyu1986
  2020-01-07 13:36   ` Michal Hocko
  2020-01-07 13:09 ` [RFC PATCH V2 3/10] x86/Hyper-V/Balloon: Replace hot-add and balloon up works with a common work lantianyu1986
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: lantianyu1986 @ 2020-01-07 13:09 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, sashal, akpm, michael.h.kelley, david
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, linux-mm, vkuznets,
	eric.devolder, vbabka, osalvador, pavel.tatashin, rppt, mhocko

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V balloon driver will use is_mem_section_removable() to
check whether memory block is removable or not when receive
memory hot remove msg. Expose it.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 mm/memory_hotplug.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d04369e6d3cc..a4ebfc5c48b3 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1179,6 +1179,7 @@ bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
 	/* All pageblocks in the memory block are likely to be hot-removable */
 	return true;
 }
+EXPORT_SYMBOL_GPL(is_mem_section_removable);
 
 /*
  * Confirm all pages in a range [start, end) belong to the same zone.
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH V2 3/10] x86/Hyper-V/Balloon: Replace hot-add and balloon up works with a common work
  2020-01-07 13:09 [RFC PATCH V2 00/10] x86/Hyper-V: Add Dynamic memory hot-remove function lantianyu1986
  2020-01-07 13:09 ` [RFC PATCH V2 1/10] mm/resource: Move child to new resource when release mem region lantianyu1986
  2020-01-07 13:09 ` [RFC PATCH V2 2/10] mm: expose is_mem_section_removable() symbol lantianyu1986
@ 2020-01-07 13:09 ` lantianyu1986
  2020-01-20 19:12   ` Michael Kelley
  2020-01-07 13:09 ` [RFC PATCH V2 4/10] x86/Hyper-V/Balloon: Convert spin lock ha_lock to mutex lantianyu1986
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: lantianyu1986 @ 2020-01-07 13:09 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, sashal, michael.h.kelley, david
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, vkuznets, eric.devolder

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

The mem hot-remove operation and balloon down will be added
or moved into work context. Add a common work to handle
opeations of mem hot-add/remove and balloon up/down.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 drivers/hv/hv_balloon.c | 86 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 52 insertions(+), 34 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index b155d0052981..bdb6791e6de1 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -447,15 +447,20 @@ struct hv_hotadd_gap {
 	unsigned long end_pfn;
 };
 
-struct balloon_state {
-	__u32 num_pages;
-	struct work_struct wrk;
+union dm_msg_info {
+	struct {
+		__u32 num_pages;
+	} balloon_state;
+	struct {
+		union dm_mem_page_range ha_page_range;
+		union dm_mem_page_range ha_region_range;
+	} hot_add;
 };
 
-struct hot_add_wrk {
-	union dm_mem_page_range ha_page_range;
-	union dm_mem_page_range ha_region_range;
+struct dm_msg_wrk {
+	enum dm_message_type msg_type;
 	struct work_struct wrk;
+	union dm_msg_info dm_msg;
 };
 
 static bool allow_hibernation;
@@ -514,14 +519,9 @@ struct hv_dynmem_device {
 	unsigned int num_pages_added;
 
 	/*
-	 * State to manage the ballooning (up) operation.
+	 * State to manage the ballooning (up) and "hot-add" operation.
 	 */
-	struct balloon_state balloon_wrk;
-
-	/*
-	 * State to execute the "hot-add" operation.
-	 */
-	struct hot_add_wrk ha_wrk;
+	struct dm_msg_wrk dm_wrk;
 
 	/*
 	 * This state tracks if the host has specified a hot-add
@@ -982,7 +982,7 @@ static unsigned long process_hot_add(unsigned long pg_start,
 
 #endif
 
-static void hot_add_req(struct work_struct *dummy)
+static void hot_add_req(union dm_msg_info *msg_info)
 {
 	struct dm_hot_add_response resp;
 #ifdef CONFIG_MEMORY_HOTPLUG
@@ -996,11 +996,11 @@ static void hot_add_req(struct work_struct *dummy)
 	resp.hdr.size = sizeof(struct dm_hot_add_response);
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-	pg_start = dm->ha_wrk.ha_page_range.finfo.start_page;
-	pfn_cnt = dm->ha_wrk.ha_page_range.finfo.page_cnt;
+	pg_start = msg_info->hot_add.ha_page_range.finfo.start_page;
+	pfn_cnt = msg_info->hot_add.ha_page_range.finfo.page_cnt;
 
-	rg_start = dm->ha_wrk.ha_region_range.finfo.start_page;
-	rg_sz = dm->ha_wrk.ha_region_range.finfo.page_cnt;
+	rg_start = msg_info->hot_add.ha_region_range.finfo.start_page;
+	rg_sz = msg_info->hot_add.ha_region_range.finfo.page_cnt;
 
 	if ((rg_start == 0) && (!dm->host_specified_ha_region)) {
 		unsigned long region_size;
@@ -1261,9 +1261,9 @@ static unsigned int alloc_balloon_pages(struct hv_dynmem_device *dm,
 	return num_pages;
 }
 
-static void balloon_up(struct work_struct *dummy)
+static void balloon_up(union dm_msg_info *msg_info)
 {
-	unsigned int num_pages = dm_device.balloon_wrk.num_pages;
+	unsigned int num_pages = msg_info->balloon_state.num_pages;
 	unsigned int num_ballooned = 0;
 	struct dm_balloon_response *bl_resp;
 	int alloc_unit;
@@ -1313,7 +1313,7 @@ static void balloon_up(struct work_struct *dummy)
 
 		if (num_ballooned == 0 || num_ballooned == num_pages) {
 			pr_debug("Ballooned %u out of %u requested pages.\n",
-				num_pages, dm_device.balloon_wrk.num_pages);
+				num_pages, msg_info->balloon_state.num_pages);
 
 			bl_resp->more_pages = 0;
 			done = true;
@@ -1355,6 +1355,22 @@ static void balloon_up(struct work_struct *dummy)
 
 }
 
+static void dm_msg_work(struct work_struct *dummy)
+{
+	union dm_msg_info *msg_info = &dm_device.dm_wrk.dm_msg;
+
+	switch (dm_device.dm_wrk.msg_type) {
+	case DM_BALLOON_REQUEST:
+		balloon_up(msg_info);
+		break;
+	case DM_MEM_HOT_ADD_REQUEST:
+		hot_add_req(msg_info);
+		break;
+	default:
+		return;
+	}
+}
+
 static void balloon_down(struct hv_dynmem_device *dm,
 			struct dm_unballoon_request *req)
 {
@@ -1490,6 +1506,8 @@ static void balloon_onchannelcallback(void *context)
 	struct hv_dynmem_device *dm = hv_get_drvdata(dev);
 	struct dm_balloon *bal_msg;
 	struct dm_hot_add *ha_msg;
+	struct dm_msg_wrk *dm_wrk = &dm_device.dm_wrk;
+	union dm_msg_info *msg_info = &dm_wrk->dm_msg;
 	union dm_mem_page_range *ha_pg_range;
 	union dm_mem_page_range *ha_region;
 
@@ -1522,8 +1540,9 @@ static void balloon_onchannelcallback(void *context)
 				pr_warn("Currently ballooning\n");
 			bal_msg = (struct dm_balloon *)recv_buffer;
 			dm->state = DM_BALLOON_UP;
-			dm_device.balloon_wrk.num_pages = bal_msg->num_pages;
-			schedule_work(&dm_device.balloon_wrk.wrk);
+			msg_info->balloon_state.num_pages = bal_msg->num_pages;
+			dm_wrk->msg_type = DM_BALLOON_REQUEST;
+			schedule_work(&dm_wrk->wrk);
 			break;
 
 		case DM_UNBALLOON_REQUEST:
@@ -1549,8 +1568,9 @@ static void balloon_onchannelcallback(void *context)
 				 */
 				dm->host_specified_ha_region = false;
 				ha_pg_range = &ha_msg->range;
-				dm->ha_wrk.ha_page_range = *ha_pg_range;
-				dm->ha_wrk.ha_region_range.page_range = 0;
+				msg_info->hot_add.ha_page_range = *ha_pg_range;
+				msg_info->hot_add.ha_region_range.page_range
+						= 0;
 			} else {
 				/*
 				 * Host is specifying that we first hot-add
@@ -1560,10 +1580,11 @@ static void balloon_onchannelcallback(void *context)
 				dm->host_specified_ha_region = true;
 				ha_pg_range = &ha_msg->range;
 				ha_region = &ha_pg_range[1];
-				dm->ha_wrk.ha_page_range = *ha_pg_range;
-				dm->ha_wrk.ha_region_range = *ha_region;
+				msg_info->hot_add.ha_page_range = *ha_pg_range;
+				msg_info->hot_add.ha_region_range = *ha_region;
 			}
-			schedule_work(&dm_device.ha_wrk.wrk);
+			dm_wrk->msg_type = DM_MEM_HOT_ADD_REQUEST;
+			schedule_work(&dm_wrk->wrk);
 			break;
 
 		case DM_INFO_MESSAGE:
@@ -1707,8 +1728,7 @@ static int balloon_probe(struct hv_device *dev,
 	init_completion(&dm_device.config_event);
 	INIT_LIST_HEAD(&dm_device.ha_region_list);
 	spin_lock_init(&dm_device.ha_lock);
-	INIT_WORK(&dm_device.balloon_wrk.wrk, balloon_up);
-	INIT_WORK(&dm_device.ha_wrk.wrk, hot_add_req);
+	INIT_WORK(&dm_device.dm_wrk.wrk, dm_msg_work);
 	dm_device.host_specified_ha_region = false;
 
 #ifdef CONFIG_MEMORY_HOTPLUG
@@ -1754,8 +1774,7 @@ static int balloon_remove(struct hv_device *dev)
 	if (dm->num_pages_ballooned != 0)
 		pr_warn("Ballooned pages: %d\n", dm->num_pages_ballooned);
 
-	cancel_work_sync(&dm->balloon_wrk.wrk);
-	cancel_work_sync(&dm->ha_wrk.wrk);
+	cancel_work_sync(&dm->dm_wrk.wrk);
 
 	kthread_stop(dm->thread);
 	vmbus_close(dev->channel);
@@ -1783,8 +1802,7 @@ static int balloon_suspend(struct hv_device *hv_dev)
 
 	tasklet_disable(&hv_dev->channel->callback_event);
 
-	cancel_work_sync(&dm->balloon_wrk.wrk);
-	cancel_work_sync(&dm->ha_wrk.wrk);
+	cancel_work_sync(&dm->dm_wrk.wrk);
 
 	if (dm->thread) {
 		kthread_stop(dm->thread);
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH V2 4/10] x86/Hyper-V/Balloon: Convert spin lock ha_lock to mutex
  2020-01-07 13:09 [RFC PATCH V2 00/10] x86/Hyper-V: Add Dynamic memory hot-remove function lantianyu1986
                   ` (2 preceding siblings ...)
  2020-01-07 13:09 ` [RFC PATCH V2 3/10] x86/Hyper-V/Balloon: Replace hot-add and balloon up works with a common work lantianyu1986
@ 2020-01-07 13:09 ` lantianyu1986
  2020-01-07 13:09 ` [RFC PATCH V2 5/10] x86/Hyper-V/Balloon: Avoid releasing ha_lock when traverse ha_region_list lantianyu1986
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: lantianyu1986 @ 2020-01-07 13:09 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, sashal, michael.h.kelley, david
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, vkuznets, eric.devolder

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

ha_lock is to protect ha_region_list and is hold in process
context. When process mem hot add msg, add_memory() will be
called in the loop of traversing ha_region_list in order to
find associated ha region. add_memory() holds device_hotplug_lock
mutex and ha_lock is also hold in hv_online_page() which is
called inside add_memory(). So current code needs to release
ha_lock before calling add_memory() in order to avoid holding
mutex under spin lock protection and holding ha_lock twice
which may cause dead lock. When implement mem hot remove, also
have such issue. To avoid releasing ha_lock in the loop of
traversing ha_region_list and simplify code, convert ha_lock
from spin lock to mutex first.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 drivers/hv/hv_balloon.c | 49 ++++++++++++++++++++++---------------------------
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index bdb6791e6de1..185146795122 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -546,7 +546,7 @@ struct hv_dynmem_device {
 	 * Protects ha_region_list, num_pages_onlined counter and individual
 	 * regions from ha_region_list.
 	 */
-	spinlock_t ha_lock;
+	struct mutex ha_lock;
 
 	/*
 	 * A list of hot-add regions.
@@ -629,7 +629,7 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
 			      void *v)
 {
 	struct memory_notify *mem = (struct memory_notify *)v;
-	unsigned long flags, pfn_count;
+	unsigned long pfn_count;
 
 	switch (val) {
 	case MEM_ONLINE:
@@ -641,7 +641,7 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
 		break;
 
 	case MEM_OFFLINE:
-		spin_lock_irqsave(&dm_device.ha_lock, flags);
+		mutex_lock(&dm_device.ha_lock);
 		pfn_count = hv_page_offline_check(mem->start_pfn,
 						  mem->nr_pages);
 		if (pfn_count <= dm_device.num_pages_onlined) {
@@ -655,7 +655,7 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
 			WARN_ON_ONCE(1);
 			dm_device.num_pages_onlined = 0;
 		}
-		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+		mutex_unlock(&dm_device.ha_lock);
 		break;
 	case MEM_GOING_ONLINE:
 	case MEM_GOING_OFFLINE:
@@ -707,12 +707,11 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 	unsigned long start_pfn;
 	unsigned long processed_pfn;
 	unsigned long total_pfn = pfn_count;
-	unsigned long flags;
 
 	for (i = 0; i < (size/HA_CHUNK); i++) {
 		start_pfn = start + (i * HA_CHUNK);
 
-		spin_lock_irqsave(&dm_device.ha_lock, flags);
+		mutex_lock(&dm_device.ha_lock);
 		has->ha_end_pfn +=  HA_CHUNK;
 
 		if (total_pfn > HA_CHUNK) {
@@ -724,7 +723,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 		}
 
 		has->covered_end_pfn +=  processed_pfn;
-		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+		mutex_unlock(&dm_device.ha_lock);
 
 		init_completion(&dm_device.ol_waitevent);
 		dm_device.ha_waiting = !memhp_auto_online;
@@ -745,10 +744,10 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 				 */
 				do_hot_add = false;
 			}
-			spin_lock_irqsave(&dm_device.ha_lock, flags);
+			mutex_lock(&dm_device.ha_lock);
 			has->ha_end_pfn -= HA_CHUNK;
 			has->covered_end_pfn -=  processed_pfn;
-			spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+			mutex_unlock(&dm_device.ha_lock);
 			break;
 		}
 
@@ -769,10 +768,9 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 static void hv_online_page(struct page *pg, unsigned int order)
 {
 	struct hv_hotadd_state *has;
-	unsigned long flags;
 	unsigned long pfn = page_to_pfn(pg);
 
-	spin_lock_irqsave(&dm_device.ha_lock, flags);
+	mutex_lock(&dm_device.ha_lock);
 	list_for_each_entry(has, &dm_device.ha_region_list, list) {
 		/* The page belongs to a different HAS. */
 		if ((pfn < has->start_pfn) ||
@@ -782,7 +780,7 @@ static void hv_online_page(struct page *pg, unsigned int order)
 		hv_bring_pgs_online(has, pfn, 1UL << order);
 		break;
 	}
-	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+	mutex_unlock(&dm_device.ha_lock);
 }
 
 static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
@@ -791,9 +789,8 @@ static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
 	struct hv_hotadd_gap *gap;
 	unsigned long residual, new_inc;
 	int ret = 0;
-	unsigned long flags;
 
-	spin_lock_irqsave(&dm_device.ha_lock, flags);
+	mutex_lock(&dm_device.ha_lock);
 	list_for_each_entry(has, &dm_device.ha_region_list, list) {
 		/*
 		 * If the pfn range we are dealing with is not in the current
@@ -840,7 +837,7 @@ static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
 		ret = 1;
 		break;
 	}
-	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+	mutex_unlock(&dm_device.ha_lock);
 
 	return ret;
 }
@@ -854,12 +851,12 @@ static unsigned long handle_pg_range(unsigned long pg_start,
 	struct hv_hotadd_state *has;
 	unsigned long pgs_ol = 0;
 	unsigned long old_covered_state;
-	unsigned long res = 0, flags;
+	unsigned long res = 0;
 
 	pr_debug("Hot adding %lu pages starting at pfn 0x%lx.\n", pg_count,
 		pg_start);
 
-	spin_lock_irqsave(&dm_device.ha_lock, flags);
+	mutex_lock(&dm_device.ha_lock);
 	list_for_each_entry(has, &dm_device.ha_region_list, list) {
 		/*
 		 * If the pfn range we are dealing with is not in the current
@@ -912,9 +909,9 @@ static unsigned long handle_pg_range(unsigned long pg_start,
 			} else {
 				pfn_cnt = size;
 			}
-			spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+			mutex_unlock(&dm_device.ha_lock);
 			hv_mem_hot_add(has->ha_end_pfn, size, pfn_cnt, has);
-			spin_lock_irqsave(&dm_device.ha_lock, flags);
+			mutex_lock(&dm_device.ha_lock);
 		}
 		/*
 		 * If we managed to online any pages that were given to us,
@@ -923,7 +920,7 @@ static unsigned long handle_pg_range(unsigned long pg_start,
 		res = has->covered_end_pfn - old_covered_state;
 		break;
 	}
-	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+	mutex_unlock(&dm_device.ha_lock);
 
 	return res;
 }
@@ -935,7 +932,6 @@ static unsigned long process_hot_add(unsigned long pg_start,
 {
 	struct hv_hotadd_state *ha_region = NULL;
 	int covered;
-	unsigned long flags;
 
 	if (pfn_cnt == 0)
 		return 0;
@@ -967,9 +963,9 @@ static unsigned long process_hot_add(unsigned long pg_start,
 		ha_region->covered_end_pfn = pg_start;
 		ha_region->end_pfn = rg_start + rg_size;
 
-		spin_lock_irqsave(&dm_device.ha_lock, flags);
+		mutex_lock(&dm_device.ha_lock);
 		list_add_tail(&ha_region->list, &dm_device.ha_region_list);
-		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+		mutex_unlock(&dm_device.ha_lock);
 	}
 
 do_pg_range:
@@ -1727,7 +1723,7 @@ static int balloon_probe(struct hv_device *dev,
 	init_completion(&dm_device.host_event);
 	init_completion(&dm_device.config_event);
 	INIT_LIST_HEAD(&dm_device.ha_region_list);
-	spin_lock_init(&dm_device.ha_lock);
+	mutex_init(&dm_device.ha_lock);
 	INIT_WORK(&dm_device.dm_wrk.wrk, dm_msg_work);
 	dm_device.host_specified_ha_region = false;
 
@@ -1769,7 +1765,6 @@ static int balloon_remove(struct hv_device *dev)
 	struct hv_dynmem_device *dm = hv_get_drvdata(dev);
 	struct hv_hotadd_state *has, *tmp;
 	struct hv_hotadd_gap *gap, *tmp_gap;
-	unsigned long flags;
 
 	if (dm->num_pages_ballooned != 0)
 		pr_warn("Ballooned pages: %d\n", dm->num_pages_ballooned);
@@ -1782,7 +1777,7 @@ static int balloon_remove(struct hv_device *dev)
 	unregister_memory_notifier(&hv_memory_nb);
 	restore_online_page_callback(&hv_online_page);
 #endif
-	spin_lock_irqsave(&dm_device.ha_lock, flags);
+	mutex_lock(&dm_device.ha_lock);
 	list_for_each_entry_safe(has, tmp, &dm->ha_region_list, list) {
 		list_for_each_entry_safe(gap, tmp_gap, &has->gap_list, list) {
 			list_del(&gap->list);
@@ -1791,7 +1786,7 @@ static int balloon_remove(struct hv_device *dev)
 		list_del(&has->list);
 		kfree(has);
 	}
-	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+	mutex_unlock(&dm_device.ha_lock);
 
 	return 0;
 }
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH V2 5/10] x86/Hyper-V/Balloon: Avoid releasing ha_lock when traverse ha_region_list
  2020-01-07 13:09 [RFC PATCH V2 00/10] x86/Hyper-V: Add Dynamic memory hot-remove function lantianyu1986
                   ` (3 preceding siblings ...)
  2020-01-07 13:09 ` [RFC PATCH V2 4/10] x86/Hyper-V/Balloon: Convert spin lock ha_lock to mutex lantianyu1986
@ 2020-01-07 13:09 ` lantianyu1986
  2020-01-07 13:09 ` [RFC PATCH V2 6/10] x86/Hyper-V/Balloon: Enable mem hot-remove capability lantianyu1986
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: lantianyu1986 @ 2020-01-07 13:09 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, sashal, michael.h.kelley, david
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, vkuznets, eric.devolder

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

ha_lock is to protect ha_region_list. It is held in
hv_online_page() and handle_pg_range(). handle_pg_range()
is to traverse ha region list, find associated hot-add region
and add memory into system. hv_online_page() is called inside
of add_memory(). Current code is to release ha_lock before
calling add_memory() to avoid holding ha_lock twice in the
hv_online_page().

To avoid releasing ha_lock, add "lock_thread" in the struct hv_
dynmem_device to record thread of traversing ha region list,
check "lock_thread" in the hv_online_page() and try holding
ha_lock when current thread is not "lock_thread".

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 drivers/hv/hv_balloon.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 185146795122..729dc5551302 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -547,6 +547,7 @@ struct hv_dynmem_device {
 	 * regions from ha_region_list.
 	 */
 	struct mutex ha_lock;
+	struct task_struct *lock_thread;
 
 	/*
 	 * A list of hot-add regions.
@@ -711,7 +712,6 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 	for (i = 0; i < (size/HA_CHUNK); i++) {
 		start_pfn = start + (i * HA_CHUNK);
 
-		mutex_lock(&dm_device.ha_lock);
 		has->ha_end_pfn +=  HA_CHUNK;
 
 		if (total_pfn > HA_CHUNK) {
@@ -723,7 +723,6 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 		}
 
 		has->covered_end_pfn +=  processed_pfn;
-		mutex_unlock(&dm_device.ha_lock);
 
 		init_completion(&dm_device.ol_waitevent);
 		dm_device.ha_waiting = !memhp_auto_online;
@@ -744,10 +743,8 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 				 */
 				do_hot_add = false;
 			}
-			mutex_lock(&dm_device.ha_lock);
 			has->ha_end_pfn -= HA_CHUNK;
 			has->covered_end_pfn -=  processed_pfn;
-			mutex_unlock(&dm_device.ha_lock);
 			break;
 		}
 
@@ -769,8 +766,14 @@ static void hv_online_page(struct page *pg, unsigned int order)
 {
 	struct hv_hotadd_state *has;
 	unsigned long pfn = page_to_pfn(pg);
+	int need_unlock;
+
+	/* If current thread hasn't hold ha_lock, take ha_lock here. */
+	if (dm_device.lock_thread != current) {
+		mutex_lock(&dm_device.ha_lock);
+		need_unlock = 1;
+	}
 
-	mutex_lock(&dm_device.ha_lock);
 	list_for_each_entry(has, &dm_device.ha_region_list, list) {
 		/* The page belongs to a different HAS. */
 		if ((pfn < has->start_pfn) ||
@@ -780,7 +783,8 @@ static void hv_online_page(struct page *pg, unsigned int order)
 		hv_bring_pgs_online(has, pfn, 1UL << order);
 		break;
 	}
-	mutex_unlock(&dm_device.ha_lock);
+	if (need_unlock)
+		mutex_unlock(&dm_device.ha_lock);
 }
 
 static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
@@ -857,6 +861,7 @@ static unsigned long handle_pg_range(unsigned long pg_start,
 		pg_start);
 
 	mutex_lock(&dm_device.ha_lock);
+	dm_device.lock_thread = current;
 	list_for_each_entry(has, &dm_device.ha_region_list, list) {
 		/*
 		 * If the pfn range we are dealing with is not in the current
@@ -909,9 +914,7 @@ static unsigned long handle_pg_range(unsigned long pg_start,
 			} else {
 				pfn_cnt = size;
 			}
-			mutex_unlock(&dm_device.ha_lock);
 			hv_mem_hot_add(has->ha_end_pfn, size, pfn_cnt, has);
-			mutex_lock(&dm_device.ha_lock);
 		}
 		/*
 		 * If we managed to online any pages that were given to us,
@@ -920,6 +923,7 @@ static unsigned long handle_pg_range(unsigned long pg_start,
 		res = has->covered_end_pfn - old_covered_state;
 		break;
 	}
+	dm_device.lock_thread = NULL;
 	mutex_unlock(&dm_device.ha_lock);
 
 	return res;
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH V2 6/10] x86/Hyper-V/Balloon: Enable mem hot-remove capability
  2020-01-07 13:09 [RFC PATCH V2 00/10] x86/Hyper-V: Add Dynamic memory hot-remove function lantianyu1986
                   ` (4 preceding siblings ...)
  2020-01-07 13:09 ` [RFC PATCH V2 5/10] x86/Hyper-V/Balloon: Avoid releasing ha_lock when traverse ha_region_list lantianyu1986
@ 2020-01-07 13:09 ` lantianyu1986
  2020-01-07 13:09 ` [RFC PATCH V2 7/10] x86/Hyper-V/Balloon: Handle mem hot-remove request lantianyu1986
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: lantianyu1986 @ 2020-01-07 13:09 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, sashal, michael.h.kelley, david
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, vkuznets, eric.devolder

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Enable mem hot-remove capability, handle request in the
common work and add mem hot-remove operation later.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 drivers/hv/hv_balloon.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 101 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 729dc5551302..43e490f492d5 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -49,6 +49,7 @@
  * Changes to 0.2 on 2009/05/14
  * Changes to 0.3 on 2009/12/03
  * Changed to 1.0 on 2011/04/05
+ * Changed to 2.0 on 2019/12/10
  */
 
 #define DYNMEM_MAKE_VERSION(Major, Minor) ((__u32)(((Major) << 16) | (Minor)))
@@ -94,7 +95,13 @@ enum dm_message_type {
 	 * Version 1.0.
 	 */
 	DM_INFO_MESSAGE			= 12,
-	DM_VERSION_1_MAX		= 12
+	DM_VERSION_1_MAX		= 12,
+
+	/*
+	 * Version 2.0
+	 */
+	DM_MEM_HOT_REMOVE_REQUEST        = 13,
+	DM_MEM_HOT_REMOVE_RESPONSE       = 14
 };
 
 
@@ -123,7 +130,8 @@ union dm_caps {
 		 * represents an alignment of 2^n in mega bytes.
 		 */
 		__u64 hot_add_alignment:4;
-		__u64 reservedz:58;
+		__u64 hot_remove:1;
+		__u64 reservedz:57;
 	} cap_bits;
 	__u64 caps;
 } __packed;
@@ -234,7 +242,9 @@ struct dm_capabilities {
 struct dm_capabilities_resp_msg {
 	struct dm_header hdr;
 	__u64 is_accepted:1;
-	__u64 reservedz:63;
+	__u64 hot_remove:1;
+	__u64 suppress_pressure_reports:1;
+	__u64 reservedz:61;
 } __packed;
 
 /*
@@ -377,6 +387,27 @@ struct dm_hot_add_response {
 	__u32 result;
 } __packed;
 
+struct dm_hot_remove {
+	struct dm_header hdr;
+	__u32 virtual_node;
+	__u32 page_count;
+	__u32 qos_flags;
+	__u32 reservedZ;
+} __packed;
+
+struct dm_hot_remove_response {
+	struct dm_header hdr;
+	__u32 result;
+	__u32 range_count;
+	__u64 more_pages:1;
+	__u64 reservedz:63;
+	union dm_mem_page_range range_array[];
+} __packed;
+
+#define DM_REMOVE_QOS_LARGE	 (1 << 0)
+#define DM_REMOVE_QOS_LOCAL	 (1 << 1)
+#define DM_REMOVE_QOS_MASK       (0x3)
+
 /*
  * Types of information sent from host to the guest.
  */
@@ -455,6 +486,11 @@ union dm_msg_info {
 		union dm_mem_page_range ha_page_range;
 		union dm_mem_page_range ha_region_range;
 	} hot_add;
+	struct {
+		__u32 virtual_node;
+		__u32 page_count;
+		__u32 qos_flags;
+	} hot_remove;
 };
 
 struct dm_msg_wrk {
@@ -496,6 +532,7 @@ enum hv_dm_state {
 	DM_BALLOON_UP,
 	DM_BALLOON_DOWN,
 	DM_HOT_ADD,
+	DM_HOT_REMOVE,
 	DM_INIT_ERROR
 };
 
@@ -571,6 +608,35 @@ static struct hv_dynmem_device dm_device;
 
 static void post_status(struct hv_dynmem_device *dm);
 
+static int hv_send_hot_remove_response(
+		struct dm_hot_remove_response *resp,
+		unsigned long request_count, bool more_pages)
+{
+	struct hv_dynmem_device *dm = &dm_device;
+	int ret;
+
+	resp->hdr.type = DM_MEM_HOT_REMOVE_RESPONSE;
+	resp->range_count = request_count;
+	resp->more_pages = more_pages;
+	resp->hdr.size = sizeof(struct dm_hot_remove_response)
+			+ sizeof(union dm_mem_page_range) * request_count;
+	resp->result = !!request_count;
+
+	do {
+		resp->hdr.trans_id = atomic_inc_return(&trans_id);
+		ret = vmbus_sendpacket(dm->dev->channel, resp,
+				       resp->hdr.size,
+				       (unsigned long)NULL,
+				       VM_PKT_DATA_INBAND, 0);
+
+		if (ret == -EAGAIN)
+			msleep(20);
+		post_status(&dm_device);
+	} while (ret == -EAGAIN);
+
+	return ret;
+}
+
 #ifdef CONFIG_MEMORY_HOTPLUG
 static inline bool has_pfn_is_backed(struct hv_hotadd_state *has,
 				     unsigned long pfn)
@@ -982,6 +1048,17 @@ static unsigned long process_hot_add(unsigned long pg_start,
 
 #endif
 
+static void hot_remove_req(union dm_msg_info *msg_info)
+{
+	struct hv_dynmem_device *dm = &dm_device;
+
+	/* Add hot remove operation later and send failure response. */
+	hv_send_hot_remove_response((struct dm_hot_remove_response *)
+				balloon_up_send_buffer, 0, false);
+
+	dm->state = DM_INITIALIZED;
+}
+
 static void hot_add_req(union dm_msg_info *msg_info)
 {
 	struct dm_hot_add_response resp;
@@ -1366,6 +1443,9 @@ static void dm_msg_work(struct work_struct *dummy)
 	case DM_MEM_HOT_ADD_REQUEST:
 		hot_add_req(msg_info);
 		break;
+	case DM_MEM_HOT_REMOVE_REQUEST:
+		hot_remove_req(msg_info);
+		break;
 	default:
 		return;
 	}
@@ -1506,6 +1586,7 @@ static void balloon_onchannelcallback(void *context)
 	struct hv_dynmem_device *dm = hv_get_drvdata(dev);
 	struct dm_balloon *bal_msg;
 	struct dm_hot_add *ha_msg;
+	struct dm_hot_remove *hr_msg;
 	struct dm_msg_wrk *dm_wrk = &dm_device.dm_wrk;
 	union dm_msg_info *msg_info = &dm_wrk->dm_msg;
 	union dm_mem_page_range *ha_pg_range;
@@ -1587,6 +1668,22 @@ static void balloon_onchannelcallback(void *context)
 			schedule_work(&dm_wrk->wrk);
 			break;
 
+		case DM_MEM_HOT_REMOVE_REQUEST:
+			if (dm->state == DM_HOT_REMOVE)
+				pr_warn("Currently hot-removing.\n");
+
+			dm->state = DM_HOT_REMOVE;
+			hr_msg = (struct dm_hot_remove *)recv_buffer;
+
+			msg_info->hot_remove.virtual_node
+					= hr_msg->virtual_node;
+			msg_info->hot_remove.page_count = hr_msg->page_count;
+			msg_info->hot_remove.qos_flags = hr_msg->qos_flags;
+
+			dm_wrk->msg_type = DM_MEM_HOT_REMOVE_REQUEST;
+			schedule_work(&dm_wrk->wrk);
+			break;
+
 		case DM_INFO_MESSAGE:
 			process_info(dm, (struct dm_info_msg *)dm_msg);
 			break;
@@ -1665,6 +1762,7 @@ static int balloon_connect_vsp(struct hv_device *dev)
 	 */
 	cap_msg.caps.cap_bits.balloon = 1;
 	cap_msg.caps.cap_bits.hot_add = 1;
+	cap_msg.caps.cap_bits.hot_remove = 1;
 
 	/*
 	 * Specify our alignment requirements as it relates
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH V2 7/10] x86/Hyper-V/Balloon: Handle mem hot-remove request
  2020-01-07 13:09 [RFC PATCH V2 00/10] x86/Hyper-V: Add Dynamic memory hot-remove function lantianyu1986
                   ` (5 preceding siblings ...)
  2020-01-07 13:09 ` [RFC PATCH V2 6/10] x86/Hyper-V/Balloon: Enable mem hot-remove capability lantianyu1986
@ 2020-01-07 13:09 ` lantianyu1986
  2020-01-08  9:54   ` kbuild test robot
  2020-01-08 12:03   ` kbuild test robot
  2020-01-07 13:09 ` [RFC PATCH V2 8/10] x86/Hyper-V/Balloon: Handle request with non-aligned page number lantianyu1986
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 24+ messages in thread
From: lantianyu1986 @ 2020-01-07 13:09 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, sashal, michael.h.kelley, david
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, vkuznets, eric.devolder

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Linux system mem hot plug unit is 128MB and request page
number maybe not aligned with unit. The non-aligned case
will handle in the later.

Handle mem hot-remve request:
First, search memory from ha region list. If find suitable memory
block, offline & remove memory and create ha region region "gap"
struct for the range. "gap" means the range in the hot-add region
is offlined or removed. The following mem hot-add msg may add
memory in the gap range back.

If there is no suitable memory in the hot-add region, search memory
from the system memory on the target node and perform offline&remove
memory.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 drivers/hv/hv_balloon.c | 188 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 184 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 43e490f492d5..3d8c09fe148a 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -56,6 +56,10 @@
 #define DYNMEM_MAJOR_VERSION(Version) ((__u32)(Version) >> 16)
 #define DYNMEM_MINOR_VERSION(Version) ((__u32)(Version) & 0xff)
 
+#define MAX_HOT_REMOVE_ENTRIES						\
+		((PAGE_SIZE - sizeof(struct dm_hot_remove_response))	\
+		 / sizeof(union dm_mem_page_range))
+
 enum {
 	DYNMEM_PROTOCOL_VERSION_1 = DYNMEM_MAKE_VERSION(0, 3),
 	DYNMEM_PROTOCOL_VERSION_2 = DYNMEM_MAKE_VERSION(1, 0),
@@ -697,6 +701,7 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
 {
 	struct memory_notify *mem = (struct memory_notify *)v;
 	unsigned long pfn_count;
+	int need_unlock;
 
 	switch (val) {
 	case MEM_ONLINE:
@@ -708,7 +713,11 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
 		break;
 
 	case MEM_OFFLINE:
-		mutex_lock(&dm_device.ha_lock);
+		if (dm_device.lock_thread != current) {
+			mutex_lock(&dm_device.ha_lock);
+			need_unlock = 1;
+		}
+
 		pfn_count = hv_page_offline_check(mem->start_pfn,
 						  mem->nr_pages);
 		if (pfn_count <= dm_device.num_pages_onlined) {
@@ -722,7 +731,9 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
 			WARN_ON_ONCE(1);
 			dm_device.num_pages_onlined = 0;
 		}
-		mutex_unlock(&dm_device.ha_lock);
+
+		if (need_unlock)
+			mutex_unlock(&dm_device.ha_lock);
 		break;
 	case MEM_GOING_ONLINE:
 	case MEM_GOING_OFFLINE:
@@ -1046,14 +1057,183 @@ static unsigned long process_hot_add(unsigned long pg_start,
 	return handle_pg_range(pg_start, pfn_cnt);
 }
 
+static int hv_hot_remove_range(unsigned int nid, unsigned long start_pfn,
+			       unsigned long end_pfn, unsigned long nr_pages,
+			       unsigned long *request_index,
+			       union dm_mem_page_range *range_array,
+			       struct hv_hotadd_state *has)
+{
+	unsigned long block_pages = HA_CHUNK;
+	unsigned long rm_pages = nr_pages;
+	unsigned long pfn;
+	int ret;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn += block_pages) {
+		struct hv_hotadd_gap *gap;
+		int in_gap;
+
+		if (*request_index >= MAX_HOT_REMOVE_ENTRIES) {
+			struct dm_hot_remove_response *resp =
+				(struct dm_hot_remove_response *)
+					balloon_up_send_buffer;
+
+			/* Flush out all hot-remove ranges. */
+			ret = hv_send_hot_remove_response(resp, *request_index,
+							  true);
+			if (ret)
+				return ret;
+
+			/* Reset request buffer. */
+			memset(resp, 0x00, PAGE_SIZE);
+			*request_index = 0;
+		}
+
+		/*
+		 * Memory in hot-add region gaps has been offlined or removed
+		 * and so skip it if remove range overlap with gap.
+		 */
+		if (has) {
+			list_for_each_entry(gap, &has->gap_list, list)
+				if (!(pfn >= gap->end_pfn ||
+				      pfn + block_pages < gap->start_pfn)) {
+					in_gap = 1;
+					break;
+				}
+
+			if (in_gap)
+				continue;
+		}
+
+		if (online_section_nr(pfn_to_section_nr(pfn))
+		    && is_mem_section_removable(pfn, block_pages)) {
+			ret = offline_and_remove_memory(nid, pfn << PAGE_SHIFT,
+					block_pages << PAGE_SHIFT);
+			if (ret)
+				continue;
+
+			range_array[*request_index].finfo.start_page = pfn;
+			range_array[*request_index].finfo.page_cnt
+					= block_pages;
+
+			(*request_index)++;
+			nr_pages -= block_pages;
+
+			if (!nr_pages)
+				break;
+		}
+	}
+
+	return rm_pages - nr_pages;
+}
+
+static int hv_hot_remove_from_ha_list(unsigned int nid, unsigned long nr_pages,
+				      unsigned long *request_index,
+				      union dm_mem_page_range *range_array)
+{
+	struct hv_hotadd_state *has;
+	unsigned long start_pfn, end_pfn;
+	int rm_pages;
+	int old_index;
+	int ret, i;
+
+	mutex_lock(&dm_device.ha_lock);
+	dm_device.lock_thread = current;
+	list_for_each_entry(has, &dm_device.ha_region_list, list) {
+		rm_pages = min(nr_pages,
+				has->covered_end_pfn - has->start_pfn);
+		start_pfn = has->start_pfn;
+		end_pfn = has->covered_end_pfn;
+		old_index = *request_index;
+
+		if (!rm_pages || pfn_to_nid(start_pfn) != nid)
+			continue;
+
+		rm_pages = hv_hot_remove_range(nid, start_pfn, end_pfn,
+				rm_pages, request_index, range_array, has);
+		if (rm_pages < 0) {
+			ret = rm_pages;
+			goto error;
+		} else if (!rm_pages) {
+			continue;
+		}
+
+		nr_pages -= rm_pages;
+		dm_device.num_pages_added -= rm_pages;
+
+		/* Create gaps for hot remove regions. */
+		for (i = old_index; i < *request_index; i++) {
+			struct hv_hotadd_gap *gap;
+
+			gap = kzalloc(sizeof(struct hv_hotadd_gap), GFP_ATOMIC);
+			if (!gap) {
+				/*
+				 * Disable dm hot-plug when fails to allocate
+				 * memory for gaps.
+				 */
+				ret = -ENOMEM;
+				do_hot_add = false;
+				goto error;
+			}
+
+			INIT_LIST_HEAD(&gap->list);
+			gap->start_pfn = range_array[i].finfo.start_page;
+			gap->end_pfn =
+				gap->start_pfn + range_array[i].finfo.page_cnt;
+			list_add_tail(&gap->list, &has->gap_list);
+		}
+
+		if (!nr_pages)
+			break;
+	}
+
+	ret = nr_pages;
+ error:
+	dm_device.lock_thread = NULL;
+	mutex_unlock(&dm_device.ha_lock);
+
+	return ret;
+}
+
+static void hv_mem_hot_remove(unsigned int nid, u64 nr_pages)
+{
+	struct dm_hot_remove_response *resp
+		= (struct dm_hot_remove_response *)balloon_up_send_buffer;
+	unsigned long start_pfn = node_start_pfn(nid);
+	unsigned long end_pfn = node_end_pfn(nid);
+	unsigned long request_index = 0;
+	int remain_pages;
+
+	/* Todo: Handle request of non-aligned page number later. */
+
+	/* Search hot-remove memory region from hot add list first.*/
+	memset(resp, 0x00, PAGE_SIZE);
+	remain_pages = hv_hot_remove_from_ha_list(nid, nr_pages,
+				&request_index,
+				resp->range_array);
+	if (remain_pages < 0) {
+		/* Send failure response msg. */
+		request_index = 0;
+	} else if (remain_pages) {
+		start_pfn = ALIGN(start_pfn, HA_CHUNK);
+		hv_hot_remove_range(nid, start_pfn, end_pfn, remain_pages,
+				    &request_index, resp->range_array, NULL);
+	}
+
+	hv_send_hot_remove_response(resp, request_index, false);
+}
+
 #endif
 
 static void hot_remove_req(union dm_msg_info *msg_info)
 {
 	struct hv_dynmem_device *dm = &dm_device;
+	unsigned int numa_node = msg_info->hot_remove.virtual_node;
+	unsigned int page_count = msg_info->hot_remove.page_count;
 
-	/* Add hot remove operation later and send failure response. */
-	hv_send_hot_remove_response((struct dm_hot_remove_response *)
+	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) && do_hot_add)
+		hv_mem_hot_remove(numa_node, page_count);
+	else
+		hv_send_hot_remove_response((struct dm_hot_remove_response *)
 				balloon_up_send_buffer, 0, false);
 
 	dm->state = DM_INITIALIZED;
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH V2 8/10] x86/Hyper-V/Balloon: Handle request with non-aligned page number
  2020-01-07 13:09 [RFC PATCH V2 00/10] x86/Hyper-V: Add Dynamic memory hot-remove function lantianyu1986
                   ` (6 preceding siblings ...)
  2020-01-07 13:09 ` [RFC PATCH V2 7/10] x86/Hyper-V/Balloon: Handle mem hot-remove request lantianyu1986
@ 2020-01-07 13:09 ` lantianyu1986
  2020-01-07 13:09 ` [RFC PATCH V2 9/10] x86/Hyper-V/Balloon: Hot add mem in the gaps of hot add region lantianyu1986
  2020-01-07 13:09 ` [RFC PATCH V2 10/10] x86/Hyper-V: Workaround Hyper-V unballoon msg bug lantianyu1986
  9 siblings, 0 replies; 24+ messages in thread
From: lantianyu1986 @ 2020-01-07 13:09 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, sashal, michael.h.kelley, david
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, vkuznets, eric.devolder

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

The mem hot-remove msg may request non-aligned page number.
Hot plug unit is 128MB. Handle the remainder pages via
balloon way that allocate memory, offline pages and return
mem back to host. In order to help to check whether memory
range in mem hot add msg is allocated by balloon driver
or not, set page private to 1 when allocate page. If the
pages associated with mem range in hot add msg is present
and page private is non-zero, clear offline flag and free
mem to return back.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 drivers/hv/hv_balloon.c | 149 +++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 129 insertions(+), 20 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 3d8c09fe148a..f76c9bd7fe2f 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -641,6 +641,32 @@ static int hv_send_hot_remove_response(
 	return ret;
 }
 
+static void free_allocated_pages(__u64 start_frame, int num_pages)
+{
+	struct page *pg;
+	int i;
+
+	for (i = 0; i < num_pages; i++) {
+		pg = pfn_to_page(i + start_frame);
+
+		if (page_private(pfn_to_page(i)))
+			set_page_private(pfn_to_page(i), 0);
+
+		__ClearPageOffline(pg);
+		__free_page(pg);
+		dm_device.num_pages_ballooned--;
+	}
+}
+
+static void free_balloon_pages(struct hv_dynmem_device *dm,
+			 union dm_mem_page_range *range_array)
+{
+	int num_pages = range_array->finfo.page_cnt;
+	__u64 start_frame = range_array->finfo.start_page;
+
+	free_allocated_pages(start_frame, num_pages);
+}
+
 #ifdef CONFIG_MEMORY_HOTPLUG
 static inline bool has_pfn_is_backed(struct hv_hotadd_state *has,
 				     unsigned long pfn)
@@ -1017,6 +1043,16 @@ static unsigned long process_hot_add(unsigned long pg_start,
 	if (pfn_cnt == 0)
 		return 0;
 
+	/*
+	 * Check whether page is allocated by driver via page private
+	 * data due to remainder pages.
+	 */
+	if (present_section_nr(pfn_to_section_nr(pg_start))
+	    && page_private(pfn_to_page(pg_start))) {
+		free_allocated_pages(pg_start, pfn_cnt);
+		return pfn_cnt;
+	}
+
 	if (!dm_device.host_specified_ha_region) {
 		covered = pfn_covered(pg_start, pfn_cnt);
 		if (covered < 0)
@@ -1194,6 +1230,82 @@ static int hv_hot_remove_from_ha_list(unsigned int nid, unsigned long nr_pages,
 	return ret;
 }
 
+static int hv_hot_remove_pages(struct dm_hot_remove_response *resp,
+			       u64 nr_pages, unsigned long *request_index,
+			       bool more_pages)
+{
+	int i, j, alloc_unit = PAGES_IN_2M;
+	struct page *pg;
+	int ret;
+
+	for (i = 0; i < nr_pages; i += alloc_unit) {
+		if (*request_index >= MAX_HOT_REMOVE_ENTRIES) {
+			/* Flush out all hot-remove ranges. */
+			ret = hv_send_hot_remove_response(resp,
+					*request_index, true);
+			if (ret)
+				goto free_pages;
+
+			/*
+			 * Continue to allocate memory for hot remove
+			 * after resetting send buffer and array index.
+			 */
+			memset(resp, 0x00, PAGE_SIZE);
+			*request_index = 0;
+		}
+retry:
+		pg = alloc_pages(GFP_HIGHUSER | __GFP_NORETRY |
+			__GFP_NOMEMALLOC | __GFP_NOWARN,
+			get_order(alloc_unit << PAGE_SHIFT));
+		if (!pg) {
+			if (alloc_unit == 1) {
+				ret = -ENOMEM;
+				goto free_pages;
+			}
+
+			alloc_unit = 1;
+			goto retry;
+		}
+
+		if (alloc_unit != 1)
+			split_page(pg, get_order(alloc_unit << PAGE_SHIFT));
+
+		for (j = 0; j < (1 << get_order(alloc_unit << PAGE_SHIFT));
+		    j++) {
+			__SetPageOffline(pg + j);
+
+			/*
+			 * Set page's private data to non-zero and use it
+			 * to identify whehter the page is allocated by driver
+			 * or new hot-add memory in process_hot_add().
+			 */
+			set_page_private(pg + j, 1);
+		}
+
+		resp->range_array[*request_index].finfo.start_page
+				= page_to_pfn(pg);
+		resp->range_array[*request_index].finfo.page_cnt
+				= alloc_unit;
+		(*request_index)++;
+
+		dm_device.num_pages_ballooned += alloc_unit;
+	}
+
+	ret = hv_send_hot_remove_response(resp, *request_index, more_pages);
+	if (ret)
+		goto free_pages;
+
+	return 0;
+
+free_pages:
+	for (i = 0; i < *request_index; i++)
+		free_balloon_pages(&dm_device, &resp->range_array[i]);
+
+	/* Response hot remove failure. */
+	hv_send_hot_remove_response(resp, 0, false);
+	return ret;
+}
+
 static void hv_mem_hot_remove(unsigned int nid, u64 nr_pages)
 {
 	struct dm_hot_remove_response *resp
@@ -1201,9 +1313,24 @@ static void hv_mem_hot_remove(unsigned int nid, u64 nr_pages)
 	unsigned long start_pfn = node_start_pfn(nid);
 	unsigned long end_pfn = node_end_pfn(nid);
 	unsigned long request_index = 0;
-	int remain_pages;
+	unsigned long remainder = nr_pages % HA_CHUNK;
+	int remain_pages, ret;
 
-	/* Todo: Handle request of non-aligned page number later. */
+	/*
+	 * If page number isn't aligned with memory hot plug unit,
+	 * handle remainder pages via balloon way.
+	 */
+	if (remainder) {
+		memset(resp, 0x00, PAGE_SIZE);
+		ret = hv_hot_remove_pages(resp, remainder, &request_index,
+				!!(nr_pages - remainder));
+		if (ret)
+			return;
+
+		nr_pages -= remainder;
+		if (!nr_pages)
+			return;
+	}
 
 	/* Search hot-remove memory region from hot add list first.*/
 	memset(resp, 0x00, PAGE_SIZE);
@@ -1448,24 +1575,6 @@ static void post_status(struct hv_dynmem_device *dm)
 
 }
 
-static void free_balloon_pages(struct hv_dynmem_device *dm,
-			 union dm_mem_page_range *range_array)
-{
-	int num_pages = range_array->finfo.page_cnt;
-	__u64 start_frame = range_array->finfo.start_page;
-	struct page *pg;
-	int i;
-
-	for (i = 0; i < num_pages; i++) {
-		pg = pfn_to_page(i + start_frame);
-		__ClearPageOffline(pg);
-		__free_page(pg);
-		dm->num_pages_ballooned--;
-	}
-}
-
-
-
 static unsigned int alloc_balloon_pages(struct hv_dynmem_device *dm,
 					unsigned int num_pages,
 					struct dm_balloon_response *bl_resp,
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH V2 9/10] x86/Hyper-V/Balloon: Hot add mem in the gaps of hot add region
  2020-01-07 13:09 [RFC PATCH V2 00/10] x86/Hyper-V: Add Dynamic memory hot-remove function lantianyu1986
                   ` (7 preceding siblings ...)
  2020-01-07 13:09 ` [RFC PATCH V2 8/10] x86/Hyper-V/Balloon: Handle request with non-aligned page number lantianyu1986
@ 2020-01-07 13:09 ` lantianyu1986
  2020-01-07 13:09 ` [RFC PATCH V2 10/10] x86/Hyper-V: Workaround Hyper-V unballoon msg bug lantianyu1986
  9 siblings, 0 replies; 24+ messages in thread
From: lantianyu1986 @ 2020-01-07 13:09 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, sashal, michael.h.kelley, david
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, vkuznets, eric.devolder

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Mem hot remove operation may find memory in the hot add region
and create gaps in ha region list if there is hot-add memory
at that point. The following hot add msg may contain memory range
in these gaps. Handle such request and change gap range after
adding memory.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 drivers/hv/hv_balloon.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 105 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index f76c9bd7fe2f..5aaae62955bf 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -907,10 +907,11 @@ static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
 			continue;
 
 		/*
-		 * If the current start pfn is not where the covered_end
-		 * is, create a gap and update covered_end_pfn.
+		 * If the current start pfn is greater than covered_end_pfn,
+		 * create a gap and update covered_end_pfn. Start pfn may
+		 * locate at gap rangs which is created during mem hot remove.
 		 */
-		if (has->covered_end_pfn != start_pfn) {
+		if (has->covered_end_pfn < start_pfn) {
 			gap = kzalloc(sizeof(struct hv_hotadd_gap), GFP_ATOMIC);
 			if (!gap) {
 				ret = -ENOMEM;
@@ -949,6 +950,91 @@ static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
 	return ret;
 }
 
+static int handle_hot_add_in_gap(unsigned long start, unsigned long pg_cnt,
+			  struct hv_hotadd_state *has)
+{
+	struct hv_hotadd_gap *gap, *new_gap, *tmp_gap;
+	unsigned long pfn_cnt = pg_cnt;
+	unsigned long start_pfn = start;
+	unsigned long end_pfn;
+	unsigned long pages;
+	unsigned long pgs_ol;
+	unsigned long block_pages = HA_CHUNK;
+	unsigned long pfn;
+	int nid;
+	int ret;
+
+	list_for_each_entry_safe(gap, tmp_gap, &has->gap_list, list) {
+
+		if ((start_pfn < gap->start_pfn)
+		    || (start_pfn >= gap->end_pfn))
+			continue;
+
+		end_pfn = min(gap->end_pfn, start_pfn + pfn_cnt);
+		pgs_ol = end_pfn - start_pfn;
+
+		/*
+		 * hv_bring_pgs_online() identifies whether pfn
+		 * should be online or not via checking pfn is in
+		 * hot add covered range or gap range(Detail see
+		 * has_pfn_is_backed()). So adjust gap before bringing
+		 * online or add memory.
+		 */
+		if (gap->end_pfn - gap->start_pfn == pgs_ol) {
+			list_del(&gap->list);
+			kfree(gap);
+		} else if (gap->start_pfn < start && gap->end_pfn == end_pfn) {
+			gap->end_pfn = start_pfn;
+		} else if (gap->end_pfn > end_pfn
+		   && gap->start_pfn == start_pfn) {
+			gap->start_pfn = end_pfn;
+		} else {
+			gap->end_pfn = start_pfn;
+
+			new_gap = kzalloc(sizeof(struct hv_hotadd_gap),
+					GFP_ATOMIC);
+			if (!new_gap) {
+				do_hot_add = false;
+				return -ENOMEM;
+			}
+
+			INIT_LIST_HEAD(&new_gap->list);
+			new_gap->start_pfn = end_pfn;
+			new_gap->end_pfn = gap->end_pfn;
+			list_add_tail(&gap->list, &has->gap_list);
+		}
+
+		/* Bring online or add memmory in gaps. */
+		for (pfn = start_pfn; pfn < end_pfn;
+		     pfn = round_up(pfn + 1, block_pages)) {
+			pages = min(round_up(pfn + 1, block_pages),
+				    end_pfn) - pfn;
+
+			if (online_section_nr(pfn_to_section_nr(pfn))) {
+				hv_bring_pgs_online(has, pfn, pages);
+			} else {
+				nid = memory_add_physaddr_to_nid(PFN_PHYS(pfn));
+				ret = add_memory(nid, PFN_PHYS(pfn),
+						 round_up(pages, block_pages)
+						 << PAGE_SHIFT);
+				if (ret) {
+					pr_err("Fail to add memory in gaps(error=%d).\n",
+					       ret);
+					do_hot_add = false;
+					return ret;
+				}
+			}
+		}
+
+		start_pfn += pgs_ol;
+		pfn_cnt -= pgs_ol;
+		if (!pfn_cnt)
+			break;
+	}
+
+	return pg_cnt - pfn_cnt;
+}
+
 static unsigned long handle_pg_range(unsigned long pg_start,
 					unsigned long pg_count)
 {
@@ -975,6 +1061,22 @@ static unsigned long handle_pg_range(unsigned long pg_start,
 
 		old_covered_state = has->covered_end_pfn;
 
+		/*
+		 * If start_pfn is less than cover_end_pfn, the hot-add memory
+		 * area is in the gap range.
+		 */
+		if (start_pfn < has->covered_end_pfn) {
+			pgs_ol = handle_hot_add_in_gap(start_pfn, pfn_cnt, has);
+
+			pfn_cnt -= pgs_ol;
+			if (!pfn_cnt) {
+				res = pgs_ol;
+				break;
+			}
+
+			start_pfn += pgs_ol;
+		}
+
 		if (start_pfn < has->ha_end_pfn) {
 			/*
 			 * This is the case where we are backing pages
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH V2 10/10] x86/Hyper-V: Workaround Hyper-V unballoon msg bug
  2020-01-07 13:09 [RFC PATCH V2 00/10] x86/Hyper-V: Add Dynamic memory hot-remove function lantianyu1986
                   ` (8 preceding siblings ...)
  2020-01-07 13:09 ` [RFC PATCH V2 9/10] x86/Hyper-V/Balloon: Hot add mem in the gaps of hot add region lantianyu1986
@ 2020-01-07 13:09 ` lantianyu1986
  2020-01-08 22:23   ` kbuild test robot
  9 siblings, 1 reply; 24+ messages in thread
From: lantianyu1986 @ 2020-01-07 13:09 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, sashal, michael.h.kelley, david
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, vkuznets, eric.devolder

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V sends unballoon msg instead of mem hot add msg in some
case. System doesn't receive balloon msg before unballoon msg
and this will cause balloon_down() to produce warning of
free non-exist memory. Workaround the issue via treating
unballoon msg as mem hot-add msg when mem hot-add is
enabled.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 drivers/hv/hv_balloon.c | 116 +++++++++++++++++++++++++++---------------------
 1 file changed, 66 insertions(+), 50 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 5aaae62955bf..7f3e7ab22d5d 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -486,6 +486,9 @@ union dm_msg_info {
 	struct {
 		__u32 num_pages;
 	} balloon_state;
+	struct {
+		struct dm_unballoon_request *request;
+	} unballoon_state;
 	struct {
 		union dm_mem_page_range ha_page_range;
 		union dm_mem_page_range ha_region_range;
@@ -1155,6 +1158,21 @@ static unsigned long process_hot_add(unsigned long pg_start,
 		return pfn_cnt;
 	}
 
+	if ((rg_start == 0) && (!dm_device.host_specified_ha_region)) {
+		/*
+		 * The host has not specified the hot-add region.
+		 * Based on the hot-add page range being specified,
+		 * compute a hot-add region that can cover the pages
+		 * that need to be hot-added while ensuring the alignment
+		 * and size requirements of Linux as it relates to hot-add.
+		 */
+		rg_size = (pfn_cnt / HA_CHUNK) * HA_CHUNK;
+		if (pfn_cnt % HA_CHUNK)
+			rg_size += HA_CHUNK;
+
+		rg_start = (pg_start / HA_CHUNK) * HA_CHUNK;
+	}
+
 	if (!dm_device.host_specified_ha_region) {
 		covered = pfn_covered(pg_start, pfn_cnt);
 		if (covered < 0)
@@ -1488,28 +1506,6 @@ static void hot_add_req(union dm_msg_info *msg_info)
 	rg_start = msg_info->hot_add.ha_region_range.finfo.start_page;
 	rg_sz = msg_info->hot_add.ha_region_range.finfo.page_cnt;
 
-	if ((rg_start == 0) && (!dm->host_specified_ha_region)) {
-		unsigned long region_size;
-		unsigned long region_start;
-
-		/*
-		 * The host has not specified the hot-add region.
-		 * Based on the hot-add page range being specified,
-		 * compute a hot-add region that can cover the pages
-		 * that need to be hot-added while ensuring the alignment
-		 * and size requirements of Linux as it relates to hot-add.
-		 */
-		region_start = pg_start;
-		region_size = (pfn_cnt / HA_CHUNK) * HA_CHUNK;
-		if (pfn_cnt % HA_CHUNK)
-			region_size += HA_CHUNK;
-
-		region_start = (pg_start / HA_CHUNK) * HA_CHUNK;
-
-		rg_start = region_start;
-		rg_sz = region_size;
-	}
-
 	if (do_hot_add)
 		resp.page_count = process_hot_add(pg_start, pfn_cnt,
 						rg_start, rg_sz);
@@ -1823,41 +1819,37 @@ static void balloon_up(union dm_msg_info *msg_info)
 
 }
 
-static void dm_msg_work(struct work_struct *dummy)
-{
-	union dm_msg_info *msg_info = &dm_device.dm_wrk.dm_msg;
-
-	switch (dm_device.dm_wrk.msg_type) {
-	case DM_BALLOON_REQUEST:
-		balloon_up(msg_info);
-		break;
-	case DM_MEM_HOT_ADD_REQUEST:
-		hot_add_req(msg_info);
-		break;
-	case DM_MEM_HOT_REMOVE_REQUEST:
-		hot_remove_req(msg_info);
-		break;
-	default:
-		return;
-	}
-}
-
-static void balloon_down(struct hv_dynmem_device *dm,
-			struct dm_unballoon_request *req)
+static void balloon_down(union dm_msg_info *msg_info)
 {
+	struct dm_unballoon_request *req = msg_info->unballoon_state.request;
 	union dm_mem_page_range *range_array = req->range_array;
+	struct hv_dynmem_device *dm = &dm_device;
+	unsigned int prev_pages_ballooned = dm->num_pages_ballooned;
 	int range_count = req->range_count;
 	struct dm_unballoon_response resp;
 	int i;
-	unsigned int prev_pages_ballooned = dm->num_pages_ballooned;
 
 	for (i = 0; i < range_count; i++) {
-		free_balloon_pages(dm, &range_array[i]);
-		complete(&dm_device.config_event);
+		/*
+		 * Hyper-V has a bug that send unballoon msg instead
+		 * of hot add msg even if there is no balloon msg sent
+		 * before. Treat all unballoon msgs as hot add msgs
+		 * if hot add capability is enabled.
+		 */
+		if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) && do_hot_add) {
+			dm->host_specified_ha_region = false;
+			dm->num_pages_added +=
+				process_hot_add(range_array[i].finfo.start_page,
+				range_array[i].finfo.page_cnt, 0, 0);
+		} else {
+			free_balloon_pages(dm, &range_array[i]);
+			complete(&dm_device.config_event);
+		}
 	}
 
-	pr_debug("Freed %u ballooned pages.\n",
-		prev_pages_ballooned - dm->num_pages_ballooned);
+	if (!do_hot_add)
+		pr_debug("Freed %u ballooned pages.\n",
+			prev_pages_ballooned - dm->num_pages_ballooned);
 
 	if (req->more_pages == 1)
 		return;
@@ -1875,6 +1867,28 @@ static void balloon_down(struct hv_dynmem_device *dm,
 	dm->state = DM_INITIALIZED;
 }
 
+static void dm_msg_work(struct work_struct *dummy)
+{
+	union dm_msg_info *msg_info = &dm_device.dm_wrk.dm_msg;
+
+	switch (dm_device.dm_wrk.msg_type) {
+	case DM_BALLOON_REQUEST:
+		balloon_up(msg_info);
+		break;
+	case DM_UNBALLOON_REQUEST:
+		balloon_down(msg_info);
+		break;
+	case DM_MEM_HOT_ADD_REQUEST:
+		hot_add_req(msg_info);
+		break;
+	case DM_MEM_HOT_REMOVE_REQUEST:
+		hot_remove_req(msg_info);
+		break;
+	default:
+		return;
+	}
+}
+
 static void balloon_onchannelcallback(void *context);
 
 static int dm_thread_func(void *dm_dev)
@@ -2024,8 +2038,10 @@ static void balloon_onchannelcallback(void *context)
 			}
 
 			dm->state = DM_BALLOON_DOWN;
-			balloon_down(dm,
-				 (struct dm_unballoon_request *)recv_buffer);
+			dm_wrk->msg_type = DM_UNBALLOON_REQUEST;
+			msg_info->unballoon_state.request
+				= (struct dm_unballoon_request *)recv_buffer;
+			schedule_work(&dm_wrk->wrk);
 			break;
 
 		case DM_MEM_HOT_ADD_REQUEST:
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH V2 2/10] mm: expose is_mem_section_removable() symbol
  2020-01-07 13:09 ` [RFC PATCH V2 2/10] mm: expose is_mem_section_removable() symbol lantianyu1986
@ 2020-01-07 13:36   ` Michal Hocko
  2020-01-10 13:41     ` David Hildenbrand
  0 siblings, 1 reply; 24+ messages in thread
From: Michal Hocko @ 2020-01-07 13:36 UTC (permalink / raw)
  To: lantianyu1986
  Cc: kys, haiyangz, sthemmin, sashal, akpm, michael.h.kelley, david,
	Tianyu Lan, linux-hyperv, linux-kernel, linux-mm, vkuznets,
	eric.devolder, vbabka, osalvador, pavel.tatashin, rppt

On Tue 07-01-20 21:09:42, lantianyu1986@gmail.com wrote:
> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> 
> Hyper-V balloon driver will use is_mem_section_removable() to
> check whether memory block is removable or not when receive
> memory hot remove msg. Expose it.

I do not think this is a good idea. The check is inherently racy. Why
cannot the balloon driver simply hotremove the region when asked?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH V2 7/10] x86/Hyper-V/Balloon: Handle mem hot-remove request
  2020-01-07 13:09 ` [RFC PATCH V2 7/10] x86/Hyper-V/Balloon: Handle mem hot-remove request lantianyu1986
@ 2020-01-08  9:54   ` kbuild test robot
  2020-01-08 12:03   ` kbuild test robot
  1 sibling, 0 replies; 24+ messages in thread
From: kbuild test robot @ 2020-01-08  9:54 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3842 bytes --]

Hi,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on linus/master]
[also build test ERROR on v5.5-rc5 next-20191220]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/lantianyu1986-gmail-com/x86-Hyper-V-Add-Dynamic-memory-hot-remove-function/20200108-055844
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git ae6088216ce4b99b3a4aaaccd2eb2dd40d473d42
config: x86_64-rhel (attached as .config)
compiler: gcc-7 (Debian 7.5.0-3) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/hv/hv_balloon.c: In function 'hv_hot_remove_range':
>> drivers/hv/hv_balloon.c:1109:10: error: implicit declaration of function 'offline_and_remove_memory'; did you mean '__remove_memory'? [-Werror=implicit-function-declaration]
       ret = offline_and_remove_memory(nid, pfn << PAGE_SHIFT,
             ^~~~~~~~~~~~~~~~~~~~~~~~~
             __remove_memory
   cc1: some warnings being treated as errors

vim +1109 drivers/hv/hv_balloon.c

  1059	
  1060	static int hv_hot_remove_range(unsigned int nid, unsigned long start_pfn,
  1061				       unsigned long end_pfn, unsigned long nr_pages,
  1062				       unsigned long *request_index,
  1063				       union dm_mem_page_range *range_array,
  1064				       struct hv_hotadd_state *has)
  1065	{
  1066		unsigned long block_pages = HA_CHUNK;
  1067		unsigned long rm_pages = nr_pages;
  1068		unsigned long pfn;
  1069		int ret;
  1070	
  1071		for (pfn = start_pfn; pfn < end_pfn; pfn += block_pages) {
  1072			struct hv_hotadd_gap *gap;
  1073			int in_gap;
  1074	
  1075			if (*request_index >= MAX_HOT_REMOVE_ENTRIES) {
  1076				struct dm_hot_remove_response *resp =
  1077					(struct dm_hot_remove_response *)
  1078						balloon_up_send_buffer;
  1079	
  1080				/* Flush out all hot-remove ranges. */
  1081				ret = hv_send_hot_remove_response(resp, *request_index,
  1082								  true);
  1083				if (ret)
  1084					return ret;
  1085	
  1086				/* Reset request buffer. */
  1087				memset(resp, 0x00, PAGE_SIZE);
  1088				*request_index = 0;
  1089			}
  1090	
  1091			/*
  1092			 * Memory in hot-add region gaps has been offlined or removed
  1093			 * and so skip it if remove range overlap with gap.
  1094			 */
  1095			if (has) {
  1096				list_for_each_entry(gap, &has->gap_list, list)
  1097					if (!(pfn >= gap->end_pfn ||
  1098					      pfn + block_pages < gap->start_pfn)) {
  1099						in_gap = 1;
  1100						break;
  1101					}
  1102	
  1103				if (in_gap)
  1104					continue;
  1105			}
  1106	
  1107			if (online_section_nr(pfn_to_section_nr(pfn))
  1108			    && is_mem_section_removable(pfn, block_pages)) {
> 1109				ret = offline_and_remove_memory(nid, pfn << PAGE_SHIFT,
  1110						block_pages << PAGE_SHIFT);
  1111				if (ret)
  1112					continue;
  1113	
  1114				range_array[*request_index].finfo.start_page = pfn;
  1115				range_array[*request_index].finfo.page_cnt
  1116						= block_pages;
  1117	
  1118				(*request_index)++;
  1119				nr_pages -= block_pages;
  1120	
  1121				if (!nr_pages)
  1122					break;
  1123			}
  1124		}
  1125	
  1126		return rm_pages - nr_pages;
  1127	}
  1128	

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 44036 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH V2 7/10] x86/Hyper-V/Balloon: Handle mem hot-remove request
  2020-01-07 13:09 ` [RFC PATCH V2 7/10] x86/Hyper-V/Balloon: Handle mem hot-remove request lantianyu1986
  2020-01-08  9:54   ` kbuild test robot
@ 2020-01-08 12:03   ` kbuild test robot
  1 sibling, 0 replies; 24+ messages in thread
From: kbuild test robot @ 2020-01-08 12:03 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2157 bytes --]

Hi,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on linus/master]
[also build test ERROR on v5.5-rc5 next-20200106]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/lantianyu1986-gmail-com/x86-Hyper-V-Add-Dynamic-memory-hot-remove-function/20200108-055844
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git ae6088216ce4b99b3a4aaaccd2eb2dd40d473d42
config: i386-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.5.0-3) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/hv/hv_balloon.c: In function 'hot_remove_req':
>> drivers/hv/hv_balloon.c:1234:3: error: implicit declaration of function 'hv_mem_hot_remove'; did you mean 'device_link_remove'? [-Werror=implicit-function-declaration]
      hv_mem_hot_remove(numa_node, page_count);
      ^~~~~~~~~~~~~~~~~
      device_link_remove
   cc1: some warnings being treated as errors

vim +1234 drivers/hv/hv_balloon.c

  1226	
  1227	static void hot_remove_req(union dm_msg_info *msg_info)
  1228	{
  1229		struct hv_dynmem_device *dm = &dm_device;
  1230		unsigned int numa_node = msg_info->hot_remove.virtual_node;
  1231		unsigned int page_count = msg_info->hot_remove.page_count;
  1232	
  1233		if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) && do_hot_add)
> 1234			hv_mem_hot_remove(numa_node, page_count);
  1235		else
  1236			hv_send_hot_remove_response((struct dm_hot_remove_response *)
  1237					balloon_up_send_buffer, 0, false);
  1238	
  1239		dm->state = DM_INITIALIZED;
  1240	}
  1241	

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org Intel Corporation

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 70225 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH V2 10/10] x86/Hyper-V: Workaround Hyper-V unballoon msg bug
  2020-01-07 13:09 ` [RFC PATCH V2 10/10] x86/Hyper-V: Workaround Hyper-V unballoon msg bug lantianyu1986
@ 2020-01-08 22:23   ` kbuild test robot
  0 siblings, 0 replies; 24+ messages in thread
From: kbuild test robot @ 2020-01-08 22:23 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3767 bytes --]

Hi,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on linus/master]
[also build test ERROR on v5.5-rc5 next-20200106]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/lantianyu1986-gmail-com/x86-Hyper-V-Add-Dynamic-memory-hot-remove-function/20200108-055844
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git ae6088216ce4b99b3a4aaaccd2eb2dd40d473d42
config: i386-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.5.0-3) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/hv/hv_balloon.c: In function 'hot_remove_req':
   drivers/hv/hv_balloon.c:1481:3: error: implicit declaration of function 'hv_mem_hot_remove'; did you mean 'device_link_remove'? [-Werror=implicit-function-declaration]
      hv_mem_hot_remove(numa_node, page_count);
      ^~~~~~~~~~~~~~~~~
      device_link_remove
   drivers/hv/hv_balloon.c: In function 'balloon_down':
>> drivers/hv/hv_balloon.c:1842:5: error: implicit declaration of function 'process_hot_add'; did you mean '__check_hot_add'? [-Werror=implicit-function-declaration]
        process_hot_add(range_array[i].finfo.start_page,
        ^~~~~~~~~~~~~~~
        __check_hot_add
   cc1: some warnings being treated as errors

vim +1842 drivers/hv/hv_balloon.c

  1821	
  1822	static void balloon_down(union dm_msg_info *msg_info)
  1823	{
  1824		struct dm_unballoon_request *req = msg_info->unballoon_state.request;
  1825		union dm_mem_page_range *range_array = req->range_array;
  1826		struct hv_dynmem_device *dm = &dm_device;
  1827		unsigned int prev_pages_ballooned = dm->num_pages_ballooned;
  1828		int range_count = req->range_count;
  1829		struct dm_unballoon_response resp;
  1830		int i;
  1831	
  1832		for (i = 0; i < range_count; i++) {
  1833			/*
  1834			 * Hyper-V has a bug that send unballoon msg instead
  1835			 * of hot add msg even if there is no balloon msg sent
  1836			 * before. Treat all unballoon msgs as hot add msgs
  1837			 * if hot add capability is enabled.
  1838			 */
  1839			if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) && do_hot_add) {
  1840				dm->host_specified_ha_region = false;
  1841				dm->num_pages_added +=
> 1842					process_hot_add(range_array[i].finfo.start_page,
  1843					range_array[i].finfo.page_cnt, 0, 0);
  1844			} else {
  1845				free_balloon_pages(dm, &range_array[i]);
  1846				complete(&dm_device.config_event);
  1847			}
  1848		}
  1849	
  1850		if (!do_hot_add)
  1851			pr_debug("Freed %u ballooned pages.\n",
  1852				prev_pages_ballooned - dm->num_pages_ballooned);
  1853	
  1854		if (req->more_pages == 1)
  1855			return;
  1856	
  1857		memset(&resp, 0, sizeof(struct dm_unballoon_response));
  1858		resp.hdr.type = DM_UNBALLOON_RESPONSE;
  1859		resp.hdr.trans_id = atomic_inc_return(&trans_id);
  1860		resp.hdr.size = sizeof(struct dm_unballoon_response);
  1861	
  1862		vmbus_sendpacket(dm_device.dev->channel, &resp,
  1863					sizeof(struct dm_unballoon_response),
  1864					(unsigned long)NULL,
  1865					VM_PKT_DATA_INBAND, 0);
  1866	
  1867		dm->state = DM_INITIALIZED;
  1868	}
  1869	

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org Intel Corporation

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 70225 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH V2 2/10] mm: expose is_mem_section_removable() symbol
  2020-01-07 13:36   ` Michal Hocko
@ 2020-01-10 13:41     ` David Hildenbrand
  2020-01-13 14:49       ` [EXTERNAL] " Tianyu Lan
  0 siblings, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2020-01-10 13:41 UTC (permalink / raw)
  To: Michal Hocko, lantianyu1986
  Cc: kys, haiyangz, sthemmin, sashal, akpm, michael.h.kelley,
	Tianyu Lan, linux-hyperv, linux-kernel, linux-mm, vkuznets,
	eric.devolder, vbabka, osalvador, pavel.tatashin, rppt

On 07.01.20 14:36, Michal Hocko wrote:
> On Tue 07-01-20 21:09:42, lantianyu1986@gmail.com wrote:
>> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>>
>> Hyper-V balloon driver will use is_mem_section_removable() to
>> check whether memory block is removable or not when receive
>> memory hot remove msg. Expose it.
> 
> I do not think this is a good idea. The check is inherently racy. Why
> cannot the balloon driver simply hotremove the region when asked?
> 

It's not only racy, it also gives no guarantees. False postives and
false negatives are possible.

If you want to avoid having to loop forever trying to offline when
calling offline_and_remove_memory(), you could try to
alloc_contig_range() the memory first and then play the
PG_offline+notifier game like virtio-mem.

I don't remember clearly, but I think pinned pages can make offlining
loop for a long time. And I remember there were other scenarios as well
(including out of memory conditions and similar).

I sent an RFC [1] for powerpc/memtrace that does the same (just error
handling is more complicated as it wants to offline and remove multiple
consecutive memory blocks) - if you want to try to go down that path.

[1] https://lkml.kernel.org/r/20191217123851.8854-1-david@redhat.com

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [EXTERNAL] Re: [RFC PATCH V2 2/10] mm: expose is_mem_section_removable() symbol
  2020-01-10 13:41     ` David Hildenbrand
@ 2020-01-13 14:49       ` Tianyu Lan
  2020-01-13 15:01         ` David Hildenbrand
  2020-01-14  9:50         ` Michal Hocko
  0 siblings, 2 replies; 24+ messages in thread
From: Tianyu Lan @ 2020-01-13 14:49 UTC (permalink / raw)
  To: David Hildenbrand, Michal Hocko, lantianyu1986
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, sashal, akpm,
	Michael Kelley, linux-hyperv, linux-kernel, linux-mm, vkuznets,
	eric.devolder, vbabka, osalvador, Pasha Tatashin, rppt

> From: David Hildenbrand <david@redhat.com>
> Sent: Friday, January 10, 2020 9:42 PM
> To: Michal Hocko <mhocko@kernel.org>; lantianyu1986@gmail.com
> Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>;
> sashal@kernel.org; akpm@linux-foundation.org; Michael Kelley
> <mikelley@microsoft.com>; Tianyu Lan <Tianyu.Lan@microsoft.com>; linux-
> hyperv@vger.kernel.org; linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> vkuznets <vkuznets@redhat.com>; eric.devolder@oracle.com; vbabka@suse.cz;
> osalvador@suse.de; Pasha Tatashin <Pavel.Tatashin@microsoft.com>;
> rppt@linux.ibm.com
> Subject: [EXTERNAL] Re: [RFC PATCH V2 2/10] mm: expose
> is_mem_section_removable() symbol
> 
> On 07.01.20 14:36, Michal Hocko wrote:
> > On Tue 07-01-20 21:09:42, lantianyu1986@gmail.com wrote:
> >> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> >>
> >> Hyper-V balloon driver will use is_mem_section_removable() to check
> >> whether memory block is removable or not when receive memory hot
> >> remove msg. Expose it.
> >
> > I do not think this is a good idea. The check is inherently racy. Why
> > cannot the balloon driver simply hotremove the region when asked?
> >
> 
> It's not only racy, it also gives no guarantees. False postives and false negatives
> are possible.
> 
> If you want to avoid having to loop forever trying to offline when calling
> offline_and_remove_memory(), you could try to
> alloc_contig_range() the memory first and then play the PG_offline+notifier
> game like virtio-mem.
> 
> I don't remember clearly, but I think pinned pages can make offlining loop for a
> long time. And I remember there were other scenarios as well (including out of
> memory conditions and similar).
> 
> I sent an RFC [1] for powerpc/memtrace that does the same (just error
> handling is more complicated as it wants to offline and remove multiple
> consecutive memory blocks) - if you want to try to go down that path.
> 
Hi David & Michal:
	Thanks for your review. Some memory blocks are not suitable for hot-plug.
If not check memory block's removable, offline_pages() will report some failure error
e.g, "failed due to memory holes" and  "failure to isolate range". I think the check maybe
added into offline_and_remove_memory()? This may help to not create/expose a new
interface to do such check in module.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [EXTERNAL] Re: [RFC PATCH V2 2/10] mm: expose is_mem_section_removable() symbol
  2020-01-13 14:49       ` [EXTERNAL] " Tianyu Lan
@ 2020-01-13 15:01         ` David Hildenbrand
  2020-01-14  9:50         ` Michal Hocko
  1 sibling, 0 replies; 24+ messages in thread
From: David Hildenbrand @ 2020-01-13 15:01 UTC (permalink / raw)
  To: Tianyu Lan, Michal Hocko, lantianyu1986
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, sashal, akpm,
	Michael Kelley, linux-hyperv, linux-kernel, linux-mm, vkuznets,
	eric.devolder, vbabka, osalvador, Pasha Tatashin, rppt

On 13.01.20 15:49, Tianyu Lan wrote:
>> From: David Hildenbrand <david@redhat.com>
>> Sent: Friday, January 10, 2020 9:42 PM
>> To: Michal Hocko <mhocko@kernel.org>; lantianyu1986@gmail.com
>> Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
>> <haiyangz@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>;
>> sashal@kernel.org; akpm@linux-foundation.org; Michael Kelley
>> <mikelley@microsoft.com>; Tianyu Lan <Tianyu.Lan@microsoft.com>; linux-
>> hyperv@vger.kernel.org; linux-kernel@vger.kernel.org; linux-mm@kvack.org;
>> vkuznets <vkuznets@redhat.com>; eric.devolder@oracle.com; vbabka@suse.cz;
>> osalvador@suse.de; Pasha Tatashin <Pavel.Tatashin@microsoft.com>;
>> rppt@linux.ibm.com
>> Subject: [EXTERNAL] Re: [RFC PATCH V2 2/10] mm: expose
>> is_mem_section_removable() symbol
>>
>> On 07.01.20 14:36, Michal Hocko wrote:
>>> On Tue 07-01-20 21:09:42, lantianyu1986@gmail.com wrote:
>>>> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>>>>
>>>> Hyper-V balloon driver will use is_mem_section_removable() to check
>>>> whether memory block is removable or not when receive memory hot
>>>> remove msg. Expose it.
>>>
>>> I do not think this is a good idea. The check is inherently racy. Why
>>> cannot the balloon driver simply hotremove the region when asked?
>>>
>>
>> It's not only racy, it also gives no guarantees. False postives and false negatives
>> are possible.
>>
>> If you want to avoid having to loop forever trying to offline when calling
>> offline_and_remove_memory(), you could try to
>> alloc_contig_range() the memory first and then play the PG_offline+notifier
>> game like virtio-mem.
>>
>> I don't remember clearly, but I think pinned pages can make offlining loop for a
>> long time. And I remember there were other scenarios as well (including out of
>> memory conditions and similar).
>>
>> I sent an RFC [1] for powerpc/memtrace that does the same (just error
>> handling is more complicated as it wants to offline and remove multiple
>> consecutive memory blocks) - if you want to try to go down that path.
>>
> Hi David & Michal:
> 	Thanks for your review. Some memory blocks are not suitable for hot-plug.
> If not check memory block's removable, offline_pages() will report some failure error
> e.g, "failed due to memory holes" and  "failure to isolate range". I think the check maybe
> added into offline_and_remove_memory()? This may help to not create/expose a new
> interface to do such check in module.

So it's all about the logging output. Duplicating these checks feels
very wrong. And you will still get plenty of page dumps (read below), so
that won't help.

We have pr_debug() for these "failure ..." message. that should
therefore not be an issue on production systems, right?

However, you will see dump_page()s quite often, which logs via pr_warn().

Of course, we could add a mechanism to temporarily disable logging
output for these call paths, but it might actually be helpful for
debugging. We might just want to convert everything that is not actually
a warning to pr_debug() - especially in dump_page().

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [EXTERNAL] Re: [RFC PATCH V2 2/10] mm: expose is_mem_section_removable() symbol
  2020-01-13 14:49       ` [EXTERNAL] " Tianyu Lan
  2020-01-13 15:01         ` David Hildenbrand
@ 2020-01-14  9:50         ` Michal Hocko
  2020-01-17 16:35           ` Tianyu Lan
  1 sibling, 1 reply; 24+ messages in thread
From: Michal Hocko @ 2020-01-14  9:50 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: David Hildenbrand, lantianyu1986, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, sashal, akpm, Michael Kelley, linux-hyperv,
	linux-kernel, linux-mm, vkuznets, eric.devolder, vbabka,
	osalvador, Pasha Tatashin, rppt

On Mon 13-01-20 14:49:38, Tianyu Lan wrote:
> Hi David & Michal:
> 	Thanks for your review. Some memory blocks are not suitable for hot-plug.
> If not check memory block's removable, offline_pages() will report some failure error
> e.g, "failed due to memory holes" and  "failure to isolate range". I think the check maybe
> added into offline_and_remove_memory()? This may help to not create/expose a new
> interface to do such check in module.

Why is a log message a problem in the first place. The operation has
failed afterall. Does the driver try to offline an arbitrary memory?
Could you describe your usecase in more details please?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [EXTERNAL] Re: [RFC PATCH V2 2/10] mm: expose is_mem_section_removable() symbol
  2020-01-14  9:50         ` Michal Hocko
@ 2020-01-17 16:35           ` Tianyu Lan
  2020-01-20 14:14             ` Michal Hocko
  0 siblings, 1 reply; 24+ messages in thread
From: Tianyu Lan @ 2020-01-17 16:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Hildenbrand, lantianyu1986, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, sashal, akpm, Michael Kelley, linux-hyperv,
	linux-kernel, linux-mm, vkuznets, eric.devolder, vbabka,
	osalvador, Pasha Tatashin, rppt

> From: Michal Hocko <mhocko@kernel.org>
> Sent: Tuesday, January 14, 2020 5:51 PM
> To: Tianyu Lan <Tianyu.Lan@microsoft.com>
> Cc: David Hildenbrand <david@redhat.com>; lantianyu1986@gmail.com; KY
> Srinivasan <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>;
> Stephen Hemminger <sthemmin@microsoft.com>; sashal@kernel.org;
> akpm@linux-foundation.org; Michael Kelley <mikelley@microsoft.com>; linux-
> hyperv@vger.kernel.org; linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> vkuznets <vkuznets@redhat.com>; eric.devolder@oracle.com; vbabka@suse.cz;
> osalvador@suse.de; Pasha Tatashin <Pavel.Tatashin@microsoft.com>;
> rppt@linux.ibm.com
> Subject: Re: [EXTERNAL] Re: [RFC PATCH V2 2/10] mm: expose
> is_mem_section_removable() symbol
> 
> On Mon 13-01-20 14:49:38, Tianyu Lan wrote:
> > Hi David & Michal:
> > 	Thanks for your review. Some memory blocks are not suitable for hot-
> plug.
> > If not check memory block's removable, offline_pages() will report
> > some failure error e.g, "failed due to memory holes" and  "failure to
> > isolate range". I think the check maybe added into
> > offline_and_remove_memory()? This may help to not create/expose a new
> interface to do such check in module.
> 
> Why is a log message a problem in the first place. The operation has failed
> afterall. Does the driver try to offline an arbitrary memory?

Yes.

> Could you describe your usecase in more details please?

Hyper-V sends hot-remove request message which just contains requested
page number but not provide detail range. So Hyper-V driver needs to search
suitable memory block in system memory to return back to host if there is no
memory hot-add before. So I used the is_mem_section_removable() do such check.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [EXTERNAL] Re: [RFC PATCH V2 2/10] mm: expose is_mem_section_removable() symbol
  2020-01-17 16:35           ` Tianyu Lan
@ 2020-01-20 14:14             ` Michal Hocko
  0 siblings, 0 replies; 24+ messages in thread
From: Michal Hocko @ 2020-01-20 14:14 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: David Hildenbrand, lantianyu1986, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, sashal, akpm, Michael Kelley, linux-hyperv,
	linux-kernel, linux-mm, vkuznets, eric.devolder, vbabka,
	osalvador, Pasha Tatashin, rppt

On Fri 17-01-20 16:35:03, Tianyu Lan wrote:
[...]
> > Could you describe your usecase in more details please?
> 
> Hyper-V sends hot-remove request message which just contains requested
> page number but not provide detail range. So Hyper-V driver needs to search
> suitable memory block in system memory to return back to host if there is no
> memory hot-add before. So I used the is_mem_section_removable() do such check.

As David described, you would be much better of by using
alloc_contig_range to find a memory that would be suitable for
hotremoving without any races.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [RFC PATCH V2 1/10] mm/resource: Move child to new resource when release mem region.
  2020-01-07 13:09 ` [RFC PATCH V2 1/10] mm/resource: Move child to new resource when release mem region lantianyu1986
@ 2020-01-20 18:34   ` Michael Kelley
  2020-01-20 19:20   ` Michael Kelley
  1 sibling, 0 replies; 24+ messages in thread
From: Michael Kelley @ 2020-01-20 18:34 UTC (permalink / raw)
  To: lantianyu1986, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal, akpm, dan.j.williams, jgg, dave.hansen, richardw.yang,
	namit, Tianyu Lan, david, christophe.leroy
  Cc: linux-hyperv, linux-kernel, linux-mm, vkuznets, eric.devolder

From: Tianyu Lan <Tianyu.Lan@microsoft.com> Sent: Tuesday, January 7, 2020 5:10 AM
> 
> When release mem region, old mem region may be splited to
> two regions. Current allocate new struct resource for high
> end mem region but not move child resources whose ranges are
> in the high end range to new resource. When adjust old mem
> region's range, adjust_resource() detects child region's range
> is out of new range and return error. Move child resources to
> high end resource before adjusting old mem range.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  kernel/resource.c | 38 ++++++++++++++++++++++++++++++++++----
>  1 file changed, 34 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 76036a41143b..1c7362825134 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -181,6 +181,38 @@ static struct resource *alloc_resource(gfp_t flags)
>  	return res;
>  }
> 
> +static void move_child_to_newresource(struct resource *old,
> +				      struct resource *new)
> +{
> +	struct resource *tmp, **p, **np;
> +
> +	if (!old->child)
> +		return;

I don't think the above test is needed.  This case is handled by the first
three lines of the "for" loop.

> +
> +	p = &old->child;
> +	np = &new->child;
> +
> +	for (;;) {
> +		tmp = *p;
> +		if (!tmp)
> +			break;
> +
> +		if (tmp->start >= new->start && tmp->end <= new->end) {
> +			tmp->parent = new;
> +			*np = tmp;
> +			np = &tmp->sibling;
> +			*p = tmp->sibling;
> +
> +			if (!tmp->sibling)
> +				*np = NULL;

I don't think the above two lines are right.  They seem tautological.  If the ! were
removed it would be clearing the sibling link for the child as it exists under its new
parent, which should be done.  But the child that is moved to the new parent always
becomes the last entry in the new parent's child list.  So could you just unconditionally
do tmp->sibling = NULL?   That link will get fixed up if another child is moved.

Michael

> +			continue;
> +		}
> +
> +		p = &tmp->sibling;
> +	}
> +}
> +
> +
>  /* Return the conflict entry if you can't request it */
>  static struct resource * __request_resource(struct resource *root, struct resource *new)
>  {
> @@ -1246,9 +1278,6 @@ EXPORT_SYMBOL(__release_region);
>   * Note:
>   * - Additional release conditions, such as overlapping region, can be
>   *   supported after they are confirmed as valid cases.
> - * - When a busy memory resource gets split into two entries, the code
> - *   assumes that all children remain in the lower address entry for
> - *   simplicity.  Enhance this logic when necessary.
>   */
>  int release_mem_region_adjustable(struct resource *parent,
>  				  resource_size_t start, resource_size_t size)
> @@ -1331,11 +1360,12 @@ int release_mem_region_adjustable(struct resource *parent,
>  			new_res->sibling = res->sibling;
>  			new_res->child = NULL;
> 
> +			move_child_to_newresource(res, new_res);
> +			res->sibling = new_res;
>  			ret = __adjust_resource(res, res->start,
>  						start - res->start);
>  			if (ret)
>  				break;
> -			res->sibling = new_res;
>  			new_res = NULL;
>  		}
> 
> --
> 2.14.5


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [RFC PATCH V2 3/10] x86/Hyper-V/Balloon: Replace hot-add and balloon up works with a common work
  2020-01-07 13:09 ` [RFC PATCH V2 3/10] x86/Hyper-V/Balloon: Replace hot-add and balloon up works with a common work lantianyu1986
@ 2020-01-20 19:12   ` Michael Kelley
  0 siblings, 0 replies; 24+ messages in thread
From: Michael Kelley @ 2020-01-20 19:12 UTC (permalink / raw)
  To: lantianyu1986, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal, david
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, vkuznets, eric.devolder

From: Tianyu Lan <Tianyu.Lan@microsoft.com> Sent: Tuesday, January 7, 2020 5:10 AM
> 
> The mem hot-remove operation and balloon down will be added
> or moved into work context. Add a common work to handle
> opeations of mem hot-add/remove and balloon up/down.

Let me suggest some improved wording for this commit message:

The Hyper-V balloon driver currently has separate work contexts for memory
hot-add operations and for balloon up (i.e., remove memory from the guest)
operations.  Hot-remove is being added and must be done in a work context.
And finally, balloon down is currently not done in a work context, but needs
to move to a work context because of a Hyper-V bug that confuses balloon
operations and hot-remove.  Rather than add code for two additional work
contexts, consolidate the current hot-add and balloon up operations onto
a single work context.  Subsequent patches will add hot-remove and
balloon down to that same work context.

My assumption is that the consolidation into a single work context is
just code simplification.  If there's another reason, please describe it.

The code is this patch looks OK to me.

Michael

> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  drivers/hv/hv_balloon.c | 86 ++++++++++++++++++++++++++++++-------------------
>  1 file changed, 52 insertions(+), 34 deletions(-)
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [RFC PATCH V2 1/10] mm/resource: Move child to new resource when release mem region.
  2020-01-07 13:09 ` [RFC PATCH V2 1/10] mm/resource: Move child to new resource when release mem region lantianyu1986
  2020-01-20 18:34   ` Michael Kelley
@ 2020-01-20 19:20   ` Michael Kelley
  1 sibling, 0 replies; 24+ messages in thread
From: Michael Kelley @ 2020-01-20 19:20 UTC (permalink / raw)
  To: lantianyu1986, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal, akpm, dan.j.williams, jgg, dave.hansen, richardw.yang,
	namit, Tianyu Lan, david, christophe.leroy
  Cc: linux-hyperv, linux-kernel, linux-mm, vkuznets, eric.devolder

From: Tianyu Lan <Tianyu.Lan@microsoft.com> Sent: Tuesday, January 7, 2020 5:10 AM
> 
> When release mem region, old mem region may be splited to
> two regions. Current allocate new struct resource for high
> end mem region but not move child resources whose ranges are
> in the high end range to new resource. When adjust old mem
> region's range, adjust_resource() detects child region's range
> is out of new range and return error. Move child resources to
> high end resource before adjusting old mem range.

Let me also suggests some wording improvements to the commit message:

When releasing a mem region, the old mem region may need to be
split into two regions.  In this case, the current code allocates the new
region and adjust the original region to specify a smaller range.  But child
regions that fall into the range of the new region are not moved to that
new region.  Consequently, when running __adjust_resource() on the
original region, it detects that the child region's range is out of the new
range, and returns an error.

Fix this by moving appropriate child resources to the new region before
adjusting the original mem region range.

> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  kernel/resource.c | 38 ++++++++++++++++++++++++++++++++++----
>  1 file changed, 34 insertions(+), 4 deletions(-)
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-01-20 19:21 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-07 13:09 [RFC PATCH V2 00/10] x86/Hyper-V: Add Dynamic memory hot-remove function lantianyu1986
2020-01-07 13:09 ` [RFC PATCH V2 1/10] mm/resource: Move child to new resource when release mem region lantianyu1986
2020-01-20 18:34   ` Michael Kelley
2020-01-20 19:20   ` Michael Kelley
2020-01-07 13:09 ` [RFC PATCH V2 2/10] mm: expose is_mem_section_removable() symbol lantianyu1986
2020-01-07 13:36   ` Michal Hocko
2020-01-10 13:41     ` David Hildenbrand
2020-01-13 14:49       ` [EXTERNAL] " Tianyu Lan
2020-01-13 15:01         ` David Hildenbrand
2020-01-14  9:50         ` Michal Hocko
2020-01-17 16:35           ` Tianyu Lan
2020-01-20 14:14             ` Michal Hocko
2020-01-07 13:09 ` [RFC PATCH V2 3/10] x86/Hyper-V/Balloon: Replace hot-add and balloon up works with a common work lantianyu1986
2020-01-20 19:12   ` Michael Kelley
2020-01-07 13:09 ` [RFC PATCH V2 4/10] x86/Hyper-V/Balloon: Convert spin lock ha_lock to mutex lantianyu1986
2020-01-07 13:09 ` [RFC PATCH V2 5/10] x86/Hyper-V/Balloon: Avoid releasing ha_lock when traverse ha_region_list lantianyu1986
2020-01-07 13:09 ` [RFC PATCH V2 6/10] x86/Hyper-V/Balloon: Enable mem hot-remove capability lantianyu1986
2020-01-07 13:09 ` [RFC PATCH V2 7/10] x86/Hyper-V/Balloon: Handle mem hot-remove request lantianyu1986
2020-01-08  9:54   ` kbuild test robot
2020-01-08 12:03   ` kbuild test robot
2020-01-07 13:09 ` [RFC PATCH V2 8/10] x86/Hyper-V/Balloon: Handle request with non-aligned page number lantianyu1986
2020-01-07 13:09 ` [RFC PATCH V2 9/10] x86/Hyper-V/Balloon: Hot add mem in the gaps of hot add region lantianyu1986
2020-01-07 13:09 ` [RFC PATCH V2 10/10] x86/Hyper-V: Workaround Hyper-V unballoon msg bug lantianyu1986
2020-01-08 22:23   ` kbuild test robot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.