* [PATCH v2 RESEND 0/4] Drivers: hv: balloon: fix WS2012 memory hotplug issues and do some cleanup
@ 2016-08-11 12:47 Vitaly Kuznetsov
  2016-08-11 12:47 ` [PATCH v2 RESEND 1/4] Drivers: hv: balloon: keep track of where ha_region starts Vitaly Kuznetsov
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Vitaly Kuznetsov @ 2016-08-11 12:47 UTC (permalink / raw)
  To: devel; +Cc: linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Alex Ng

Changes since v2:
- I'm sorry, I screwed up Alex's address; this is just a resend.

Changes since v1:
- Keep ol_waitevent and wait when kernel memory onlining is disabled [Alex Ng]

Crashes with the Hyper-V balloon driver are reported on WS2012 (non-R2)
hosts. I was able to identify two issues, which I fix with the first two
patches of this series. Patch 3 removes the wait on ol_waitevent when we
have in-kernel memory onlining, and patch 4 gets rid of ha_region_mutex by
switching to fine-grained locking with a spinlock.

Vitaly Kuznetsov (4):
  Drivers: hv: balloon: keep track of where ha_region starts
  Drivers: hv: balloon: account for gaps in hot add regions
  Drivers: hv: balloon: don't wait for ol_waitevent when
    memhp_auto_online is enabled
  Drivers: hv: balloon: replace ha_region_mutex with spinlock

 drivers/hv/hv_balloon.c | 216 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 144 insertions(+), 72 deletions(-)

-- 
2.7.4


* [PATCH v2 RESEND 1/4] Drivers: hv: balloon: keep track of where ha_region starts
  2016-08-11 12:47 [PATCH v2 RESEND 0/4] Drivers: hv: balloon: fix WS2012 memory hotplug issues and do some cleanup Vitaly Kuznetsov
@ 2016-08-11 12:47 ` Vitaly Kuznetsov
  2016-08-11 12:47 ` [PATCH v2 RESEND 2/4] Drivers: hv: balloon: account for gaps in hot add regions Vitaly Kuznetsov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Vitaly Kuznetsov @ 2016-08-11 12:47 UTC (permalink / raw)
  To: devel; +Cc: linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Alex Ng

Windows 2012 (non-R2) does not specify the hot add region in hot add
requests, so the logic in hot_add_req() tries to find a 128Mb-aligned
region covering the request. It may also happen that the host's requests
are not 128Mb aligned, in which case the created ha_region will start
before the first specified PFN. We can't online these non-present pages,
but we currently don't remember the real start of the region.

This is a regression introduced by commit 5abbbb75d733 ("Drivers: hv:
hv_balloon: don't lose memory when onlining order is not natural"). While
the idea of keeping a 'moving window' was wrong (as there is no guarantee
that hot add requests come ordered), we should still keep track of
covered_start_pfn. This is not a revert; the logic is different.
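
For illustration only (this is not the driver's exact code, just a minimal
userspace sketch assuming HA_CHUNK is 32768 pfns, i.e. 128Mb with 4Kb
pages, and reusing the first hot add request quoted in patch 2 of this
series): the ha_region starts at 0x108000 while the first backed pfn is
0x108200, and without covered_start_pfn we would later try to online the
non-present 0x108000 - 0x108200 range.

#include <stdio.h>

#define HA_CHUNK (32 * 1024UL)	/* assumption: 128Mb worth of 4Kb pfns */

int main(void)
{
	unsigned long pg_start = 0x108200, pfn_cnt = 330752;
	/* round the region start down and its end up to HA_CHUNK */
	unsigned long rg_start = (pg_start / HA_CHUNK) * HA_CHUNK;
	unsigned long rg_end = ((pg_start + pfn_cnt + HA_CHUNK - 1) /
				HA_CHUNK) * HA_CHUNK;

	printf("ha_region:    0x%lx - 0x%lx\n", rg_start, rg_end);
	printf("backed pfns:  0x%lx - 0x%lx\n", pg_start, pg_start + pfn_cnt);
	printf("never backed: 0x%lx - 0x%lx\n", rg_start, pg_start);
	return 0;
}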

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 drivers/hv/hv_balloon.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index df35fb7..4ae26d6 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -430,13 +430,14 @@ struct dm_info_msg {
  * currently hot added. We hot add in multiples of 128M
  * chunks; it is possible that we may not be able to bring
  * online all the pages in the region. The range
- * covered_end_pfn defines the pages that can
+ * covered_start_pfn:covered_end_pfn defines the pages that can
  * be brough online.
  */
 
 struct hv_hotadd_state {
 	struct list_head list;
 	unsigned long start_pfn;
+	unsigned long covered_start_pfn;
 	unsigned long covered_end_pfn;
 	unsigned long ha_end_pfn;
 	unsigned long end_pfn;
@@ -682,7 +683,8 @@ static void hv_online_page(struct page *pg)
 
 	list_for_each(cur, &dm_device.ha_region_list) {
 		has = list_entry(cur, struct hv_hotadd_state, list);
-		cur_start_pgp = (unsigned long)pfn_to_page(has->start_pfn);
+		cur_start_pgp = (unsigned long)
+			pfn_to_page(has->covered_start_pfn);
 		cur_end_pgp = (unsigned long)pfn_to_page(has->covered_end_pfn);
 
 		if (((unsigned long)pg >= cur_start_pgp) &&
@@ -854,6 +856,7 @@ static unsigned long process_hot_add(unsigned long pg_start,
 		list_add_tail(&ha_region->list, &dm_device.ha_region_list);
 		ha_region->start_pfn = rg_start;
 		ha_region->ha_end_pfn = rg_start;
+		ha_region->covered_start_pfn = pg_start;
 		ha_region->covered_end_pfn = pg_start;
 		ha_region->end_pfn = rg_start + rg_size;
 	}
-- 
2.7.4


* [PATCH v2 RESEND 2/4] Drivers: hv: balloon: account for gaps in hot add regions
  2016-08-11 12:47 [PATCH v2 RESEND 0/4] Drivers: hv: balloon: fix WS2012 memory hotplug issues and do some cleanup Vitaly Kuznetsov
  2016-08-11 12:47 ` [PATCH v2 RESEND 1/4] Drivers: hv: balloon: keep track of where ha_region starts Vitaly Kuznetsov
@ 2016-08-11 12:47 ` Vitaly Kuznetsov
  2016-08-15 22:17   ` Alex Ng (LIS)
  2016-08-11 12:47 ` [PATCH v2 RESEND 3/4] Drivers: hv: balloon: don't wait for ol_waitevent when memhp_auto_online is enabled Vitaly Kuznetsov
  2016-08-11 12:47 ` [PATCH v2 RESEND 4/4] Drivers: hv: balloon: replace ha_region_mutex with spinlock Vitaly Kuznetsov
  3 siblings, 1 reply; 8+ messages in thread
From: Vitaly Kuznetsov @ 2016-08-11 12:47 UTC (permalink / raw)
  To: devel; +Cc: linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Alex Ng

I'm observing the following hot add requests from the WS2012 host:

hot_add_req: start_pfn = 0x108200 count = 330752
hot_add_req: start_pfn = 0x158e00 count = 193536
hot_add_req: start_pfn = 0x188400 count = 239616

As the host doesn't specify hot add regions, we try to create a
128Mb-aligned region covering the first request: we create the 0x108000 -
0x160000 region and add the 0x108000 - 0x158e00 memory. The second request
passes the pfn_covered() check, so we enlarge the region to 0x108000 -
0x190000 and add the 0x158e00 - 0x188200 memory. The problem emerges with
the third request: it starts at 0x188400, so there is a 0x200 gap which is
not covered. As the end of our region is now 0x190000, the request again
passes the pfn_covered() check, where we just adjust covered_end_pfn and
make it 0x188400 instead of 0x188200. This means we'll try to online the
0x188200 - 0x188400 pages, but these pages were never assigned to us, and
we crash.
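
For illustration only (not driver code): a minimal sketch that replays the
three requests above and reports where the covered range and the next
request's start stop matching - the 0x200-pfn hole that the gap list
introduced below has to record.

#include <stdio.h>

int main(void)
{
	/* start_pfn / count pairs as reported by the WS2012 host */
	const unsigned long req[3][2] = {
		{ 0x108200, 330752 },	/* ends at 0x158e00 */
		{ 0x158e00, 193536 },	/* ends at 0x188200 */
		{ 0x188400, 239616 },	/* starts past 0x188200 */
	};
	unsigned long covered_end = req[0][0];
	int i;

	for (i = 0; i < 3; i++) {
		if (req[i][0] != covered_end)
			printf("gap (never backed): 0x%lx - 0x%lx\n",
			       covered_end, req[i][0]);
		covered_end = req[i][0] + req[i][1];
		printf("request %d: covered_end_pfn = 0x%lx\n",
		       i + 1, covered_end);
	}
	return 0;
}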

We can't react to such requests by creating new hot add regions: the whole
suggested range may fall into the previously identified 128Mb-aligned area,
so we would either end up adding nothing or create intersecting regions,
and our current logic doesn't allow that. Instead, create a list of such
'gaps' and check for them in the page online callback.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 drivers/hv/hv_balloon.c | 107 +++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 83 insertions(+), 24 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 4ae26d6..064ef3d 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -441,6 +441,16 @@ struct hv_hotadd_state {
 	unsigned long covered_end_pfn;
 	unsigned long ha_end_pfn;
 	unsigned long end_pfn;
+	/*
+	 * A list of gaps.
+	 */
+	struct list_head gap_list;
+};
+
+struct hv_hotadd_gap {
+	struct list_head list;
+	unsigned long start_pfn;
+	unsigned long end_pfn;
 };
 
 struct balloon_state {
@@ -676,35 +686,63 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 
 static void hv_online_page(struct page *pg)
 {
-	struct list_head *cur;
 	struct hv_hotadd_state *has;
+	struct hv_hotadd_gap *gap;
 	unsigned long cur_start_pgp;
 	unsigned long cur_end_pgp;
+	bool is_gap = false;
 
 	list_for_each(cur, &dm_device.ha_region_list) {
 		has = list_entry(cur, struct hv_hotadd_state, list);
 		cur_start_pgp = (unsigned long)
+			pfn_to_page(has->start_pfn);
+		cur_end_pgp = (unsigned long)pfn_to_page(has->end_pfn);
+
+		/* The page belongs to a different HAS. */
+		if (((unsigned long)pg < cur_start_pgp) ||
+		    ((unsigned long)pg >= cur_end_pgp))
+			continue;
+
+		cur_start_pgp = (unsigned long)
 			pfn_to_page(has->covered_start_pfn);
 		cur_end_pgp = (unsigned long)pfn_to_page(has->covered_end_pfn);
 
-		if (((unsigned long)pg >= cur_start_pgp) &&
-			((unsigned long)pg < cur_end_pgp)) {
-			/*
-			 * This frame is currently backed; online the
-			 * page.
-			 */
-			__online_page_set_limits(pg);
-			__online_page_increment_counters(pg);
-			__online_page_free(pg);
+		/* The page is not backed. */
+		if (((unsigned long)pg < cur_start_pgp) ||
+		    ((unsigned long)pg >= cur_end_pgp))
+			continue;
+
+		/* Check for gaps. */
+		list_for_each_entry(gap, &has->gap_list, list) {
+			cur_start_pgp = (unsigned long)
+				pfn_to_page(gap->start_pfn);
+			cur_end_pgp = (unsigned long)
+				pfn_to_page(gap->end_pfn);
+			if (((unsigned long)pg >= cur_start_pgp) &&
+			    ((unsigned long)pg < cur_end_pgp)) {
+				is_gap = true;
+				break;
+			}
 		}
+
+		if (is_gap)
+			break;
+
+		/* This frame is currently backed; online the page. */
+		__online_page_set_limits(pg);
+		__online_page_increment_counters(pg);
+		__online_page_free(pg);
+		break;
 	}
 }
 
-static bool pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
+static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
 {
 	struct list_head *cur;
 	struct hv_hotadd_state *has;
+	struct hv_hotadd_gap *gap;
 	unsigned long residual, new_inc;
+	int ret = 0;
 
 	if (list_empty(&dm_device.ha_region_list))
 		return false;
@@ -718,6 +756,24 @@ static bool pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
 		 */
 		if (start_pfn < has->start_pfn || start_pfn >= has->end_pfn)
 			continue;
+
+		/*
+		 * If the current start pfn is not where the covered_end
+		 * is, create a gap and update covered_end_pfn.
+		 */
+		if (has->covered_end_pfn != start_pfn) {
+			gap = kzalloc(sizeof(struct hv_hotadd_gap), GFP_ATOMIC);
+			if (!gap)
+				return -ENOMEM;
+
+			INIT_LIST_HEAD(&gap->list);
+			gap->start_pfn = has->covered_end_pfn;
+			gap->end_pfn = start_pfn;
+			list_add_tail(&gap->list, &has->gap_list);
+
+			has->covered_end_pfn = start_pfn;
+		}
+
 		/*
 		 * If the current hot add-request extends beyond
 		 * our current limit; extend it.
@@ -734,19 +790,10 @@ static bool pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
 			has->end_pfn += new_inc;
 		}
 
-		/*
-		 * If the current start pfn is not where the covered_end
-		 * is, update it.
-		 */
-
-		if (has->covered_end_pfn != start_pfn)
-			has->covered_end_pfn = start_pfn;
-
-		return true;
-
+		return 1;
 	}
 
-	return false;
+	return 0;
 }
 
 static unsigned long handle_pg_range(unsigned long pg_start,
@@ -834,13 +881,19 @@ static unsigned long process_hot_add(unsigned long pg_start,
 					unsigned long rg_size)
 {
 	struct hv_hotadd_state *ha_region = NULL;
+	int covered;
 
 	if (pfn_cnt == 0)
 		return 0;
 
-	if (!dm_device.host_specified_ha_region)
-		if (pfn_covered(pg_start, pfn_cnt))
+	if (!dm_device.host_specified_ha_region) {
+		covered = pfn_covered(pg_start, pfn_cnt);
+		if (covered < 0)
+			return 0;
+
+		if (covered)
 			goto do_pg_range;
+	}
 
 	/*
 	 * If the host has specified a hot-add range; deal with it first.
@@ -852,6 +905,7 @@ static unsigned long process_hot_add(unsigned long pg_start,
 			return 0;
 
 		INIT_LIST_HEAD(&ha_region->list);
+		INIT_LIST_HEAD(&ha_region->gap_list);
 
 		list_add_tail(&ha_region->list, &dm_device.ha_region_list);
 		ha_region->start_pfn = rg_start;
@@ -1585,6 +1639,7 @@ static int balloon_remove(struct hv_device *dev)
 	struct hv_dynmem_device *dm = hv_get_drvdata(dev);
 	struct list_head *cur, *tmp;
 	struct hv_hotadd_state *has;
+	struct hv_hotadd_gap *gap, *tmp_gap;
 
 	if (dm->num_pages_ballooned != 0)
 		pr_warn("Ballooned pages: %d\n", dm->num_pages_ballooned);
@@ -1601,6 +1656,10 @@ static int balloon_remove(struct hv_device *dev)
 #endif
 	list_for_each_safe(cur, tmp, &dm->ha_region_list) {
 		has = list_entry(cur, struct hv_hotadd_state, list);
+		list_for_each_entry_safe(gap, tmp_gap, &has->gap_list, list) {
+			list_del(&gap->list);
+			kfree(gap);
+		}
 		list_del(&has->list);
 		kfree(has);
 	}
-- 
2.7.4


* [PATCH v2 RESEND 3/4] Drivers: hv: balloon: don't wait for ol_waitevent when memhp_auto_online is enabled
  2016-08-11 12:47 [PATCH v2 RESEND 0/4] Drivers: hv: balloon: fix WS2012 memory hotplug issues and do some cleanup Vitaly Kuznetsov
  2016-08-11 12:47 ` [PATCH v2 RESEND 1/4] Drivers: hv: balloon: keep track of where ha_region starts Vitaly Kuznetsov
  2016-08-11 12:47 ` [PATCH v2 RESEND 2/4] Drivers: hv: balloon: account for gaps in hot add regions Vitaly Kuznetsov
@ 2016-08-11 12:47 ` Vitaly Kuznetsov
  2016-08-11 12:47 ` [PATCH v2 RESEND 4/4] Drivers: hv: balloon: replace ha_region_mutex with spinlock Vitaly Kuznetsov
  3 siblings, 0 replies; 8+ messages in thread
From: Vitaly Kuznetsov @ 2016-08-11 12:47 UTC (permalink / raw)
  To: devel; +Cc: linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Alex Ng

With the recently introduced in-kernel memory onlining
(MEMORY_HOTPLUG_DEFAULT_ONLINE) there is no point in waiting for pages
to come online in the driver, so we can get rid of the waiting.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 drivers/hv/hv_balloon.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 064ef3d..a8b8b9a 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -645,7 +645,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 		has->covered_end_pfn +=  processed_pfn;
 
 		init_completion(&dm_device.ol_waitevent);
-		dm_device.ha_waiting = true;
+		dm_device.ha_waiting = !memhp_auto_online;
 
 		mutex_unlock(&dm_device.ha_region_mutex);
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
@@ -671,12 +671,14 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 		}
 
 		/*
-		 * Wait for the memory block to be onlined.
-		 * Since the hot add has succeeded, it is ok to
-		 * proceed even if the pages in the hot added region
-		 * have not been "onlined" within the allowed time.
+		 * Wait for the memory block to be onlined when memory onlining
+		 * is done outside of kernel (memhp_auto_online). Since the hot
+		 * add has succeeded, it is ok to proceed even if the pages in
+		 * the hot added region have not been "onlined" within the
+		 * allowed time.
 		 */
-		wait_for_completion_timeout(&dm_device.ol_waitevent, 5*HZ);
+		if (dm_device.ha_waiting)
+			wait_for_completion_timeout(&dm_device.ol_waitevent, 5*HZ);
 		mutex_lock(&dm_device.ha_region_mutex);
 		post_status(&dm_device);
 	}
-- 
2.7.4


* [PATCH v2 RESEND 4/4] Drivers: hv: balloon: replace ha_region_mutex with spinlock
  2016-08-11 12:47 [PATCH v2 RESEND 0/4] Drivers: hv: balloon: fix WS2012 memory hotplug issues and do some cleanup Vitaly Kuznetsov
                   ` (2 preceding siblings ...)
  2016-08-11 12:47 ` [PATCH v2 RESEND 3/4] Drivers: hv: balloon: don't wait for ol_waitevent when memhp_auto_online is enabled Vitaly Kuznetsov
@ 2016-08-11 12:47 ` Vitaly Kuznetsov
  3 siblings, 0 replies; 8+ messages in thread
From: Vitaly Kuznetsov @ 2016-08-11 12:47 UTC (permalink / raw)
  To: devel; +Cc: linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Alex Ng

lockdep reports a possible circular locking dependency when udev is used
for memory onlining:

 systemd-udevd/3996 is trying to acquire lock:
  ((memory_chain).rwsem){++++.+}, at: [<ffffffff810d137e>] __blocking_notifier_call_chain+0x4e/0xc0

 but task is already holding lock:
  (&dm_device.ha_region_mutex){+.+.+.}, at: [<ffffffffa015382e>] hv_memory_notifier+0x5e/0xc0 [hv_balloon]
 ...

which is probably a false positive because we take and release
ha_region_mutex from the memory notifier chain depending on the arg. No
real deadlocks have been reported so far (though I'm not really sure about
preemptible kernels...), but we don't really need to hold the mutex for so
long. We use it to protect ha_region_list (and its members) and the
num_pages_onlined counter. None of these operations require us to sleep
and nothing is slow, so switch to a spinlock with interrupts disabled.

While at it, replace list_for_each with list_for_each_entry, as we actually
need the entries in all these cases, and drop the meaningless list_empty()
checks.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 drivers/hv/hv_balloon.c | 96 ++++++++++++++++++++++++++-----------------------
 1 file changed, 52 insertions(+), 44 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index a8b8b9a..19498f4 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -547,7 +547,11 @@ struct hv_dynmem_device {
 	 */
 	struct task_struct *thread;
 
-	struct mutex ha_region_mutex;
+	/*
+	 * Protects ha_region_list, num_pages_onlined counter and individual
+	 * regions from ha_region_list.
+	 */
+	spinlock_t ha_lock;
 
 	/*
 	 * A list of hot-add regions.
@@ -571,18 +575,14 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
 			      void *v)
 {
 	struct memory_notify *mem = (struct memory_notify *)v;
+	unsigned long flags;
 
 	switch (val) {
-	case MEM_GOING_ONLINE:
-		mutex_lock(&dm_device.ha_region_mutex);
-		break;
-
 	case MEM_ONLINE:
+		spin_lock_irqsave(&dm_device.ha_lock, flags);
 		dm_device.num_pages_onlined += mem->nr_pages;
+		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 	case MEM_CANCEL_ONLINE:
-		if (val == MEM_ONLINE ||
-		    mutex_is_locked(&dm_device.ha_region_mutex))
-			mutex_unlock(&dm_device.ha_region_mutex);
 		if (dm_device.ha_waiting) {
 			dm_device.ha_waiting = false;
 			complete(&dm_device.ol_waitevent);
@@ -590,10 +590,11 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
 		break;
 
 	case MEM_OFFLINE:
-		mutex_lock(&dm_device.ha_region_mutex);
+		spin_lock_irqsave(&dm_device.ha_lock, flags);
 		dm_device.num_pages_onlined -= mem->nr_pages;
-		mutex_unlock(&dm_device.ha_region_mutex);
+		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 		break;
+	case MEM_GOING_ONLINE:
 	case MEM_GOING_OFFLINE:
 	case MEM_CANCEL_OFFLINE:
 		break;
@@ -629,9 +630,12 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 	unsigned long start_pfn;
 	unsigned long processed_pfn;
 	unsigned long total_pfn = pfn_count;
+	unsigned long flags;
 
 	for (i = 0; i < (size/HA_CHUNK); i++) {
 		start_pfn = start + (i * HA_CHUNK);
+
+		spin_lock_irqsave(&dm_device.ha_lock, flags);
 		has->ha_end_pfn +=  HA_CHUNK;
 
 		if (total_pfn > HA_CHUNK) {
@@ -643,11 +647,11 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 		}
 
 		has->covered_end_pfn +=  processed_pfn;
+		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 
 		init_completion(&dm_device.ol_waitevent);
 		dm_device.ha_waiting = !memhp_auto_online;
 
-		mutex_unlock(&dm_device.ha_region_mutex);
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
 				(HA_CHUNK << PAGE_SHIFT));
@@ -664,9 +668,10 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 				 */
 				do_hot_add = false;
 			}
+			spin_lock_irqsave(&dm_device.ha_lock, flags);
 			has->ha_end_pfn -= HA_CHUNK;
 			has->covered_end_pfn -=  processed_pfn;
-			mutex_lock(&dm_device.ha_region_mutex);
+			spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 			break;
 		}
 
@@ -679,7 +684,6 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 		 */
 		if (dm_device.ha_waiting)
 			wait_for_completion_timeout(&dm_device.ol_waitevent, 5*HZ);
-		mutex_lock(&dm_device.ha_region_mutex);
 		post_status(&dm_device);
 	}
 
@@ -693,9 +697,10 @@ static void hv_online_page(struct page *pg)
 	unsigned long cur_start_pgp;
 	unsigned long cur_end_pgp;
 	bool is_gap = false;
+	unsigned long flags;
 
-	list_for_each(cur, &dm_device.ha_region_list) {
-		has = list_entry(cur, struct hv_hotadd_state, list);
+	spin_lock_irqsave(&dm_device.ha_lock, flags);
+	list_for_each_entry(has, &dm_device.ha_region_list, list) {
 		cur_start_pgp = (unsigned long)
 			pfn_to_page(has->start_pfn);
 		cur_end_pgp = (unsigned long)pfn_to_page(has->end_pfn);
@@ -736,22 +741,19 @@ static void hv_online_page(struct page *pg)
 		__online_page_free(pg);
 		break;
 	}
+	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 }
 
 static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
 {
-	struct list_head *cur;
 	struct hv_hotadd_state *has;
 	struct hv_hotadd_gap *gap;
 	unsigned long residual, new_inc;
 	int ret = 0;
+	unsigned long flags;
 
-	if (list_empty(&dm_device.ha_region_list))
-		return false;
-
-	list_for_each(cur, &dm_device.ha_region_list) {
-		has = list_entry(cur, struct hv_hotadd_state, list);
-
+	spin_lock_irqsave(&dm_device.ha_lock, flags);
+	list_for_each_entry(has, &dm_device.ha_region_list, list) {
 		/*
 		 * If the pfn range we are dealing with is not in the current
 		 * "hot add block", move on.
@@ -765,8 +767,10 @@ static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
 		 */
 		if (has->covered_end_pfn != start_pfn) {
 			gap = kzalloc(sizeof(struct hv_hotadd_gap), GFP_ATOMIC);
-			if (!gap)
-				return -ENOMEM;
+			if (!gap) {
+				ret = -ENOMEM;
+				break;
+			}
 
 			INIT_LIST_HEAD(&gap->list);
 			gap->start_pfn = has->covered_end_pfn;
@@ -792,10 +796,12 @@ static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
 			has->end_pfn += new_inc;
 		}
 
-		return 1;
+		ret = 1;
+		break;
 	}
+	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 
-	return 0;
+	return ret;
 }
 
 static unsigned long handle_pg_range(unsigned long pg_start,
@@ -804,17 +810,13 @@ static unsigned long handle_pg_range(unsigned long pg_start,
 	unsigned long start_pfn = pg_start;
 	unsigned long pfn_cnt = pg_count;
 	unsigned long size;
-	struct list_head *cur;
 	struct hv_hotadd_state *has;
 	unsigned long pgs_ol = 0;
 	unsigned long old_covered_state;
+	unsigned long res = 0, flags;
 
-	if (list_empty(&dm_device.ha_region_list))
-		return 0;
-
-	list_for_each(cur, &dm_device.ha_region_list) {
-		has = list_entry(cur, struct hv_hotadd_state, list);
-
+	spin_lock_irqsave(&dm_device.ha_lock, flags);
+	list_for_each_entry(has, &dm_device.ha_region_list, list) {
 		/*
 		 * If the pfn range we are dealing with is not in the current
 		 * "hot add block", move on.
@@ -864,17 +866,20 @@ static unsigned long handle_pg_range(unsigned long pg_start,
 			} else {
 				pfn_cnt = size;
 			}
+			spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 			hv_mem_hot_add(has->ha_end_pfn, size, pfn_cnt, has);
+			spin_lock_irqsave(&dm_device.ha_lock, flags);
 		}
 		/*
 		 * If we managed to online any pages that were given to us,
 		 * we declare success.
 		 */
-		return has->covered_end_pfn - old_covered_state;
-
+		res = has->covered_end_pfn - old_covered_state;
+		break;
 	}
+	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 
-	return 0;
+	return res;
 }
 
 static unsigned long process_hot_add(unsigned long pg_start,
@@ -884,6 +889,7 @@ static unsigned long process_hot_add(unsigned long pg_start,
 {
 	struct hv_hotadd_state *ha_region = NULL;
 	int covered;
+	unsigned long flags;
 
 	if (pfn_cnt == 0)
 		return 0;
@@ -909,12 +915,15 @@ static unsigned long process_hot_add(unsigned long pg_start,
 		INIT_LIST_HEAD(&ha_region->list);
 		INIT_LIST_HEAD(&ha_region->gap_list);
 
-		list_add_tail(&ha_region->list, &dm_device.ha_region_list);
 		ha_region->start_pfn = rg_start;
 		ha_region->ha_end_pfn = rg_start;
 		ha_region->covered_start_pfn = pg_start;
 		ha_region->covered_end_pfn = pg_start;
 		ha_region->end_pfn = rg_start + rg_size;
+
+		spin_lock_irqsave(&dm_device.ha_lock, flags);
+		list_add_tail(&ha_region->list, &dm_device.ha_region_list);
+		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 	}
 
 do_pg_range:
@@ -941,7 +950,6 @@ static void hot_add_req(struct work_struct *dummy)
 	resp.hdr.size = sizeof(struct dm_hot_add_response);
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-	mutex_lock(&dm_device.ha_region_mutex);
 	pg_start = dm->ha_wrk.ha_page_range.finfo.start_page;
 	pfn_cnt = dm->ha_wrk.ha_page_range.finfo.page_cnt;
 
@@ -975,7 +983,6 @@ static void hot_add_req(struct work_struct *dummy)
 						rg_start, rg_sz);
 
 	dm->num_pages_added += resp.page_count;
-	mutex_unlock(&dm_device.ha_region_mutex);
 #endif
 	/*
 	 * The result field of the response structure has the
@@ -1520,7 +1527,7 @@ static int balloon_probe(struct hv_device *dev,
 	init_completion(&dm_device.host_event);
 	init_completion(&dm_device.config_event);
 	INIT_LIST_HEAD(&dm_device.ha_region_list);
-	mutex_init(&dm_device.ha_region_mutex);
+	spin_lock_init(&dm_device.ha_lock);
 	INIT_WORK(&dm_device.balloon_wrk.wrk, balloon_up);
 	INIT_WORK(&dm_device.ha_wrk.wrk, hot_add_req);
 	dm_device.host_specified_ha_region = false;
@@ -1639,9 +1646,9 @@ probe_error0:
 static int balloon_remove(struct hv_device *dev)
 {
 	struct hv_dynmem_device *dm = hv_get_drvdata(dev);
-	struct list_head *cur, *tmp;
-	struct hv_hotadd_state *has;
+	struct hv_hotadd_state *has, *tmp;
 	struct hv_hotadd_gap *gap, *tmp_gap;
+	unsigned long flags;
 
 	if (dm->num_pages_ballooned != 0)
 		pr_warn("Ballooned pages: %d\n", dm->num_pages_ballooned);
@@ -1656,8 +1663,8 @@ static int balloon_remove(struct hv_device *dev)
 	restore_online_page_callback(&hv_online_page);
 	unregister_memory_notifier(&hv_memory_nb);
 #endif
-	list_for_each_safe(cur, tmp, &dm->ha_region_list) {
-		has = list_entry(cur, struct hv_hotadd_state, list);
+	spin_lock_irqsave(&dm_device.ha_lock, flags);
+	list_for_each_entry_safe(has, tmp, &dm->ha_region_list, list) {
 		list_for_each_entry_safe(gap, tmp_gap, &has->gap_list, list) {
 			list_del(&gap->list);
 			kfree(gap);
@@ -1665,6 +1672,7 @@ static int balloon_remove(struct hv_device *dev)
 		list_del(&has->list);
 		kfree(has);
 	}
+	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 
 	return 0;
 }
-- 
2.7.4


* RE: [PATCH v2 RESEND 2/4] Drivers: hv: balloon: account for gaps in hot add regions
  2016-08-11 12:47 ` [PATCH v2 RESEND 2/4] Drivers: hv: balloon: account for gaps in hot add regions Vitaly Kuznetsov
@ 2016-08-15 22:17   ` Alex Ng (LIS)
  2016-08-16  8:40     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 8+ messages in thread
From: Alex Ng (LIS) @ 2016-08-15 22:17 UTC (permalink / raw)
  To: Vitaly Kuznetsov, devel; +Cc: linux-kernel, KY Srinivasan, Haiyang Zhang

> @@ -676,35 +686,63 @@ static void hv_mem_hot_add(unsigned long start,
> unsigned long size,
> 
>  static void hv_online_page(struct page *pg)  {
> -	struct list_head *cur;
>  	struct hv_hotadd_state *has;
> +	struct hv_hotadd_gap *gap;
>  	unsigned long cur_start_pgp;
>  	unsigned long cur_end_pgp;
> +	bool is_gap = false;
> 
>  	list_for_each(cur, &dm_device.ha_region_list) {
>  		has = list_entry(cur, struct hv_hotadd_state, list);
>  		cur_start_pgp = (unsigned long)
> +			pfn_to_page(has->start_pfn);
> +		cur_end_pgp = (unsigned long)pfn_to_page(has->end_pfn);
> +
> +		/* The page belongs to a different HAS. */
> +		if (((unsigned long)pg < cur_start_pgp) ||
> +		    ((unsigned long)pg >= cur_end_pgp))
> +			continue;
> +
> +		cur_start_pgp = (unsigned long)
>  			pfn_to_page(has->covered_start_pfn);
>  		cur_end_pgp = (unsigned long)pfn_to_page(has-
> >covered_end_pfn);
> 
> -		if (((unsigned long)pg >= cur_start_pgp) &&
> -			((unsigned long)pg < cur_end_pgp)) {
> -			/*
> -			 * This frame is currently backed; online the
> -			 * page.
> -			 */
> -			__online_page_set_limits(pg);
> -			__online_page_increment_counters(pg);
> -			__online_page_free(pg);
> +		/* The page is not backed. */
> +		if (((unsigned long)pg < cur_start_pgp) ||
> +		    ((unsigned long)pg >= cur_end_pgp))
> +			continue;
> +
> +		/* Check for gaps. */
> +		list_for_each_entry(gap, &has->gap_list, list) {
> +			cur_start_pgp = (unsigned long)
> +				pfn_to_page(gap->start_pfn);
> +			cur_end_pgp = (unsigned long)
> +				pfn_to_page(gap->end_pfn);
> +			if (((unsigned long)pg >= cur_start_pgp) &&
> +			    ((unsigned long)pg < cur_end_pgp)) {
> +				is_gap = true;
> +				break;
> +			}
>  		}
> +
> +		if (is_gap)
> +			break;
> +
> +		/* This frame is currently backed; online the page. */
> +		__online_page_set_limits(pg);
> +		__online_page_increment_counters(pg);
> +		__online_page_free(pg);
> +		break;
>  	}
>  }
> 

We may need to add similar logic to check for gaps in hv_bring_pgs_online().

[...]
>  static unsigned long handle_pg_range(unsigned long pg_start, @@ -834,13
> +881,19 @@ static unsigned long process_hot_add(unsigned long pg_start,
>  					unsigned long rg_size)
>  {
>  	struct hv_hotadd_state *ha_region = NULL;
> +	int covered;
> 
>  	if (pfn_cnt == 0)
>  		return 0;
> 
> -	if (!dm_device.host_specified_ha_region)
> -		if (pfn_covered(pg_start, pfn_cnt))
> +	if (!dm_device.host_specified_ha_region) {
> +		covered = pfn_covered(pg_start, pfn_cnt);
> +		if (covered < 0)
> +			return 0;

If the hot-add pages aren't covered by any region, then shouldn't it fall through instead of returning?
That way the new ha_region can be added to the list and we hot-add the pages accordingly.


* Re: [PATCH v2 RESEND 2/4] Drivers: hv: balloon: account for gaps in hot add regions
  2016-08-15 22:17   ` Alex Ng (LIS)
@ 2016-08-16  8:40     ` Vitaly Kuznetsov
  2016-08-16 21:44       ` Alex Ng (LIS)
  0 siblings, 1 reply; 8+ messages in thread
From: Vitaly Kuznetsov @ 2016-08-16  8:40 UTC (permalink / raw)
  To: Alex Ng (LIS); +Cc: devel, linux-kernel, KY Srinivasan, Haiyang Zhang

"Alex Ng (LIS)" <alexng@microsoft.com> writes:

>> @@ -676,35 +686,63 @@ static void hv_mem_hot_add(unsigned long start,
>> unsigned long size,
>> 
>>  static void hv_online_page(struct page *pg)  {
>> -	struct list_head *cur;
>>  	struct hv_hotadd_state *has;
>> +	struct hv_hotadd_gap *gap;
>>  	unsigned long cur_start_pgp;
>>  	unsigned long cur_end_pgp;
>> +	bool is_gap = false;
>> 
>>  	list_for_each(cur, &dm_device.ha_region_list) {
>>  		has = list_entry(cur, struct hv_hotadd_state, list);
>>  		cur_start_pgp = (unsigned long)
>> +			pfn_to_page(has->start_pfn);
>> +		cur_end_pgp = (unsigned long)pfn_to_page(has->end_pfn);
>> +
>> +		/* The page belongs to a different HAS. */
>> +		if (((unsigned long)pg < cur_start_pgp) ||
>> +		    ((unsigned long)pg >= cur_end_pgp))
>> +			continue;
>> +
>> +		cur_start_pgp = (unsigned long)
>>  			pfn_to_page(has->covered_start_pfn);
>>  		cur_end_pgp = (unsigned long)pfn_to_page(has-
>> >covered_end_pfn);
>> 
>> -		if (((unsigned long)pg >= cur_start_pgp) &&
>> -			((unsigned long)pg < cur_end_pgp)) {
>> -			/*
>> -			 * This frame is currently backed; online the
>> -			 * page.
>> -			 */
>> -			__online_page_set_limits(pg);
>> -			__online_page_increment_counters(pg);
>> -			__online_page_free(pg);
>> +		/* The page is not backed. */
>> +		if (((unsigned long)pg < cur_start_pgp) ||
>> +		    ((unsigned long)pg >= cur_end_pgp))
>> +			continue;
>> +
>> +		/* Check for gaps. */
>> +		list_for_each_entry(gap, &has->gap_list, list) {
>> +			cur_start_pgp = (unsigned long)
>> +				pfn_to_page(gap->start_pfn);
>> +			cur_end_pgp = (unsigned long)
>> +				pfn_to_page(gap->end_pfn);
>> +			if (((unsigned long)pg >= cur_start_pgp) &&
>> +			    ((unsigned long)pg < cur_end_pgp)) {
>> +				is_gap = true;
>> +				break;
>> +			}
>>  		}
>> +
>> +		if (is_gap)
>> +			break;
>> +
>> +		/* This frame is currently backed; online the page. */
>> +		__online_page_set_limits(pg);
>> +		__online_page_increment_counters(pg);
>> +		__online_page_free(pg);
>> +		break;
>>  	}
>>  }
>> 
>
> We may need to add similar logic to check for gaps in hv_bring_pgs_online().
>
> [...]

Yes, probably. I'll take a look and try to refactor the onlining code into
a separate function to avoid duplication.

>>  static unsigned long handle_pg_range(unsigned long pg_start, @@ -834,13
>> +881,19 @@ static unsigned long process_hot_add(unsigned long pg_start,
>>  					unsigned long rg_size)
>>  {
>>  	struct hv_hotadd_state *ha_region = NULL;
>> +	int covered;
>> 
>>  	if (pfn_cnt == 0)
>>  		return 0;
>> 
>> -	if (!dm_device.host_specified_ha_region)
>> -		if (pfn_covered(pg_start, pfn_cnt))
>> +	if (!dm_device.host_specified_ha_region) {
>> +		covered = pfn_covered(pg_start, pfn_cnt);
>> +		if (covered < 0)
>> +			return 0;
>
> If the hot-add pages aren't covered by any region, then shouldn't it fall through instead of returning?
> That way the new ha_region can be added to the list and we hot-add the
> pages accordingly.

I was under the impression this is impossible:
hot_add_req()/process_hot_add() will create a new region in this case.
'covered < 0' was added to handle one particular error: failure to
allocate memory to record a gap (struct hv_hotadd_gap), and I don't have a
better idea how to handle this: if we can't remember the gap, we'll crash
later when onlining...

-- 
  Vitaly


* RE: [PATCH v2 RESEND 2/4] Drivers: hv: balloon: account for gaps in hot add regions
  2016-08-16  8:40     ` Vitaly Kuznetsov
@ 2016-08-16 21:44       ` Alex Ng (LIS)
  0 siblings, 0 replies; 8+ messages in thread
From: Alex Ng (LIS) @ 2016-08-16 21:44 UTC (permalink / raw)
  To: Vitaly Kuznetsov; +Cc: devel, linux-kernel, KY Srinivasan, Haiyang Zhang

> >>  static unsigned long handle_pg_range(unsigned long pg_start, @@ -
> 834,13
> >> +881,19 @@ static unsigned long process_hot_add(unsigned long
> pg_start,
> >>  					unsigned long rg_size)
> >>  {
> >>  	struct hv_hotadd_state *ha_region = NULL;
> >> +	int covered;
> >>
> >>  	if (pfn_cnt == 0)
> >>  		return 0;
> >>
> >> -	if (!dm_device.host_specified_ha_region)
> >> -		if (pfn_covered(pg_start, pfn_cnt))
> >> +	if (!dm_device.host_specified_ha_region) {
> >> +		covered = pfn_covered(pg_start, pfn_cnt);
> >> +		if (covered < 0)
> >> +			return 0;
> >
> > If the hot-add pages aren't covered by any region, then shouldn't it fall
> through instead of returning?
> > That way the new ha_region can be added to the list and we hot-add the
> > pages accordingly.
> 
> I was under an impression this is impossible:
> hot_add_req()/process_hot_add() will create a new region in this
> case. 'covered < 0' was added to handle one particular error: failure to
> allocate memory to record gap (struct hv_hotadd_gap) and I don't have a
> better idea how to handle this: if we can't remember the gap we'll crash
> later on onlining...
> 

You are correct. I misread your patch, thinking "covered < 0" means the page range is not covered, when it is actually handling an error case.

> --
>   Vitaly


Thread overview: 8+ messages
2016-08-11 12:47 [PATCH v2 RESEND 0/4] Drivers: hv: balloon: fix WS2012 memory hotplug issues and do some cleanup Vitaly Kuznetsov
2016-08-11 12:47 ` [PATCH v2 RESEND 1/4] Drivers: hv: balloon: keep track of where ha_region starts Vitaly Kuznetsov
2016-08-11 12:47 ` [PATCH v2 RESEND 2/4] Drivers: hv: balloon: account for gaps in hot add regions Vitaly Kuznetsov
2016-08-15 22:17   ` Alex Ng (LIS)
2016-08-16  8:40     ` Vitaly Kuznetsov
2016-08-16 21:44       ` Alex Ng (LIS)
2016-08-11 12:47 ` [PATCH v2 RESEND 3/4] Drivers: hv: balloon: don't wait for ol_waitevent when memhp_auto_online is enabled Vitaly Kuznetsov
2016-08-11 12:47 ` [PATCH v2 RESEND 4/4] Drivers: hv: balloon: replace ha_region_mutex with spinlock Vitaly Kuznetsov
