* [PATCH 0/5] Drivers: hv: balloon: Miscellaneous fixes.
From: kys @ 2016-08-24 23:22 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, olaf, apw, vkuznets, jasowang,
	leann.ogasawara, alexng
  Cc: K. Y. Srinivasan

From: K. Y. Srinivasan <kys@microsoft.com>

Miscellaneous fixes to the balloon driver.

Alex Ng (1):
  Drivers: hv: balloon: Use available memory value in pressure report

Vitaly Kuznetsov (4):
  Drivers: hv: balloon: keep track of where ha_region starts
  Drivers: hv: balloon: account for gaps in hot add regions
  Drivers: hv: balloon: don't wait for ol_waitevent when
    memhp_auto_online is enabled
  Drivers: hv: balloon: replace ha_region_mutex with spinlock

 drivers/hv/hv_balloon.c |  254 ++++++++++++++++++++++++++++++-----------------
 1 files changed, 161 insertions(+), 93 deletions(-)

-- 
1.7.4.1

* [PATCH 1/5] Drivers: hv: balloon: keep track of where ha_region starts
From: kys @ 2016-08-24 23:23 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, olaf, apw, vkuznets, jasowang,
	leann.ogasawara, alexng
  Cc: K. Y. Srinivasan

From: Vitaly Kuznetsov <vkuznets@redhat.com>

Windows 2012 (non-R2) does not specify a hot add region in its hot add
requests, so the logic in hot_add_req() tries to find a 128Mb-aligned
region covering the request. It may also happen that the host's requests
are not 128Mb aligned, in which case the created ha_region starts before
the first PFN the host specified. We must not online these non-present
pages, but we no longer remember the real start of the region.
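
To illustrate, a minimal sketch (not part of the patch; names follow
hv_balloon.c, where HA_CHUNK is the 128Mb hot add granularity, i.e.
32768 pfns with 4k pages), using start_pfn = 0x108200 from the WS2012
trace later in this series:

	/*
	 * hot_add_req() rounds the region start down to a 128Mb
	 * boundary, so rg_start may precede the first host-backed pfn:
	 */
	unsigned long rg_start = (start_pfn / HA_CHUNK) * HA_CHUNK;

	/*
	 * rg_start == 0x108000 here, but the pages in
	 * [0x108000, 0x108200) were never given to us;
	 * covered_start_pfn records the real start, 0x108200.
	 */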

This is a regression introduced by commit 5abbbb75d733 ("Drivers: hv:
hv_balloon: don't lose memory when onlining order is not natural").
While the idea of keeping a 'moving window' was wrong (there is no
guarantee that hot add requests arrive in order), we should still keep
track of covered_start_pfn. This is not a revert; the logic is
different.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
---
 drivers/hv/hv_balloon.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index df35fb7..4ae26d6 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -430,13 +430,14 @@ struct dm_info_msg {
  * currently hot added. We hot add in multiples of 128M
  * chunks; it is possible that we may not be able to bring
  * online all the pages in the region. The range
- * covered_end_pfn defines the pages that can
+ * covered_start_pfn:covered_end_pfn defines the pages that can
 * be brought online.
  */
 
 struct hv_hotadd_state {
 	struct list_head list;
 	unsigned long start_pfn;
+	unsigned long covered_start_pfn;
 	unsigned long covered_end_pfn;
 	unsigned long ha_end_pfn;
 	unsigned long end_pfn;
@@ -682,7 +683,8 @@ static void hv_online_page(struct page *pg)
 
 	list_for_each(cur, &dm_device.ha_region_list) {
 		has = list_entry(cur, struct hv_hotadd_state, list);
-		cur_start_pgp = (unsigned long)pfn_to_page(has->start_pfn);
+		cur_start_pgp = (unsigned long)
+			pfn_to_page(has->covered_start_pfn);
 		cur_end_pgp = (unsigned long)pfn_to_page(has->covered_end_pfn);
 
 		if (((unsigned long)pg >= cur_start_pgp) &&
@@ -854,6 +856,7 @@ static unsigned long process_hot_add(unsigned long pg_start,
 		list_add_tail(&ha_region->list, &dm_device.ha_region_list);
 		ha_region->start_pfn = rg_start;
 		ha_region->ha_end_pfn = rg_start;
+		ha_region->covered_start_pfn = pg_start;
 		ha_region->covered_end_pfn = pg_start;
 		ha_region->end_pfn = rg_start + rg_size;
 	}
-- 
1.7.4.1

* [PATCH 2/5] Drivers: hv: balloon: account for gaps in hot add regions
From: kys @ 2016-08-24 23:23 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, olaf, apw, vkuznets, jasowang,
	leann.ogasawara, alexng
  Cc: K. Y. Srinivasan

From: Vitaly Kuznetsov <vkuznets@redhat.com>

I'm observing the following hot add requests from the WS2012 host:

hot_add_req: start_pfn = 0x108200 count = 330752
hot_add_req: start_pfn = 0x158e00 count = 193536
hot_add_req: start_pfn = 0x188400 count = 239616

As the host doesn't specify hot add regions, we try to create a
128Mb-aligned region covering the first request: we create the 0x108000 -
0x160000 region and add the 0x108000 - 0x158e00 memory. The second
request passes the pfn_covered() check; we enlarge the region to
0x108000 - 0x190000 and add the 0x158e00 - 0x188200 memory (193536 ==
0x2f400 pages, so coverage ends at 0x158e00 + 0x2f400 = 0x188200). The
problem emerges with the third request: it starts at 0x188400, so there
is a 0x200 gap which is not covered. As the end of our region is now
0x190000, the request again passes the pfn_covered() check, where we
simply adjust covered_end_pfn, making it 0x188400 instead of 0x188200.
This means we'll try to online the 0x188200 - 0x188400 pages, but these
pages were never assigned to us and we crash.

We can't react to such requests by creating new hot add regions: it may
happen that the whole suggested range falls into a previously identified
128Mb-aligned area, so we would end up adding nothing, or we would
create intersecting regions, which our current logic doesn't allow.
Instead, keep a list of such 'gaps' and check for them in the page
online callback.
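
For the requests above, the resulting bookkeeping looks roughly like
this (a sketch; the real structures are in the diff below):

	/*
	 * After the third request:
	 *
	 *   has->start_pfn       == 0x108000
	 *   has->end_pfn         == 0x190000
	 *   has->covered_end_pfn == 0x188400 (and growing)
	 *
	 * with one gap recorded on has->gap_list:
	 *
	 *   gap->start_pfn == 0x188200   (end of request 2)
	 *   gap->end_pfn   == 0x188400   (start of request 3)
	 *
	 * hv_page_online_one() refuses to online any page that falls
	 * in [gap->start_pfn, gap->end_pfn), so the never-backed 0x200
	 * pfns are left alone.
	 */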

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
---
 drivers/hv/hv_balloon.c |  131 +++++++++++++++++++++++++++++++++-------------
 1 files changed, 94 insertions(+), 37 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 4ae26d6..18766f6 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -441,6 +441,16 @@ struct hv_hotadd_state {
 	unsigned long covered_end_pfn;
 	unsigned long ha_end_pfn;
 	unsigned long end_pfn;
+	/*
+	 * A list of gaps.
+	 */
+	struct list_head gap_list;
+};
+
+struct hv_hotadd_gap {
+	struct list_head list;
+	unsigned long start_pfn;
+	unsigned long end_pfn;
 };
 
 struct balloon_state {
@@ -596,18 +606,46 @@ static struct notifier_block hv_memory_nb = {
 	.priority = 0
 };
 
+/* Check if the particular page is backed and can be onlined and online it. */
+static void hv_page_online_one(struct hv_hotadd_state *has, struct page *pg)
+{
+	unsigned long cur_start_pgp;
+	unsigned long cur_end_pgp;
+	struct hv_hotadd_gap *gap;
+
+	cur_start_pgp = (unsigned long)pfn_to_page(has->covered_start_pfn);
+	cur_end_pgp = (unsigned long)pfn_to_page(has->covered_end_pfn);
 
-static void hv_bring_pgs_online(unsigned long start_pfn, unsigned long size)
+	/* The page is not backed. */
+	if (((unsigned long)pg < cur_start_pgp) ||
+	    ((unsigned long)pg >= cur_end_pgp))
+		return;
+
+	/* Check for gaps. */
+	list_for_each_entry(gap, &has->gap_list, list) {
+		cur_start_pgp = (unsigned long)
+			pfn_to_page(gap->start_pfn);
+		cur_end_pgp = (unsigned long)
+			pfn_to_page(gap->end_pfn);
+		if (((unsigned long)pg >= cur_start_pgp) &&
+		    ((unsigned long)pg < cur_end_pgp)) {
+			return;
+		}
+	}
+
+	/* This frame is currently backed; online the page. */
+	__online_page_set_limits(pg);
+	__online_page_increment_counters(pg);
+	__online_page_free(pg);
+}
+
+static void hv_bring_pgs_online(struct hv_hotadd_state *has,
+				unsigned long start_pfn, unsigned long size)
 {
 	int i;
 
-	for (i = 0; i < size; i++) {
-		struct page *pg;
-		pg = pfn_to_page(start_pfn + i);
-		__online_page_set_limits(pg);
-		__online_page_increment_counters(pg);
-		__online_page_free(pg);
-	}
+	for (i = 0; i < size; i++)
+		hv_page_online_one(has, pfn_to_page(start_pfn + i));
 }
 
 static void hv_mem_hot_add(unsigned long start, unsigned long size,
@@ -684,26 +722,24 @@ static void hv_online_page(struct page *pg)
 	list_for_each(cur, &dm_device.ha_region_list) {
 		has = list_entry(cur, struct hv_hotadd_state, list);
 		cur_start_pgp = (unsigned long)
-			pfn_to_page(has->covered_start_pfn);
-		cur_end_pgp = (unsigned long)pfn_to_page(has->covered_end_pfn);
+			pfn_to_page(has->start_pfn);
+		cur_end_pgp = (unsigned long)pfn_to_page(has->end_pfn);
 
-		if (((unsigned long)pg >= cur_start_pgp) &&
-			((unsigned long)pg < cur_end_pgp)) {
-			/*
-			 * This frame is currently backed; online the
-			 * page.
-			 */
-			__online_page_set_limits(pg);
-			__online_page_increment_counters(pg);
-			__online_page_free(pg);
-		}
+		/* The page belongs to a different HAS. */
+		if (((unsigned long)pg < cur_start_pgp) ||
+		    ((unsigned long)pg >= cur_end_pgp))
+			continue;
+
+		hv_page_online_one(has, pg);
+		break;
 	}
 }
 
-static bool pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
+static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
 {
 	struct list_head *cur;
 	struct hv_hotadd_state *has;
+	struct hv_hotadd_gap *gap;
 	unsigned long residual, new_inc;
 
 	if (list_empty(&dm_device.ha_region_list))
@@ -718,6 +754,24 @@ static bool pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
 		 */
 		if (start_pfn < has->start_pfn || start_pfn >= has->end_pfn)
 			continue;
+
+		/*
+		 * If the current start pfn is not where the covered_end
+		 * is, create a gap and update covered_end_pfn.
+		 */
+		if (has->covered_end_pfn != start_pfn) {
+			gap = kzalloc(sizeof(struct hv_hotadd_gap), GFP_ATOMIC);
+			if (!gap)
+				return -ENOMEM;
+
+			INIT_LIST_HEAD(&gap->list);
+			gap->start_pfn = has->covered_end_pfn;
+			gap->end_pfn = start_pfn;
+			list_add_tail(&gap->list, &has->gap_list);
+
+			has->covered_end_pfn = start_pfn;
+		}
+
 		/*
 		 * If the current hot add-request extends beyond
 		 * our current limit; extend it.
@@ -734,19 +788,10 @@ static bool pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
 			has->end_pfn += new_inc;
 		}
 
-		/*
-		 * If the current start pfn is not where the covered_end
-		 * is, update it.
-		 */
-
-		if (has->covered_end_pfn != start_pfn)
-			has->covered_end_pfn = start_pfn;
-
-		return true;
-
+		return 1;
 	}
 
-	return false;
+	return 0;
 }
 
 static unsigned long handle_pg_range(unsigned long pg_start,
@@ -785,6 +830,8 @@ static unsigned long handle_pg_range(unsigned long pg_start,
 			if (pgs_ol > pfn_cnt)
 				pgs_ol = pfn_cnt;
 
+			has->covered_end_pfn +=  pgs_ol;
+			pfn_cnt -= pgs_ol;
 			/*
 			 * Check if the corresponding memory block is already
 			 * online by checking its last previously backed page.
@@ -793,10 +840,8 @@ static unsigned long handle_pg_range(unsigned long pg_start,
 			 */
 			if (start_pfn > has->start_pfn &&
 			    !PageReserved(pfn_to_page(start_pfn - 1)))
-				hv_bring_pgs_online(start_pfn, pgs_ol);
+				hv_bring_pgs_online(has, start_pfn, pgs_ol);
 
-			has->covered_end_pfn +=  pgs_ol;
-			pfn_cnt -= pgs_ol;
 		}
 
 		if ((has->ha_end_pfn < has->end_pfn) && (pfn_cnt > 0)) {
@@ -834,13 +879,19 @@ static unsigned long process_hot_add(unsigned long pg_start,
 					unsigned long rg_size)
 {
 	struct hv_hotadd_state *ha_region = NULL;
+	int covered;
 
 	if (pfn_cnt == 0)
 		return 0;
 
-	if (!dm_device.host_specified_ha_region)
-		if (pfn_covered(pg_start, pfn_cnt))
+	if (!dm_device.host_specified_ha_region) {
+		covered = pfn_covered(pg_start, pfn_cnt);
+		if (covered < 0)
+			return 0;
+
+		if (covered)
 			goto do_pg_range;
+	}
 
 	/*
 	 * If the host has specified a hot-add range; deal with it first.
@@ -852,6 +903,7 @@ static unsigned long process_hot_add(unsigned long pg_start,
 			return 0;
 
 		INIT_LIST_HEAD(&ha_region->list);
+		INIT_LIST_HEAD(&ha_region->gap_list);
 
 		list_add_tail(&ha_region->list, &dm_device.ha_region_list);
 		ha_region->start_pfn = rg_start;
@@ -1585,6 +1637,7 @@ static int balloon_remove(struct hv_device *dev)
 	struct hv_dynmem_device *dm = hv_get_drvdata(dev);
 	struct list_head *cur, *tmp;
 	struct hv_hotadd_state *has;
+	struct hv_hotadd_gap *gap, *tmp_gap;
 
 	if (dm->num_pages_ballooned != 0)
 		pr_warn("Ballooned pages: %d\n", dm->num_pages_ballooned);
@@ -1601,6 +1654,10 @@ static int balloon_remove(struct hv_device *dev)
 #endif
 	list_for_each_safe(cur, tmp, &dm->ha_region_list) {
 		has = list_entry(cur, struct hv_hotadd_state, list);
+		list_for_each_entry_safe(gap, tmp_gap, &has->gap_list, list) {
+			list_del(&gap->list);
+			kfree(gap);
+		}
 		list_del(&has->list);
 		kfree(has);
 	}
-- 
1.7.4.1

* [PATCH 3/5] Drivers: hv: balloon: don't wait for ol_waitevent when memhp_auto_online is enabled
  2016-08-24 23:23 ` [PATCH 1/5] Drivers: hv: balloon: keep track of where ha_region starts kys
  2016-08-24 23:23   ` [PATCH 2/5] Drivers: hv: balloon: account for gaps in hot add regions kys
@ 2016-08-24 23:23   ` kys
  2016-08-24 23:23   ` [PATCH 4/5] Drivers: hv: balloon: replace ha_region_mutex with spinlock kys
  2016-08-24 23:23   ` [PATCH 5/5] Drivers: hv: balloon: Use available memory value in pressure report kys
  3 siblings, 0 replies; 9+ messages in thread
From: kys @ 2016-08-24 23:23 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, olaf, apw, vkuznets, jasowang,
	leann.ogasawara, alexng
  Cc: K. Y. Srinivasan

From: Vitaly Kuznetsov <vkuznets@redhat.com>

With the recently introduced in-kernel memory onlining
(MEMORY_HOTPLUG_DEFAULT_ONLINE) there is no point in waiting for pages
to come online in the driver, and we can get rid of the waiting.
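
Condensed, the change amounts to the following (a sketch distilled from
the hunks below; memhp_auto_online is the in-kernel onlining policy flag
that MEMORY_HOTPLUG_DEFAULT_ONLINE controls):

	dm_device.ha_waiting = !memhp_auto_online;

	/* ... add_memory() and its error handling are unchanged ... */

	if (dm_device.ha_waiting)
		wait_for_completion_timeout(&dm_device.ol_waitevent,
					    5*HZ);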

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
---
 drivers/hv/hv_balloon.c |   15 +++++++++------
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 18766f6..3441326 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -673,7 +673,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 		has->covered_end_pfn +=  processed_pfn;
 
 		init_completion(&dm_device.ol_waitevent);
-		dm_device.ha_waiting = true;
+		dm_device.ha_waiting = !memhp_auto_online;
 
 		mutex_unlock(&dm_device.ha_region_mutex);
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
@@ -699,12 +699,15 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 		}
 
 		/*
-		 * Wait for the memory block to be onlined.
-		 * Since the hot add has succeeded, it is ok to
-		 * proceed even if the pages in the hot added region
-		 * have not been "onlined" within the allowed time.
+		 * Wait for the memory block to be onlined when memory onlining
+		 * is done outside of kernel (memhp_auto_online). Since the hot
+		 * add has succeeded, it is ok to proceed even if the pages in
+		 * the hot added region have not been "onlined" within the
+		 * allowed time.
 		 */
-		wait_for_completion_timeout(&dm_device.ol_waitevent, 5*HZ);
+		if (dm_device.ha_waiting)
+			wait_for_completion_timeout(&dm_device.ol_waitevent,
+						    5*HZ);
 		mutex_lock(&dm_device.ha_region_mutex);
 		post_status(&dm_device);
 	}
-- 
1.7.4.1

* [PATCH 4/5] Drivers: hv: balloon: replace ha_region_mutex with spinlock
From: kys @ 2016-08-24 23:23 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, olaf, apw, vkuznets, jasowang,
	leann.ogasawara, alexng
  Cc: K. Y. Srinivasan

From: Vitaly Kuznetsov <vkuznets@redhat.com>

lockdep reports possible circular locking dependency when udev is used
for memory onlining:

 systemd-udevd/3996 is trying to acquire lock:
  ((memory_chain).rwsem){++++.+}, at: [<ffffffff810d137e>] __blocking_notifier_call_chain+0x4e/0xc0

 but task is already holding lock:
  (&dm_device.ha_region_mutex){+.+.+.}, at: [<ffffffffa015382e>] hv_memory_notifier+0x5e/0xc0 [hv_balloon]
 ...

which is probably a false positive, because we take and release
ha_region_mutex from the memory notifier chain depending on the arg. No
real deadlocks have been reported so far (though I'm not really sure
about preemptible kernels...), but we don't really need to hold the
mutex for so long. We use it to protect ha_region_list (and its
members) and the num_pages_onlined counter. None of these operations
requires us to sleep and nothing here is slow, so switch to a spinlock
with interrupts disabled.

While at it, replace list_for_each with list_for_each_entry, as we
actually need the entries in all these cases, and drop the meaningless
list_empty() checks.
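
The locking pattern the series converts to, shown in isolation (a
sketch of the shape of the hunks below):

	struct hv_hotadd_state *has;
	unsigned long flags;

	spin_lock_irqsave(&dm_device.ha_lock, flags);
	list_for_each_entry(has, &dm_device.ha_region_list, list) {
		/*
		 * Examine or update the region.  Nothing may sleep
		 * here, which is also why the gap allocation in patch
		 * 2 uses GFP_ATOMIC.
		 */
	}
	spin_unlock_irqrestore(&dm_device.ha_lock, flags);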

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
---
 drivers/hv/hv_balloon.c |   98 +++++++++++++++++++++++++---------------------
 1 files changed, 53 insertions(+), 45 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 3441326..d55e0e7 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -547,7 +547,11 @@ struct hv_dynmem_device {
 	 */
 	struct task_struct *thread;
 
-	struct mutex ha_region_mutex;
+	/*
+	 * Protects ha_region_list, num_pages_onlined counter and individual
+	 * regions from ha_region_list.
+	 */
+	spinlock_t ha_lock;
 
 	/*
 	 * A list of hot-add regions.
@@ -571,18 +575,14 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
 			      void *v)
 {
 	struct memory_notify *mem = (struct memory_notify *)v;
+	unsigned long flags;
 
 	switch (val) {
-	case MEM_GOING_ONLINE:
-		mutex_lock(&dm_device.ha_region_mutex);
-		break;
-
 	case MEM_ONLINE:
+		spin_lock_irqsave(&dm_device.ha_lock, flags);
 		dm_device.num_pages_onlined += mem->nr_pages;
+		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 	case MEM_CANCEL_ONLINE:
-		if (val == MEM_ONLINE ||
-		    mutex_is_locked(&dm_device.ha_region_mutex))
-			mutex_unlock(&dm_device.ha_region_mutex);
 		if (dm_device.ha_waiting) {
 			dm_device.ha_waiting = false;
 			complete(&dm_device.ol_waitevent);
@@ -590,10 +590,11 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
 		break;
 
 	case MEM_OFFLINE:
-		mutex_lock(&dm_device.ha_region_mutex);
+		spin_lock_irqsave(&dm_device.ha_lock, flags);
 		dm_device.num_pages_onlined -= mem->nr_pages;
-		mutex_unlock(&dm_device.ha_region_mutex);
+		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 		break;
+	case MEM_GOING_ONLINE:
 	case MEM_GOING_OFFLINE:
 	case MEM_CANCEL_OFFLINE:
 		break;
@@ -657,9 +658,12 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 	unsigned long start_pfn;
 	unsigned long processed_pfn;
 	unsigned long total_pfn = pfn_count;
+	unsigned long flags;
 
 	for (i = 0; i < (size/HA_CHUNK); i++) {
 		start_pfn = start + (i * HA_CHUNK);
+
+		spin_lock_irqsave(&dm_device.ha_lock, flags);
 		has->ha_end_pfn +=  HA_CHUNK;
 
 		if (total_pfn > HA_CHUNK) {
@@ -671,11 +675,11 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 		}
 
 		has->covered_end_pfn +=  processed_pfn;
+		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 
 		init_completion(&dm_device.ol_waitevent);
 		dm_device.ha_waiting = !memhp_auto_online;
 
-		mutex_unlock(&dm_device.ha_region_mutex);
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
 				(HA_CHUNK << PAGE_SHIFT));
@@ -692,9 +696,10 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 				 */
 				do_hot_add = false;
 			}
+			spin_lock_irqsave(&dm_device.ha_lock, flags);
 			has->ha_end_pfn -= HA_CHUNK;
 			has->covered_end_pfn -=  processed_pfn;
-			mutex_lock(&dm_device.ha_region_mutex);
+			spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 			break;
 		}
 
@@ -708,7 +713,6 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 		if (dm_device.ha_waiting)
 			wait_for_completion_timeout(&dm_device.ol_waitevent,
 						    5*HZ);
-		mutex_lock(&dm_device.ha_region_mutex);
 		post_status(&dm_device);
 	}
 
@@ -717,13 +721,13 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 
 static void hv_online_page(struct page *pg)
 {
-	struct list_head *cur;
 	struct hv_hotadd_state *has;
 	unsigned long cur_start_pgp;
 	unsigned long cur_end_pgp;
+	unsigned long flags;
 
-	list_for_each(cur, &dm_device.ha_region_list) {
-		has = list_entry(cur, struct hv_hotadd_state, list);
+	spin_lock_irqsave(&dm_device.ha_lock, flags);
+	list_for_each_entry(has, &dm_device.ha_region_list, list) {
 		cur_start_pgp = (unsigned long)
 			pfn_to_page(has->start_pfn);
 		cur_end_pgp = (unsigned long)pfn_to_page(has->end_pfn);
@@ -736,21 +740,19 @@ static void hv_online_page(struct page *pg)
 		hv_page_online_one(has, pg);
 		break;
 	}
+	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 }
 
 static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
 {
-	struct list_head *cur;
 	struct hv_hotadd_state *has;
 	struct hv_hotadd_gap *gap;
 	unsigned long residual, new_inc;
+	int ret = 0;
+	unsigned long flags;
 
-	if (list_empty(&dm_device.ha_region_list))
-		return false;
-
-	list_for_each(cur, &dm_device.ha_region_list) {
-		has = list_entry(cur, struct hv_hotadd_state, list);
-
+	spin_lock_irqsave(&dm_device.ha_lock, flags);
+	list_for_each_entry(has, &dm_device.ha_region_list, list) {
 		/*
 		 * If the pfn range we are dealing with is not in the current
 		 * "hot add block", move on.
@@ -764,8 +766,10 @@ static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
 		 */
 		if (has->covered_end_pfn != start_pfn) {
 			gap = kzalloc(sizeof(struct hv_hotadd_gap), GFP_ATOMIC);
-			if (!gap)
-				return -ENOMEM;
+			if (!gap) {
+				ret = -ENOMEM;
+				break;
+			}
 
 			INIT_LIST_HEAD(&gap->list);
 			gap->start_pfn = has->covered_end_pfn;
@@ -791,10 +795,12 @@ static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
 			has->end_pfn += new_inc;
 		}
 
-		return 1;
+		ret = 1;
+		break;
 	}
+	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 
-	return 0;
+	return ret;
 }
 
 static unsigned long handle_pg_range(unsigned long pg_start,
@@ -803,17 +809,13 @@ static unsigned long handle_pg_range(unsigned long pg_start,
 	unsigned long start_pfn = pg_start;
 	unsigned long pfn_cnt = pg_count;
 	unsigned long size;
-	struct list_head *cur;
 	struct hv_hotadd_state *has;
 	unsigned long pgs_ol = 0;
 	unsigned long old_covered_state;
+	unsigned long res = 0, flags;
 
-	if (list_empty(&dm_device.ha_region_list))
-		return 0;
-
-	list_for_each(cur, &dm_device.ha_region_list) {
-		has = list_entry(cur, struct hv_hotadd_state, list);
-
+	spin_lock_irqsave(&dm_device.ha_lock, flags);
+	list_for_each_entry(has, &dm_device.ha_region_list, list) {
 		/*
 		 * If the pfn range we are dealing with is not in the current
 		 * "hot add block", move on.
@@ -863,17 +865,20 @@ static unsigned long handle_pg_range(unsigned long pg_start,
 			} else {
 				pfn_cnt = size;
 			}
+			spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 			hv_mem_hot_add(has->ha_end_pfn, size, pfn_cnt, has);
+			spin_lock_irqsave(&dm_device.ha_lock, flags);
 		}
 		/*
 		 * If we managed to online any pages that were given to us,
 		 * we declare success.
 		 */
-		return has->covered_end_pfn - old_covered_state;
-
+		res = has->covered_end_pfn - old_covered_state;
+		break;
 	}
+	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 
-	return 0;
+	return res;
 }
 
 static unsigned long process_hot_add(unsigned long pg_start,
@@ -883,6 +888,7 @@ static unsigned long process_hot_add(unsigned long pg_start,
 {
 	struct hv_hotadd_state *ha_region = NULL;
 	int covered;
+	unsigned long flags;
 
 	if (pfn_cnt == 0)
 		return 0;
@@ -908,12 +914,15 @@ static unsigned long process_hot_add(unsigned long pg_start,
 		INIT_LIST_HEAD(&ha_region->list);
 		INIT_LIST_HEAD(&ha_region->gap_list);
 
-		list_add_tail(&ha_region->list, &dm_device.ha_region_list);
 		ha_region->start_pfn = rg_start;
 		ha_region->ha_end_pfn = rg_start;
 		ha_region->covered_start_pfn = pg_start;
 		ha_region->covered_end_pfn = pg_start;
 		ha_region->end_pfn = rg_start + rg_size;
+
+		spin_lock_irqsave(&dm_device.ha_lock, flags);
+		list_add_tail(&ha_region->list, &dm_device.ha_region_list);
+		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 	}
 
 do_pg_range:
@@ -940,7 +949,6 @@ static void hot_add_req(struct work_struct *dummy)
 	resp.hdr.size = sizeof(struct dm_hot_add_response);
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-	mutex_lock(&dm_device.ha_region_mutex);
 	pg_start = dm->ha_wrk.ha_page_range.finfo.start_page;
 	pfn_cnt = dm->ha_wrk.ha_page_range.finfo.page_cnt;
 
@@ -974,7 +982,6 @@ static void hot_add_req(struct work_struct *dummy)
 						rg_start, rg_sz);
 
 	dm->num_pages_added += resp.page_count;
-	mutex_unlock(&dm_device.ha_region_mutex);
 #endif
 	/*
 	 * The result field of the response structure has the
@@ -1519,7 +1526,7 @@ static int balloon_probe(struct hv_device *dev,
 	init_completion(&dm_device.host_event);
 	init_completion(&dm_device.config_event);
 	INIT_LIST_HEAD(&dm_device.ha_region_list);
-	mutex_init(&dm_device.ha_region_mutex);
+	spin_lock_init(&dm_device.ha_lock);
 	INIT_WORK(&dm_device.balloon_wrk.wrk, balloon_up);
 	INIT_WORK(&dm_device.ha_wrk.wrk, hot_add_req);
 	dm_device.host_specified_ha_region = false;
@@ -1638,9 +1645,9 @@ probe_error0:
 static int balloon_remove(struct hv_device *dev)
 {
 	struct hv_dynmem_device *dm = hv_get_drvdata(dev);
-	struct list_head *cur, *tmp;
-	struct hv_hotadd_state *has;
+	struct hv_hotadd_state *has, *tmp;
 	struct hv_hotadd_gap *gap, *tmp_gap;
+	unsigned long flags;
 
 	if (dm->num_pages_ballooned != 0)
 		pr_warn("Ballooned pages: %d\n", dm->num_pages_ballooned);
@@ -1655,8 +1662,8 @@ static int balloon_remove(struct hv_device *dev)
 	restore_online_page_callback(&hv_online_page);
 	unregister_memory_notifier(&hv_memory_nb);
 #endif
-	list_for_each_safe(cur, tmp, &dm->ha_region_list) {
-		has = list_entry(cur, struct hv_hotadd_state, list);
+	spin_lock_irqsave(&dm_device.ha_lock, flags);
+	list_for_each_entry_safe(has, tmp, &dm->ha_region_list, list) {
 		list_for_each_entry_safe(gap, tmp_gap, &has->gap_list, list) {
 			list_del(&gap->list);
 			kfree(gap);
@@ -1664,6 +1671,7 @@ static int balloon_remove(struct hv_device *dev)
 		list_del(&has->list);
 		kfree(has);
 	}
+	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 
 	return 0;
 }
-- 
1.7.4.1

* [PATCH 5/5] Drivers: hv: balloon: Use available memory value in pressure report
From: kys @ 2016-08-24 23:23 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, olaf, apw, vkuznets, jasowang,
	leann.ogasawara, alexng
  Cc: Alex Ng, K. Y. Srinivasan

From: Alex Ng <alexng@messages.microsoft.com>

Reports of available memory should use the si_mem_available() value.
The freeram value used previously does not include available page cache
memory.
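
A minimal sketch of the difference (si_mem_available() implements the
same estimate as MemAvailable in /proc/meminfo):

	struct sysinfo val;
	long avail;

	si_meminfo(&val);
	/* val.freeram counts strictly free pages; page cache that
	 * could be reclaimed is not included. */

	avail = si_mem_available();
	/* free pages plus an estimate of the reclaimable page cache
	 * and slab, i.e. what can be used without resorting to swap */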

Signed-off-by: Alex Ng <alexng@messages.microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
---
 drivers/hv/hv_balloon.c |   13 +++++--------
 1 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index d55e0e7..fdf8da9 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -1075,7 +1075,6 @@ static unsigned long compute_balloon_floor(void)
 static void post_status(struct hv_dynmem_device *dm)
 {
 	struct dm_status status;
-	struct sysinfo val;
 	unsigned long now = jiffies;
 	unsigned long last_post = last_post_time;
 
@@ -1087,7 +1086,6 @@ static void post_status(struct hv_dynmem_device *dm)
 	if (!time_after(now, (last_post_time + HZ)))
 		return;
 
-	si_meminfo(&val);
 	memset(&status, 0, sizeof(struct dm_status));
 	status.hdr.type = DM_STATUS_REPORT;
 	status.hdr.size = sizeof(struct dm_status);
@@ -1103,7 +1101,7 @@ static void post_status(struct hv_dynmem_device *dm)
 	 * num_pages_onlined) as committed to the host, otherwise it can try
 	 * asking us to balloon them out.
 	 */
-	status.num_avail = val.freeram;
+	status.num_avail = si_mem_available();
 	status.num_committed = vm_memory_committed() +
 		dm->num_pages_ballooned +
 		(dm->num_pages_added > dm->num_pages_onlined ?
@@ -1209,7 +1207,7 @@ static void balloon_up(struct work_struct *dummy)
 	int ret;
 	bool done = false;
 	int i;
-	struct sysinfo val;
+	long avail_pages;
 	unsigned long floor;
 
 	/* The host balloons pages in 2M granularity. */
@@ -1221,12 +1219,12 @@ static void balloon_up(struct work_struct *dummy)
 	 */
 	alloc_unit = 512;
 
-	si_meminfo(&val);
+	avail_pages = si_mem_available();
 	floor = compute_balloon_floor();
 
 	/* Refuse to balloon below the floor, keep the 2M granularity. */
-	if (val.freeram < num_pages || val.freeram - num_pages < floor) {
-		num_pages = val.freeram > floor ? (val.freeram - floor) : 0;
+	if (avail_pages < num_pages || avail_pages - num_pages < floor) {
+		num_pages = avail_pages > floor ? (avail_pages - floor) : 0;
 		num_pages -= num_pages % PAGES_IN_2M;
 	}
 
@@ -1237,7 +1235,6 @@ static void balloon_up(struct work_struct *dummy)
 		bl_resp->hdr.size = sizeof(struct dm_balloon_response);
 		bl_resp->more_pages = 1;
 
-
 		num_pages -= num_ballooned;
 		num_ballooned = alloc_balloon_pages(&dm_device, num_pages,
 						    bl_resp, alloc_unit);
-- 
1.7.4.1

* Re: [PATCH 2/5] Drivers: hv: balloon: account for gaps in hot add regions
From: Yauheni Kaliuta @ 2016-08-25  4:53 UTC (permalink / raw)
  To: kys
  Cc: gregkh, linux-kernel, devel, olaf, apw, vkuznets, jasowang,
	leann.ogasawara, alexng

Hi, kys!

>>>>> On Wed, 24 Aug 2016 16:23:10 -0700, kys   wrote:

[...]

 > -static bool pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
 > +static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
 >  {
 >  	struct list_head *cur;
 >  	struct hv_hotadd_state *has;
 > +	struct hv_hotadd_gap *gap;
 >  	unsigned long residual, new_inc;
 
 >  	if (list_empty(&dm_device.ha_region_list))

One "return false;" left here.


[...]

 > -		if (has->covered_end_pfn != start_pfn)
 > -			has->covered_end_pfn = start_pfn;
 > -
 > -		return true;
 > -
 > +		return 1;
 >  	}
 
 > -	return false;
 > +	return 0;
 >  }

[...]

-- 
WBR,
Yauheni Kaliuta

* Re: [PATCH 2/5] Drivers: hv: balloon: account for gaps in hot add regions
From: Vitaly Kuznetsov @ 2016-08-25 12:13 UTC (permalink / raw)
  To: Yauheni Kaliuta
  Cc: kys, gregkh, linux-kernel, devel, olaf, apw, jasowang,
	leann.ogasawara, alexng

Yauheni Kaliuta <yauheni.kaliuta@redhat.com> writes:

> Hi, kys!
>
>>>>>> On Wed, 24 Aug 2016 16:23:10 -0700, kys   wrote:
>
> [...]
>
>  > -static bool pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
>  > +static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
>  >  {
>  >  	struct list_head *cur;
>  >  	struct hv_hotadd_state *has;
>  > +	struct hv_hotadd_gap *gap;
>  >  	unsigned long residual, new_inc;
>
>  >  	if (list_empty(&dm_device.ha_region_list))
>
> One "return false;" left here.
>

This is removed in patch 4/5 of this series, and false is 0 anyway.

[...]

-- 
  Vitaly

* RE: [PATCH 2/5] Drivers: hv: balloon: account for gaps in hot add regions
From: KY Srinivasan @ 2016-08-25 17:00 UTC (permalink / raw)
  To: Yauheni Kaliuta, kys
  Cc: olaf, gregkh, jasowang, Alex Ng (LIS),
	linux-kernel, apw, devel, leann.ogasawara



> -----Original Message-----
> From: devel [mailto:driverdev-devel-bounces@linuxdriverproject.org] On
> Behalf Of Yauheni Kaliuta
> Sent: Wednesday, August 24, 2016 9:54 PM
> To: kys@exchange.microsoft.com
> Cc: olaf@aepfle.de; gregkh@linuxfoundation.org; jasowang@redhat.com; Alex
> Ng (LIS) <alexng@microsoft.com>; linux-kernel@vger.kernel.org;
> apw@canonical.com; devel@linuxdriverproject.org;
> leann.ogasawara@canonical.com
> Subject: Re: [PATCH 2/5] Drivers: hv: balloon: account for gaps in hot add
> regions
> 
> Hi, kys!
> 
> >>>>> On Wed, 24 Aug 2016 16:23:10 -0700, kys   wrote:
> 
> [...]
> 
>  > -static bool pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
>  > +static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
>  >  {
>  >  	struct list_head *cur;
>  >  	struct hv_hotadd_state *has;
>  > +	struct hv_hotadd_gap *gap;
>  >  	unsigned long residual, new_inc;
> 
>  >  	if (list_empty(&dm_device.ha_region_list))
> 
> One "return false;" left here.

It is fixed up later in the series and the series is bisectable.

K. Y
