linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/12] thp: optimize use of khugepaged_mutex and dependence of CONFIG_NUMA
@ 2012-08-13 11:12 Xiao Guangrong
  2012-08-13 11:13 ` [PATCH 01/12] thp: fix the count of THP_COLLAPSE_ALLOC Xiao Guangrong
                   ` (11 more replies)
  0 siblings, 12 replies; 20+ messages in thread
From: Xiao Guangrong @ 2012-08-13 11:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andrea Arcangeli, Hugh Dickins, David Rientjes, LKML,
	Linux Memory Management List

khugepaged_mutex is used in a complex way, and too many small pieces of
code depend on CONFIG_NUMA; together they make the code hard to understand.

This patchset optimizes the use of khugepaged_mutex and reduces the
dependence on CONFIG_NUMA; after it is applied, the code is more readable.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 01/12] thp: fix the count of THP_COLLAPSE_ALLOC
  2012-08-13 11:12 [PATCH 00/12] thp: optimize use of khugepaged_mutex and dependence of CONFIG_NUMA Xiao Guangrong
@ 2012-08-13 11:13 ` Xiao Guangrong
  2012-08-13 11:19   ` Kirill A. Shutemov
  2012-08-13 11:13 ` [PATCH 02/12] thp: remove unnecessary check in start_khugepaged Xiao Guangrong
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: Xiao Guangrong @ 2012-08-13 11:13 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Andrew Morton, Andrea Arcangeli, Hugh Dickins, David Rientjes,
	LKML, Linux Memory Management List

THP_COLLAPSE_ALLOC is double-counted if NUMA is disabled, since the event
has already been counted in khugepaged_alloc_hugepage.
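
The shape of the bug can be sketched in user-space C (the counter, the
allocators, and the `numa_enabled` flag below are illustrative stand-ins,
not the kernel's real symbols): the !NUMA allocation helper already bumps
the event counter, so a second unconditional bump at the call site counts
the same allocation twice.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the vm-event counter and the two
 * allocation paths; names are illustrative, not the real kernel API. */
static int thp_collapse_alloc;
static bool numa_enabled;           /* stands in for CONFIG_NUMA */

/* !NUMA helper: already counts the event on success. */
static void *alloc_hugepage_sketch(void)
{
	thp_collapse_alloc++;       /* counted here ... */
	return (void *)1;           /* pretend the allocation succeeded */
}

static void *collapse_alloc_buggy(void)
{
	void *page;

	if (numa_enabled)
		page = (void *)1;   /* NUMA path allocates directly */
	else
		page = alloc_hugepage_sketch();

	thp_collapse_alloc++;       /* ... and again here: double count
	                             * on the !NUMA path */
	return page;
}

static void *collapse_alloc_fixed(void)
{
	void *page;

	if (numa_enabled) {
		page = (void *)1;
		thp_collapse_alloc++;   /* fix: count only on the path
		                         * that did not already count */
	} else {
		page = alloc_hugepage_sketch();
	}
	return page;
}

/* small wrappers so the effect is easy to observe */
static int run_buggy(void)
{
	thp_collapse_alloc = 0;
	(void)collapse_alloc_buggy();
	return thp_collapse_alloc;
}

static int run_fixed(void)
{
	thp_collapse_alloc = 0;
	(void)collapse_alloc_fixed();
	return thp_collapse_alloc;
}
```

This mirrors what the one-line diff below does: it moves the call-site
increment under the NUMA-only branch.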

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 mm/huge_memory.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 57c4b93..80bcd42 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1880,9 +1880,9 @@ static void collapse_huge_page(struct mm_struct *mm,
 		*hpage = ERR_PTR(-ENOMEM);
 		return;
 	}
+	count_vm_event(THP_COLLAPSE_ALLOC);
 #endif

-	count_vm_event(THP_COLLAPSE_ALLOC);
 	if (unlikely(mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL))) {
 #ifdef CONFIG_NUMA
 		put_page(new_page);
-- 
1.7.7.6



* [PATCH 02/12] thp: remove unnecessary check in start_khugepaged
  2012-08-13 11:12 [PATCH 00/12] thp: optimize use of khugepaged_mutex and dependence of CONFIG_NUMA Xiao Guangrong
  2012-08-13 11:13 ` [PATCH 01/12] thp: fix the count of THP_COLLAPSE_ALLOC Xiao Guangrong
@ 2012-08-13 11:13 ` Xiao Guangrong
  2012-08-13 11:14 ` [PATCH 03/12] thp: move khugepaged_mutex out of khugepaged Xiao Guangrong
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Xiao Guangrong @ 2012-08-13 11:13 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Andrew Morton, Andrea Arcangeli, Hugh Dickins, David Rientjes,
	LKML, Linux Memory Management List

The check is unnecessary: if initialization of mm_slot_cache or
mm_slots_hash fails, no sysfs interface is created at all.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 mm/huge_memory.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 80bcd42..399e8c9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -140,10 +140,7 @@ static int start_khugepaged(void)
 	int err = 0;
 	if (khugepaged_enabled()) {
 		int wakeup;
-		if (unlikely(!mm_slot_cache || !mm_slots_hash)) {
-			err = -ENOMEM;
-			goto out;
-		}
+
 		mutex_lock(&khugepaged_mutex);
 		if (!khugepaged_thread)
 			khugepaged_thread = kthread_run(khugepaged, NULL,
@@ -163,7 +160,7 @@ static int start_khugepaged(void)
 	} else
 		/* wakeup to exit */
 		wake_up_interruptible(&khugepaged_wait);
-out:
+
 	return err;
 }

-- 
1.7.7.6



* [PATCH 03/12] thp: move khugepaged_mutex out of khugepaged
  2012-08-13 11:12 [PATCH 00/12] thp: optimize use of khugepaged_mutex and dependence of CONFIG_NUMA Xiao Guangrong
  2012-08-13 11:13 ` [PATCH 01/12] thp: fix the count of THP_COLLAPSE_ALLOC Xiao Guangrong
  2012-08-13 11:13 ` [PATCH 02/12] thp: remove unnecessary check in start_khugepaged Xiao Guangrong
@ 2012-08-13 11:14 ` Xiao Guangrong
  2012-08-13 11:14 ` [PATCH 04/12] thp: remove unnecessary khugepaged_thread check Xiao Guangrong
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Xiao Guangrong @ 2012-08-13 11:14 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Andrew Morton, Andrea Arcangeli, Hugh Dickins, David Rientjes,
	LKML, Linux Memory Management List

Currently, khugepaged_mutex is used in a complex way that is hard to
understand. In fact, it only serializes start_khugepaged and khugepaged,
for these reasons:
- khugepaged_thread is shared between them
- the thp disable path (echo never > transparent_hugepage/enabled) is
  nonblocking, so khugepaged_thread must be protected to get a stable
  running state

These can be avoided by:
- using the lock only to serialize thread creation and cancellation
- making the thp disable path block until the thread has exited

Then khugepaged_thread is fully controlled by start_khugepaged, and
khugepaged itself can run happily without the lock.
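
The new scheme can be sketched in user-space C with pthreads (a rough
analogue only: `pthread_join` stands in for kthread_stop, the `should_stop`
flag for kthread_should_stop, and all names are illustrative). Only the
enable/disable path takes the lock; the worker never does, and "disable"
blocks until the worker has actually exited:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t start_stop_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t worker;
static bool worker_running;
static volatile bool should_stop;   /* analogue of kthread_should_stop() */

static void *worker_fn(void *arg)
{
	(void)arg;
	while (!should_stop)
		;                   /* the scan loop would run here */
	return NULL;
}

/* analogue of enabled_store() -> start_khugepaged() */
static int set_enabled(bool enable)
{
	int err = 0;

	pthread_mutex_lock(&start_stop_lock);
	if (enable && !worker_running) {
		should_stop = false;
		err = pthread_create(&worker, NULL, worker_fn, NULL);
		worker_running = (err == 0);
	} else if (!enable && worker_running) {
		should_stop = true;         /* request exit ... */
		pthread_join(worker, NULL); /* ... and block until it
		                             * has happened, like
		                             * kthread_stop() */
		worker_running = false;
	}
	pthread_mutex_unlock(&start_stop_lock);
	return err;
}
```

Because disable blocks in the join, the worker's running state is stable
without the worker ever touching the mutex itself.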

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 mm/huge_memory.c |   36 +++++++++++++-----------------------
 1 files changed, 13 insertions(+), 23 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 399e8c9..3715c52 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -139,9 +139,6 @@ static int start_khugepaged(void)
 {
 	int err = 0;
 	if (khugepaged_enabled()) {
-		int wakeup;
-
-		mutex_lock(&khugepaged_mutex);
 		if (!khugepaged_thread)
 			khugepaged_thread = kthread_run(khugepaged, NULL,
 							"khugepaged");
@@ -151,15 +148,17 @@ static int start_khugepaged(void)
 			err = PTR_ERR(khugepaged_thread);
 			khugepaged_thread = NULL;
 		}
-		wakeup = !list_empty(&khugepaged_scan.mm_head);
-		mutex_unlock(&khugepaged_mutex);
-		if (wakeup)
+
+		if (!list_empty(&khugepaged_scan.mm_head))
 			wake_up_interruptible(&khugepaged_wait);

 		set_recommended_min_free_kbytes();
-	} else
+	} else if (khugepaged_thread) {
 		/* wakeup to exit */
 		wake_up_interruptible(&khugepaged_wait);
+		kthread_stop(khugepaged_thread);
+		khugepaged_thread = NULL;
+	}

 	return err;
 }
@@ -221,7 +220,12 @@ static ssize_t enabled_store(struct kobject *kobj,
 				TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG);

 	if (ret > 0) {
-		int err = start_khugepaged();
+		int err;
+
+		mutex_lock(&khugepaged_mutex);
+		err = start_khugepaged();
+		mutex_unlock(&khugepaged_mutex);
+
 		if (err)
 			ret = err;
 	}
@@ -2344,20 +2348,10 @@ static int khugepaged(void *none)
 	set_freezable();
 	set_user_nice(current, 19);

-	/* serialize with start_khugepaged() */
-	mutex_lock(&khugepaged_mutex);
-
-	for (;;) {
-		mutex_unlock(&khugepaged_mutex);
+	while (!kthread_should_stop()) {
 		VM_BUG_ON(khugepaged_thread != current);
 		khugepaged_loop();
 		VM_BUG_ON(khugepaged_thread != current);
-
-		mutex_lock(&khugepaged_mutex);
-		if (!khugepaged_enabled())
-			break;
-		if (unlikely(kthread_should_stop()))
-			break;
 	}

 	spin_lock(&khugepaged_mm_lock);
@@ -2366,10 +2360,6 @@ static int khugepaged(void *none)
 	if (mm_slot)
 		collect_mm_slot(mm_slot);
 	spin_unlock(&khugepaged_mm_lock);
-
-	khugepaged_thread = NULL;
-	mutex_unlock(&khugepaged_mutex);
-
 	return 0;
 }

-- 
1.7.7.6



* [PATCH 04/12] thp: remove unnecessary khugepaged_thread check
  2012-08-13 11:12 [PATCH 00/12] thp: optimize use of khugepaged_mutex and dependence of CONFIG_NUMA Xiao Guangrong
                   ` (2 preceding siblings ...)
  2012-08-13 11:14 ` [PATCH 03/12] thp: move khugepaged_mutex out of khugepaged Xiao Guangrong
@ 2012-08-13 11:14 ` Xiao Guangrong
  2012-08-13 11:14 ` [PATCH 05/12] thp: remove wake_up_interruptible in the exit path Xiao Guangrong
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Xiao Guangrong @ 2012-08-13 11:14 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Andrew Morton, Andrea Arcangeli, Hugh Dickins, David Rientjes,
	LKML, Linux Memory Management List

Now that khugepaged creation and cancellation are fully serialized by
khugepaged_mutex, it is impossible for more than one khugepaged instance
to be running.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 mm/huge_memory.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3715c52..b218700 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2348,11 +2348,8 @@ static int khugepaged(void *none)
 	set_freezable();
 	set_user_nice(current, 19);

-	while (!kthread_should_stop()) {
-		VM_BUG_ON(khugepaged_thread != current);
+	while (!kthread_should_stop())
 		khugepaged_loop();
-		VM_BUG_ON(khugepaged_thread != current);
-	}

 	spin_lock(&khugepaged_mm_lock);
 	mm_slot = khugepaged_scan.mm_slot;
-- 
1.7.7.6



* [PATCH 05/12] thp: remove wake_up_interruptible in the exit path
  2012-08-13 11:12 [PATCH 00/12] thp: optimize use of khugepaged_mutex and dependence of CONFIG_NUMA Xiao Guangrong
                   ` (3 preceding siblings ...)
  2012-08-13 11:14 ` [PATCH 04/12] thp: remove unnecessary khugepaged_thread check Xiao Guangrong
@ 2012-08-13 11:14 ` Xiao Guangrong
  2012-08-13 11:15 ` [PATCH 06/12] thp: remove some code depend on CONFIG_NUMA Xiao Guangrong
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Xiao Guangrong @ 2012-08-13 11:14 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Andrew Morton, Andrea Arcangeli, Hugh Dickins, David Rientjes,
	LKML, Linux Memory Management List

Add kthread_should_stop() to the conditions used to wake up on
khugepaged_wait; then kthread_stop alone is enough to make the thread
exit.
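
The logic of the condition change can be shown in a small user-space
sketch (the flags are illustrative stand-ins for the kernel state: one for
!list_empty(&khugepaged_scan.mm_head), one for kthread_should_stop()).
With the old condition a stop request does not satisfy the wait, which is
why the exit path needed an explicit wake_up_interruptible; with the new
one it does:

```c
#include <stdbool.h>

static bool scan_list_empty = true; /* no work queued */
static bool stop_requested;         /* stands in for kthread_should_stop() */

/* old wait condition: work available, or thp disabled */
static bool wait_event_old(bool enabled)
{
	return !scan_list_empty || !enabled;
}

/* new wait condition: work available, or a stop was requested */
static bool wait_event_new(void)
{
	return !scan_list_empty || stop_requested;
}

/* helpers: does a bare stop request (thp still enabled) wake the thread? */
static bool old_wakes_after_stop_request(void)
{
	scan_list_empty = true;
	stop_requested = true;
	return wait_event_old(true);
}

static bool new_wakes_after_stop_request(void)
{
	scan_list_empty = true;
	stop_requested = true;
	return wait_event_new();
}
```

In the kernel, kthread_stop() itself wakes the kthread, so once the
condition includes kthread_should_stop() the separate wakeup in the exit
path becomes redundant.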

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 mm/huge_memory.c |   35 +++++++++++++++++++++--------------
 1 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b218700..86f71af 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -154,8 +154,6 @@ static int start_khugepaged(void)

 		set_recommended_min_free_kbytes();
 	} else if (khugepaged_thread) {
-		/* wakeup to exit */
-		wake_up_interruptible(&khugepaged_wait);
 		kthread_stop(khugepaged_thread);
 		khugepaged_thread = NULL;
 	}
@@ -2236,7 +2234,7 @@ static int khugepaged_has_work(void)
 static int khugepaged_wait_event(void)
 {
 	return !list_empty(&khugepaged_scan.mm_head) ||
-		!khugepaged_enabled();
+		kthread_should_stop();
 }

 static void khugepaged_do_scan(struct page **hpage)
@@ -2303,6 +2301,24 @@ static struct page *khugepaged_alloc_hugepage(void)
 }
 #endif

+static void khugepaged_wait_work(void)
+{
+	try_to_freeze();
+
+	if (khugepaged_has_work()) {
+		if (!khugepaged_scan_sleep_millisecs)
+			return;
+
+		wait_event_freezable_timeout(khugepaged_wait,
+					     kthread_should_stop(),
+			msecs_to_jiffies(khugepaged_scan_sleep_millisecs));
+		return;
+	}
+
+	if (khugepaged_enabled())
+		wait_event_freezable(khugepaged_wait, khugepaged_wait_event());
+}
+
 static void khugepaged_loop(void)
 {
 	struct page *hpage;
@@ -2327,17 +2343,8 @@ static void khugepaged_loop(void)
 		if (hpage)
 			put_page(hpage);
 #endif
-		try_to_freeze();
-		if (unlikely(kthread_should_stop()))
-			break;
-		if (khugepaged_has_work()) {
-			if (!khugepaged_scan_sleep_millisecs)
-				continue;
-			wait_event_freezable_timeout(khugepaged_wait, false,
-			    msecs_to_jiffies(khugepaged_scan_sleep_millisecs));
-		} else if (khugepaged_enabled())
-			wait_event_freezable(khugepaged_wait,
-					     khugepaged_wait_event());
+
+		khugepaged_wait_work();
 	}
 }

-- 
1.7.7.6



* [PATCH 06/12] thp: remove some code depend on CONFIG_NUMA
  2012-08-13 11:12 [PATCH 00/12] thp: optimize use of khugepaged_mutex and dependence of CONFIG_NUMA Xiao Guangrong
                   ` (4 preceding siblings ...)
  2012-08-13 11:14 ` [PATCH 05/12] thp: remove wake_up_interruptible in the exit path Xiao Guangrong
@ 2012-08-13 11:15 ` Xiao Guangrong
  2012-08-13 11:15 ` [PATCH 07/12] thp: merge page pre-alloc in khugepaged_loop into khugepaged_do_scan Xiao Guangrong
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Xiao Guangrong @ 2012-08-13 11:15 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Andrew Morton, Andrea Arcangeli, Hugh Dickins, David Rientjes,
	LKML, Linux Memory Management List

If NUMA is disabled, hpage is used as a pre-allocated page, so there are
two cases for hpage:
- it is !NULL, meaning the page has not been consumed
- otherwise, the page has been consumed

If NUMA is enabled, hpage is only used as an allocation-failure indicator,
not a real page; NULL means no failure has been triggered.

So we can release the page exactly when !IS_ERR_OR_NULL(hpage).
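
A minimal user-space rendition of the ERR_PTR()/IS_ERR_OR_NULL()
convention (the real versions live in the kernel's include/linux/err.h;
this re-implementation is only for illustration) shows why a single
predicate covers both configurations: a real page pointer must be
released, while NULL and an ERR_PTR failure cookie must not.

```c
#include <stdbool.h>
#include <stdint.h>

/* The kernel encodes small negative errno values in the top page of the
 * address space, where no valid pointer can live. */
#define MAX_ERRNO 4095

static void *ERR_PTR(long err)
{
	return (void *)err;         /* e.g. ERR_PTR(-ENOMEM) */
}

static bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int a_real_object;           /* any ordinary object address */
```

With this, `if (!IS_ERR_OR_NULL(hpage)) put_page(hpage);` is correct
whether hpage is a pre-allocated page (!NUMA) or a failure indicator
(NUMA), which is what lets the #ifdef around the put_page go away.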

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 mm/huge_memory.c |   10 +++-------
 1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 86f71af..5f620cf 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2321,11 +2321,8 @@ static void khugepaged_wait_work(void)

 static void khugepaged_loop(void)
 {
-	struct page *hpage;
+	struct page *hpage = NULL;

-#ifdef CONFIG_NUMA
-	hpage = NULL;
-#endif
 	while (likely(khugepaged_enabled())) {
 #ifndef CONFIG_NUMA
 		hpage = khugepaged_alloc_hugepage();
@@ -2339,10 +2336,9 @@ static void khugepaged_loop(void)
 #endif

 		khugepaged_do_scan(&hpage);
-#ifndef CONFIG_NUMA
-		if (hpage)
+
+		if (!IS_ERR_OR_NULL(hpage))
 			put_page(hpage);
-#endif

 		khugepaged_wait_work();
 	}
-- 
1.7.7.6



* [PATCH 07/12] thp: merge page pre-alloc in khugepaged_loop into khugepaged_do_scan
  2012-08-13 11:12 [PATCH 00/12] thp: optimize use of khugepaged_mutex and dependence of CONFIG_NUMA Xiao Guangrong
                   ` (5 preceding siblings ...)
  2012-08-13 11:15 ` [PATCH 06/12] thp: remove some code depend on CONFIG_NUMA Xiao Guangrong
@ 2012-08-13 11:15 ` Xiao Guangrong
  2012-08-13 11:16 ` [PATCH 08/12] thp: release page in page pre-alloc path Xiao Guangrong
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Xiao Guangrong @ 2012-08-13 11:15 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Andrew Morton, Andrea Arcangeli, Hugh Dickins, David Rientjes,
	LKML, Linux Memory Management List

There are two pre-alloc operations in these two functions; the difference
is:
- in khugepaged_loop, the allocation is allowed to sleep on failure
- in khugepaged_do_scan, it exits immediately if the allocation fails

Actually, khugepaged_do_scan can be allowed to sleep on the first
allocation failure; the operation in khugepaged_loop can then be removed.
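
The merged policy, sleep once on the first failure and give up on the
second, can be sketched in plain C (a simplified model: `try_alloc` and
`do_sleep` are hypothetical stand-ins injected for testability, and the
khugepaged_enabled() recheck the kernel loop also performs is omitted):

```c
#include <stdbool.h>
#include <stddef.h>

/* Retry an allocation at most once, sleeping between the attempts;
 * *wait records whether the one allowed sleep has been used. */
static void *prealloc(void *(*try_alloc)(void), void (*do_sleep)(void),
		      bool *wait)
{
	void *page;

	do {
		page = try_alloc();
		if (!page) {
			if (!*wait)
				return NULL;    /* second failure: give up */
			*wait = false;          /* first failure: sleep once */
			do_sleep();
		}
	} while (!page);

	return page;
}

/* tiny test doubles */
static int alloc_calls, sleep_calls, fail_first_n;

static void *fake_alloc(void)
{
	return ++alloc_calls <= fail_first_n ? NULL : (void *)1;
}

static void fake_sleep(void)
{
	sleep_calls++;
}

static void *prealloc_once(int fails)
{
	bool wait = true;

	fail_first_n = fails;
	alloc_calls = sleep_calls = 0;
	return prealloc(fake_alloc, fake_sleep, &wait);
}
```

One failure is absorbed by a single sleep-and-retry; two consecutive
failures abort the scan pass, matching the behaviour the patch gives
khugepaged_do_scan.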

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 mm/huge_memory.c |   97 +++++++++++++++++++++++++-----------------------------
 1 files changed, 45 insertions(+), 52 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5f620cf..42d3d74 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2237,10 +2237,40 @@ static int khugepaged_wait_event(void)
 		kthread_should_stop();
 }

-static void khugepaged_do_scan(struct page **hpage)
+static void khugepaged_alloc_sleep(void)
+{
+	wait_event_freezable_timeout(khugepaged_wait, false,
+			msecs_to_jiffies(khugepaged_alloc_sleep_millisecs));
+}
+
+#ifndef CONFIG_NUMA
+static struct page *khugepaged_alloc_hugepage(bool *wait)
+{
+	struct page *hpage;
+
+	do {
+		hpage = alloc_hugepage(khugepaged_defrag());
+		if (!hpage) {
+			count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
+			if (!*wait)
+				return NULL;
+
+			*wait = false;
+			khugepaged_alloc_sleep();
+		} else
+			count_vm_event(THP_COLLAPSE_ALLOC);
+	} while (unlikely(!hpage) && likely(khugepaged_enabled()));
+
+	return hpage;
+}
+#endif
+
+static void khugepaged_do_scan(void)
 {
+	struct page *hpage = NULL;
 	unsigned int progress = 0, pass_through_head = 0;
 	unsigned int pages = khugepaged_pages_to_scan;
+	bool wait = true;

 	barrier(); /* write khugepaged_pages_to_scan to local stack */

@@ -2248,17 +2278,18 @@ static void khugepaged_do_scan(struct page **hpage)
 		cond_resched();

 #ifndef CONFIG_NUMA
-		if (!*hpage) {
-			*hpage = alloc_hugepage(khugepaged_defrag());
-			if (unlikely(!*hpage)) {
-				count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
+		if (!hpage)
+			hpage = khugepaged_alloc_hugepage(&wait);
+
+		if (unlikely(!hpage))
+			break;
+#else
+		if (IS_ERR(hpage)) {
+			if (!wait)
 				break;
-			}
-			count_vm_event(THP_COLLAPSE_ALLOC);
+			wait = false;
+			khugepaged_alloc_sleep();
 		}
-#else
-		if (IS_ERR(*hpage))
-			break;
 #endif

 		if (unlikely(kthread_should_stop() || freezing(current)))
@@ -2270,37 +2301,16 @@ static void khugepaged_do_scan(struct page **hpage)
 		if (khugepaged_has_work() &&
 		    pass_through_head < 2)
 			progress += khugepaged_scan_mm_slot(pages - progress,
-							    hpage);
+							    &hpage);
 		else
 			progress = pages;
 		spin_unlock(&khugepaged_mm_lock);
 	}
-}

-static void khugepaged_alloc_sleep(void)
-{
-	wait_event_freezable_timeout(khugepaged_wait, false,
-			msecs_to_jiffies(khugepaged_alloc_sleep_millisecs));
+	if (!IS_ERR_OR_NULL(hpage))
+		put_page(hpage);
 }

-#ifndef CONFIG_NUMA
-static struct page *khugepaged_alloc_hugepage(void)
-{
-	struct page *hpage;
-
-	do {
-		hpage = alloc_hugepage(khugepaged_defrag());
-		if (!hpage) {
-			count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
-			khugepaged_alloc_sleep();
-		} else
-			count_vm_event(THP_COLLAPSE_ALLOC);
-	} while (unlikely(!hpage) &&
-		 likely(khugepaged_enabled()));
-	return hpage;
-}
-#endif
-
 static void khugepaged_wait_work(void)
 {
 	try_to_freeze();
@@ -2321,25 +2331,8 @@ static void khugepaged_wait_work(void)

 static void khugepaged_loop(void)
 {
-	struct page *hpage = NULL;
-
 	while (likely(khugepaged_enabled())) {
-#ifndef CONFIG_NUMA
-		hpage = khugepaged_alloc_hugepage();
-		if (unlikely(!hpage))
-			break;
-#else
-		if (IS_ERR(hpage)) {
-			khugepaged_alloc_sleep();
-			hpage = NULL;
-		}
-#endif
-
-		khugepaged_do_scan(&hpage);
-
-		if (!IS_ERR_OR_NULL(hpage))
-			put_page(hpage);
-
+		khugepaged_do_scan();
 		khugepaged_wait_work();
 	}
 }
-- 
1.7.7.6



* [PATCH 08/12] thp: release page in page pre-alloc path
  2012-08-13 11:12 [PATCH 00/12] thp: optimize use of khugepaged_mutex and dependence of CONFIG_NUMA Xiao Guangrong
                   ` (6 preceding siblings ...)
  2012-08-13 11:15 ` [PATCH 07/12] thp: merge page pre-alloc in khugepaged_loop into khugepaged_do_scan Xiao Guangrong
@ 2012-08-13 11:16 ` Xiao Guangrong
  2012-08-13 11:16 ` [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page Xiao Guangrong
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Xiao Guangrong @ 2012-08-13 11:16 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Andrew Morton, Andrea Arcangeli, Hugh Dickins, David Rientjes,
	LKML, Linux Memory Management List

If NUMA is enabled, the page can be released in the page pre-alloc
operation; the CONFIG_NUMA-dependent code is then reduced.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 mm/huge_memory.c |   19 +++++++------------
 1 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 42d3d74..050b8d0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1879,15 +1879,12 @@ static void collapse_huge_page(struct mm_struct *mm,
 		*hpage = ERR_PTR(-ENOMEM);
 		return;
 	}
+	*hpage = new_page;
 	count_vm_event(THP_COLLAPSE_ALLOC);
 #endif

-	if (unlikely(mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL))) {
-#ifdef CONFIG_NUMA
-		put_page(new_page);
-#endif
+	if (unlikely(mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL)))
 		return;
-	}

 	/*
 	 * Prevent all access to pagetables with the exception of
@@ -1992,9 +1989,8 @@ static void collapse_huge_page(struct mm_struct *mm,
 	prepare_pmd_huge_pte(pgtable, mm);
 	spin_unlock(&mm->page_table_lock);

-#ifndef CONFIG_NUMA
 	*hpage = NULL;
-#endif
+
 	khugepaged_pages_collapsed++;
 out_up_write:
 	up_write(&mm->mmap_sem);
@@ -2002,9 +1998,6 @@ out_up_write:

 out:
 	mem_cgroup_uncharge_page(new_page);
-#ifdef CONFIG_NUMA
-	put_page(new_page);
-#endif
 	goto out_up_write;
 }

@@ -2275,8 +2268,6 @@ static void khugepaged_do_scan(void)
 	barrier(); /* write khugepaged_pages_to_scan to local stack */

 	while (progress < pages) {
-		cond_resched();
-
 #ifndef CONFIG_NUMA
 		if (!hpage)
 			hpage = khugepaged_alloc_hugepage(&wait);
@@ -2289,8 +2280,12 @@ static void khugepaged_do_scan(void)
 				break;
 			wait = false;
 			khugepaged_alloc_sleep();
+		} else if (hpage) {
+			put_page(hpage);
+			hpage = NULL;
 		}
 #endif
+		cond_resched();

 		if (unlikely(kthread_should_stop() || freezing(current)))
 			break;
-- 
1.7.7.6



* [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page
  2012-08-13 11:12 [PATCH 00/12] thp: optimize use of khugepaged_mutex and dependence of CONFIG_NUMA Xiao Guangrong
                   ` (7 preceding siblings ...)
  2012-08-13 11:16 ` [PATCH 08/12] thp: release page in page pre-alloc path Xiao Guangrong
@ 2012-08-13 11:16 ` Xiao Guangrong
  2012-09-12  2:03   ` Hugh Dickins
  2012-08-13 11:16 ` [PATCH 10/12] thp: remove khugepaged_loop Xiao Guangrong
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: Xiao Guangrong @ 2012-08-13 11:16 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Andrew Morton, Andrea Arcangeli, Hugh Dickins, David Rientjes,
	LKML, Linux Memory Management List

These helpers abstract the difference between the NUMA-enabled and
NUMA-disabled cases, making the code more readable.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 mm/huge_memory.c |  166 ++++++++++++++++++++++++++++++++----------------------
 1 files changed, 98 insertions(+), 68 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 050b8d0..82f6cce 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1833,28 +1833,34 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
 	}
 }

-static void collapse_huge_page(struct mm_struct *mm,
-			       unsigned long address,
-			       struct page **hpage,
-			       struct vm_area_struct *vma,
-			       int node)
+static void khugepaged_alloc_sleep(void)
 {
-	pgd_t *pgd;
-	pud_t *pud;
-	pmd_t *pmd, _pmd;
-	pte_t *pte;
-	pgtable_t pgtable;
-	struct page *new_page;
-	spinlock_t *ptl;
-	int isolated;
-	unsigned long hstart, hend;
+	wait_event_freezable_timeout(khugepaged_wait, false,
+			msecs_to_jiffies(khugepaged_alloc_sleep_millisecs));
+}

-	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
-#ifndef CONFIG_NUMA
-	up_read(&mm->mmap_sem);
-	VM_BUG_ON(!*hpage);
-	new_page = *hpage;
-#else
+#ifdef CONFIG_NUMA
+static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
+{
+	if (IS_ERR(*hpage)) {
+		if (!*wait)
+			return false;
+
+		*wait = false;
+		khugepaged_alloc_sleep();
+	} else if (*hpage) {
+		put_page(*hpage);
+		*hpage = NULL;
+	}
+
+	return true;
+}
+
+static struct page
+*khugepaged_alloc_page(struct page **hpage, struct mm_struct *mm,
+		       struct vm_area_struct *vma, unsigned long address,
+		       int node)
+{
 	VM_BUG_ON(*hpage);
 	/*
 	 * Allocate the page while the vma is still valid and under
@@ -1866,7 +1872,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 * mmap_sem in read mode is good idea also to allow greater
 	 * scalability.
 	 */
-	new_page = alloc_hugepage_vma(khugepaged_defrag(), vma, address,
+	*hpage  = alloc_hugepage_vma(khugepaged_defrag(), vma, address,
 				      node, __GFP_OTHER_NODE);

 	/*
@@ -1874,15 +1880,81 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 * preparation for taking it in write mode.
 	 */
 	up_read(&mm->mmap_sem);
-	if (unlikely(!new_page)) {
+	if (unlikely(!*hpage)) {
 		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
 		*hpage = ERR_PTR(-ENOMEM);
-		return;
+		return NULL;
 	}
-	*hpage = new_page;
+
 	count_vm_event(THP_COLLAPSE_ALLOC);
+	return *hpage;
+}
+#else
+static struct page *khugepaged_alloc_hugepage(bool *wait)
+{
+	struct page *hpage;
+
+	do {
+		hpage = alloc_hugepage(khugepaged_defrag());
+		if (!hpage) {
+			count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
+			if (!*wait)
+				return NULL;
+
+			*wait = false;
+			khugepaged_alloc_sleep();
+		} else
+			count_vm_event(THP_COLLAPSE_ALLOC);
+	} while (unlikely(!hpage) && likely(khugepaged_enabled()));
+
+	return hpage;
+}
+
+static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
+{
+	if (!*hpage)
+		*hpage = khugepaged_alloc_hugepage(wait);
+
+	if (unlikely(!*hpage))
+		return false;
+
+	return true;
+}
+
+static struct page
+*khugepaged_alloc_page(struct page **hpage, struct mm_struct *mm,
+		       struct vm_area_struct *vma, unsigned long address,
+		       int node)
+{
+	up_read(&mm->mmap_sem);
+	VM_BUG_ON(!*hpage);
+	return  *hpage;
+}
 #endif

+static void collapse_huge_page(struct mm_struct *mm,
+				   unsigned long address,
+				   struct page **hpage,
+				   struct vm_area_struct *vma,
+				   int node)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd, _pmd;
+	pte_t *pte;
+	pgtable_t pgtable;
+	struct page *new_page;
+	spinlock_t *ptl;
+	int isolated;
+	unsigned long hstart, hend;
+
+	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
+
+	/* release the mmap_sem read lock. */
+	new_page = khugepaged_alloc_page(hpage, mm, vma, address, node);
+	if (!new_page)
+		return;
+
 	if (unlikely(mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL)))
 		return;

@@ -2230,34 +2302,6 @@ static int khugepaged_wait_event(void)
 		kthread_should_stop();
 }

-static void khugepaged_alloc_sleep(void)
-{
-	wait_event_freezable_timeout(khugepaged_wait, false,
-			msecs_to_jiffies(khugepaged_alloc_sleep_millisecs));
-}
-
-#ifndef CONFIG_NUMA
-static struct page *khugepaged_alloc_hugepage(bool *wait)
-{
-	struct page *hpage;
-
-	do {
-		hpage = alloc_hugepage(khugepaged_defrag());
-		if (!hpage) {
-			count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
-			if (!*wait)
-				return NULL;
-
-			*wait = false;
-			khugepaged_alloc_sleep();
-		} else
-			count_vm_event(THP_COLLAPSE_ALLOC);
-	} while (unlikely(!hpage) && likely(khugepaged_enabled()));
-
-	return hpage;
-}
-#endif
-
 static void khugepaged_do_scan(void)
 {
 	struct page *hpage = NULL;
@@ -2268,23 +2312,9 @@ static void khugepaged_do_scan(void)
 	barrier(); /* write khugepaged_pages_to_scan to local stack */

 	while (progress < pages) {
-#ifndef CONFIG_NUMA
-		if (!hpage)
-			hpage = khugepaged_alloc_hugepage(&wait);
-
-		if (unlikely(!hpage))
+		if (!khugepaged_prealloc_page(&hpage, &wait))
 			break;
-#else
-		if (IS_ERR(hpage)) {
-			if (!wait)
-				break;
-			wait = false;
-			khugepaged_alloc_sleep();
-		} else if (hpage) {
-			put_page(hpage);
-			hpage = NULL;
-		}
-#endif
+
 		cond_resched();

 		if (unlikely(kthread_should_stop() || freezing(current)))
-- 
1.7.7.6



* [PATCH 10/12] thp: remove khugepaged_loop
  2012-08-13 11:12 [PATCH 00/12] thp: optimize use of khugepaged_mutex and dependence of CONFIG_NUMA Xiao Guangrong
                   ` (8 preceding siblings ...)
  2012-08-13 11:16 ` [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page Xiao Guangrong
@ 2012-08-13 11:16 ` Xiao Guangrong
  2012-08-13 11:17 ` [PATCH 11/12] thp: use khugepaged_enabled to remove duplicate code Xiao Guangrong
  2012-08-13 11:17 ` [PATCH 12/12] thp: remove unnecessary set_recommended_min_free_kbytes Xiao Guangrong
  11 siblings, 0 replies; 20+ messages in thread
From: Xiao Guangrong @ 2012-08-13 11:16 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Andrew Morton, Andrea Arcangeli, Hugh Dickins, David Rientjes,
	LKML, Linux Memory Management List

Merge khugepaged_loop into khugepaged

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 mm/huge_memory.c |   14 ++++----------
 1 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 82f6cce..6ddf671 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2354,14 +2354,6 @@ static void khugepaged_wait_work(void)
 		wait_event_freezable(khugepaged_wait, khugepaged_wait_event());
 }

-static void khugepaged_loop(void)
-{
-	while (likely(khugepaged_enabled())) {
-		khugepaged_do_scan();
-		khugepaged_wait_work();
-	}
-}
-
 static int khugepaged(void *none)
 {
 	struct mm_slot *mm_slot;
@@ -2369,8 +2361,10 @@ static int khugepaged(void *none)
 	set_freezable();
 	set_user_nice(current, 19);

-	while (!kthread_should_stop())
-		khugepaged_loop();
+	while (!kthread_should_stop()) {
+		khugepaged_do_scan();
+		khugepaged_wait_work();
+	}

 	spin_lock(&khugepaged_mm_lock);
 	mm_slot = khugepaged_scan.mm_slot;
-- 
1.7.7.6



* [PATCH 11/12] thp: use khugepaged_enabled to remove duplicate code
  2012-08-13 11:12 [PATCH 00/12] thp: optimize use of khugepaged_mutex and dependence of CONFIG_NUMA Xiao Guangrong
                   ` (9 preceding siblings ...)
  2012-08-13 11:16 ` [PATCH 10/12] thp: remove khugepaged_loop Xiao Guangrong
@ 2012-08-13 11:17 ` Xiao Guangrong
  2012-08-13 11:17 ` [PATCH 12/12] thp: remove unnecessary set_recommended_min_free_kbytes Xiao Guangrong
  11 siblings, 0 replies; 20+ messages in thread
From: Xiao Guangrong @ 2012-08-13 11:17 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Andrew Morton, Andrea Arcangeli, Hugh Dickins, David Rientjes,
	LKML, Linux Memory Management List

Use khugepaged_enabled() to check whether thp is enabled, instead of
open-coding the two test_bit checks in two places.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 mm/huge_memory.c |   11 ++---------
 1 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 6ddf671..6becf6c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -102,10 +102,7 @@ static int set_recommended_min_free_kbytes(void)
 	unsigned long recommended_min;
 	extern int min_free_kbytes;

-	if (!test_bit(TRANSPARENT_HUGEPAGE_FLAG,
-		      &transparent_hugepage_flags) &&
-	    !test_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
-		      &transparent_hugepage_flags))
+	if (!khugepaged_enabled())
 		return 0;

 	for_each_populated_zone(zone)
@@ -228,11 +225,7 @@ static ssize_t enabled_store(struct kobject *kobj,
 			ret = err;
 	}

-	if (ret > 0 &&
-	    (test_bit(TRANSPARENT_HUGEPAGE_FLAG,
-		      &transparent_hugepage_flags) ||
-	     test_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
-		      &transparent_hugepage_flags)))
+	if (ret > 0 && khugepaged_enabled())
 		set_recommended_min_free_kbytes();

 	return ret;
-- 
1.7.7.6



* [PATCH 12/12] thp: remove unnecessary set_recommended_min_free_kbytes
  2012-08-13 11:12 [PATCH 00/12] thp: optimize use of khugepaged_mutex and dependence of CONFIG_NUMA Xiao Guangrong
                   ` (10 preceding siblings ...)
  2012-08-13 11:17 ` [PATCH 11/12] thp: use khugepaged_enabled to remove duplicate code Xiao Guangrong
@ 2012-08-13 11:17 ` Xiao Guangrong
  11 siblings, 0 replies; 20+ messages in thread
From: Xiao Guangrong @ 2012-08-13 11:17 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Andrew Morton, Andrea Arcangeli, Hugh Dickins, David Rientjes,
	LKML, Linux Memory Management List

set_recommended_min_free_kbytes is already called in start_khugepaged, so the
extra calls can be removed

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 mm/huge_memory.c |    5 -----
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 6becf6c..6533956 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -225,9 +225,6 @@ static ssize_t enabled_store(struct kobject *kobj,
 			ret = err;
 	}

-	if (ret > 0 && khugepaged_enabled())
-		set_recommended_min_free_kbytes();
-
 	return ret;
 }
 static struct kobj_attribute enabled_attr =
@@ -562,8 +559,6 @@ static int __init hugepage_init(void)

 	start_khugepaged();

-	set_recommended_min_free_kbytes();
-
 	return 0;
 out:
 	hugepage_exit_sysfs(hugepage_kobj);
-- 
1.7.7.6



* Re: [PATCH 01/12] thp: fix the count of THP_COLLAPSE_ALLOC
  2012-08-13 11:13 ` [PATCH 01/12] thp: fix the count of THP_COLLAPSE_ALLOC Xiao Guangrong
@ 2012-08-13 11:19   ` Kirill A. Shutemov
  0 siblings, 0 replies; 20+ messages in thread
From: Kirill A. Shutemov @ 2012-08-13 11:19 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Andrew Morton, Andrea Arcangeli, Hugh Dickins, David Rientjes,
	LKML, Linux Memory Management List

On Mon, Aug 13, 2012 at 07:13:16PM +0800, Xiao Guangrong wrote:
> THP_COLLAPSE_ALLOC is double counted if NUMA is disabled since it has
> already been calculated in khugepaged_alloc_hugepage
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

> ---
>  mm/huge_memory.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 57c4b93..80bcd42 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1880,9 +1880,9 @@ static void collapse_huge_page(struct mm_struct *mm,
>  		*hpage = ERR_PTR(-ENOMEM);
>  		return;
>  	}
> +	count_vm_event(THP_COLLAPSE_ALLOC);
>  #endif
> 
> -	count_vm_event(THP_COLLAPSE_ALLOC);
>  	if (unlikely(mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL))) {
>  #ifdef CONFIG_NUMA
>  		put_page(new_page);
> -- 
> 1.7.7.6
> 

-- 
 Kirill A. Shutemov


* Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page
  2012-08-13 11:16 ` [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page Xiao Guangrong
@ 2012-09-12  2:03   ` Hugh Dickins
  2012-09-12  2:35     ` Xiao Guangrong
  2012-09-12  3:37     ` Xiao Guangrong
  0 siblings, 2 replies; 20+ messages in thread
From: Hugh Dickins @ 2012-09-12  2:03 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Andrew Morton, Andrea Arcangeli, Michel Lespinasse,
	David Rientjes, LKML, Linux Memory Management List

On Mon, 13 Aug 2012, Xiao Guangrong wrote:

> They are used to abstract the difference between NUMA enabled and NUMA disabled
> to make the code more readable
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  mm/huge_memory.c |  166 ++++++++++++++++++++++++++++++++----------------------
>  1 files changed, 98 insertions(+), 68 deletions(-)

Hmm, that in itself is not necessarily an improvement.

I'm a bit sceptical about this patch,
thp-introduce-khugepaged_prealloc_page-and-khugepaged_alloc_page.patch
in last Thursday's mmotm 2012-09-06-16-46.

What brought me to look at it was hitting "BUG at mm/huge_memory.c:1842!"
running tmpfs kbuild swapping load (with memcg's memory.limit_in_bytes
forcing out to swap), while I happened to have CONFIG_NUMA=y.

That's the VM_BUG_ON(*hpage) on entry to khugepaged_alloc_page().

(If I'm honest, I'll admit I have Michel's "interval trees for anon rmap"
patches in on top, and so the line number was actually shifted to 1839:
but I don't believe his patches were in any way involved here, and
indeed I've not yet found a problem with them: they look very good.)

I expect the BUG could quite easily be fixed up by making another call
to khugepaged_prealloc_page() from somewhere to free up the hpage;
but forgive me if I dislike using "prealloc" to free.

I do agree with you that the several CONFIG_NUMA ifdefs dotted around
mm/huge_memory.c are regrettable, but I'm not at all sure that you're
improving the situation with this patch, which gives misleading names
to functions and moves the mmap_sem upping out of line.

I think you need to revisit it: maybe not go so far (leaving a few
CONFIG_NUMAs behind, if they're not too bad), or maybe go further
(add a separate function for freeing in the NUMA case, instead of
using "prealloc").  I don't know what's best: have a play and see.

That's what I was intending to write yesterday.  But overnight I
was running with this 9/12 backed out (I think 10,11,12 should be
independent), and found "BUG at mm/huge_memory.c:1835!" this morning.

That's the VM_BUG_ON(*hpage) below #else in collapse_huge_page()
when 9/12 is reverted.

So maybe 9/12 is just obscuring what was already a BUG, either earlier
in your series or elsewhere in mmotm (I've never seen it on 3.6-rc or
earlier releases, nor without CONFIG_NUMA).  I've not spent any time
looking for it, maybe it's obvious - can you spot and fix it?

Hugh


* Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page
  2012-09-12  2:03   ` Hugh Dickins
@ 2012-09-12  2:35     ` Xiao Guangrong
  2012-09-12  3:37     ` Xiao Guangrong
  1 sibling, 0 replies; 20+ messages in thread
From: Xiao Guangrong @ 2012-09-12  2:35 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Andrea Arcangeli, Michel Lespinasse,
	David Rientjes, LKML, Linux Memory Management List

On 09/12/2012 10:03 AM, Hugh Dickins wrote:
> On Mon, 13 Aug 2012, Xiao Guangrong wrote:
> 
>> They are used to abstract the difference between NUMA enabled and NUMA disabled
>> to make the code more readable
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> ---
>>  mm/huge_memory.c |  166 ++++++++++++++++++++++++++++++++----------------------
>>  1 files changed, 98 insertions(+), 68 deletions(-)
> 
> Hmm, that in itself is not necessarily an improvement.
> 
> I'm a bit sceptical about this patch,
> thp-introduce-khugepaged_prealloc_page-and-khugepaged_alloc_page.patch
> in last Thursday's mmotm 2012-09-06-16-46.
> 
> What brought me to look at it was hitting "BUG at mm/huge_memory.c:1842!"
> running tmpfs kbuild swapping load (with memcg's memory.limit_in_bytes
> forcing out to swap), while I happened to have CONFIG_NUMA=y.
> 
> That's the VM_BUG_ON(*hpage) on entry to khugepaged_alloc_page().

I will look into it, thanks for pointing it out.

> 
> (If I'm honest, I'll admit I have Michel's "interval trees for anon rmap"
> patches in on top, and so the line number was actually shifted to 1839:
> but I don't believe his patches were in any way involved here, and
> indeed I've not yet found a problem with them: they look very good.)
> 
> I expect the BUG could quite easily be fixed up by making another call
> to khugepaged_prealloc_page() from somewhere to free up the hpage;
> but forgive me if I dislike using "prealloc" to free.
> 
> I do agree with you that the several CONFIG_NUMA ifdefs dotted around
> mm/huge_memory.c are regrettable, but I'm not at all sure that you're
> improving the situation with this patch, which gives misleading names
> to functions and moves the mmap_sem upping out of line.
> 
> I think you need to revisit it: maybe not go so far (leaving a few
> CONFIG_NUMAs behind, if they're not too bad), or maybe go further
> (add a separate function for freeing in the NUMA case, instead of
> using "prealloc").  I don't know what's best: have a play and see.

Sorry for that, I will find a better way to do this.

> 
> That's what I was intending to write yesterday.  But overnight I
> was running with this 9/12 backed out (I think 10,11,12 should be
> independent), and found "BUG at mm/huge_memory.c:1835!" this morning.
> 
> That's the VM_BUG_ON(*hpage) below #else in collapse_huge_page()
> when 9/12 is reverted.
> 
> So maybe 9/12 is just obscuring what was already a BUG, either earlier
> in your series or elsewhere in mmotm (I've never seen it on 3.6-rc or
> earlier releases, nor without CONFIG_NUMA).  I've not spent any time
> looking for it, maybe it's obvious - can you spot and fix it?

Sure, will fix it as soon as possible. Thanks!




* Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page
  2012-09-12  2:03   ` Hugh Dickins
  2012-09-12  2:35     ` Xiao Guangrong
@ 2012-09-12  3:37     ` Xiao Guangrong
  2012-09-13  6:27       ` Hugh Dickins
  1 sibling, 1 reply; 20+ messages in thread
From: Xiao Guangrong @ 2012-09-12  3:37 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Andrea Arcangeli, Michel Lespinasse,
	David Rientjes, LKML, Linux Memory Management List

On 09/12/2012 10:03 AM, Hugh Dickins wrote:

> What brought me to look at it was hitting "BUG at mm/huge_memory.c:1842!"
> running tmpfs kbuild swapping load (with memcg's memory.limit_in_bytes
> forcing out to swap), while I happened to have CONFIG_NUMA=y.
> 
> That's the VM_BUG_ON(*hpage) on entry to khugepaged_alloc_page().

> 
> So maybe 9/12 is just obscuring what was already a BUG, either earlier
> in your series or elsewhere in mmotm (I've never seen it on 3.6-rc or
> earlier releases, nor without CONFIG_NUMA).  I've not spent any time
> looking for it, maybe it's obvious - can you spot and fix it?

Hugh,

I think I have already found the reason; if I am correct, the bug existed
before my patch.

Could you please try the patch below? And could you please allow me to fix the
bug first, then post another patch to improve the things you dislike?


Subject: [PATCH] thp: fix forgetting to reset the page alloc indicator

If NUMA is enabled, the indicator is not reset when the previous page
request fails, which then triggers the VM_BUG_ON in khugepaged_alloc_page

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 mm/huge_memory.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e366ca5..66d2bc6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1825,6 +1825,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
 			return false;

 		*wait = false;
+		*hpage = NULL;
 		khugepaged_alloc_sleep();
 	} else if (*hpage) {
 		put_page(*hpage);
-- 
1.7.7.6



* Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page
  2012-09-12  3:37     ` Xiao Guangrong
@ 2012-09-13  6:27       ` Hugh Dickins
  2012-09-13  6:33         ` Hugh Dickins
  2012-09-13  9:26         ` Xiao Guangrong
  0 siblings, 2 replies; 20+ messages in thread
From: Hugh Dickins @ 2012-09-13  6:27 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Andrew Morton, Andrea Arcangeli, Michel Lespinasse,
	David Rientjes, LKML, Linux Memory Management List

On Wed, 12 Sep 2012, Xiao Guangrong wrote:
> On 09/12/2012 10:03 AM, Hugh Dickins wrote:
> 
> > What brought me to look at it was hitting "BUG at mm/huge_memory.c:1842!"
> > running tmpfs kbuild swapping load (with memcg's memory.limit_in_bytes
> > forcing out to swap), while I happened to have CONFIG_NUMA=y.
> > 
> > That's the VM_BUG_ON(*hpage) on entry to khugepaged_alloc_page().
> 
> > 
> > So maybe 9/12 is just obscuring what was already a BUG, either earlier
> > in your series or elsewhere in mmotm (I've never seen it on 3.6-rc or
> > earlier releases, nor without CONFIG_NUMA).  I've not spent any time
> > looking for it, maybe it's obvious - can you spot and fix it?
> 
> Hugh,
> 
> I think I have already found the reason;

Great, thank you.

> if I am correct, the bug existed before my patch.

Before your patchset?  Are you sure of that?

> 
> Could you please try the patch below?

I put it on this morning, and ran load all day without a crash:
I think you indeed found the cause.

> And could you please allow me to fix the bug first,
> then post another patch to improve the things you dislike?

Good plan.

I've not yet glanced at your 2/3 and 3/3, I'm afraid.
I think akpm has helpfully included just 1/3 in the new mmotm.

> 
> Subject: [PATCH] thp: fix forgetting to reset the page alloc indicator
> 
> If NUMA is enabled, the indicator is not reset when the previous page
> request fails, which then triggers the VM_BUG_ON in khugepaged_alloc_page
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  mm/huge_memory.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index e366ca5..66d2bc6 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1825,6 +1825,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
>  			return false;
> 
>  		*wait = false;
> +		*hpage = NULL;
>  		khugepaged_alloc_sleep();
>  	} else if (*hpage) {
>  		put_page(*hpage);

The unshown line just below this is

		*hpage = NULL;

I do wish you would take the "*hpage = NULL;" out of if and else blocks
and place it once below both.

Thanks,
Hugh


* Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page
  2012-09-13  6:27       ` Hugh Dickins
@ 2012-09-13  6:33         ` Hugh Dickins
  2012-09-13  9:26         ` Xiao Guangrong
  1 sibling, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2012-09-13  6:33 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Andrew Morton, Andrea Arcangeli, Michel Lespinasse,
	David Rientjes, LKML, Linux Memory Management List

On Wed, 12 Sep 2012, Hugh Dickins wrote:
> > @@ -1825,6 +1825,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
> >  			return false;
> > 
> >  		*wait = false;
> > +		*hpage = NULL;
> >  		khugepaged_alloc_sleep();
> >  	} else if (*hpage) {
> >  		put_page(*hpage);
> 
> The unshown line just below this is
> 
> 		*hpage = NULL;
> 
> I do wish you would take the "*hpage = NULL;" out of if and else blocks
> and place it once below both.

Hold on, I'm being unreasonable: that's an "else if", and I've no good
reason to request you to set *hpage = NULL when it's already NULL.
It would be okay if you did, but there's no reason for me to prefer it.

Hugh


* Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page
  2012-09-13  6:27       ` Hugh Dickins
  2012-09-13  6:33         ` Hugh Dickins
@ 2012-09-13  9:26         ` Xiao Guangrong
  1 sibling, 0 replies; 20+ messages in thread
From: Xiao Guangrong @ 2012-09-13  9:26 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Andrea Arcangeli, Michel Lespinasse,
	David Rientjes, LKML, Linux Memory Management List

On 09/13/2012 02:27 PM, Hugh Dickins wrote:
> On Wed, 12 Sep 2012, Xiao Guangrong wrote:
>> On 09/12/2012 10:03 AM, Hugh Dickins wrote:
>>
>>> What brought me to look at it was hitting "BUG at mm/huge_memory.c:1842!"
>>> running tmpfs kbuild swapping load (with memcg's memory.limit_in_bytes
>>> forcing out to swap), while I happened to have CONFIG_NUMA=y.
>>>
>>> That's the VM_BUG_ON(*hpage) on entry to khugepaged_alloc_page().
>>
>>>
>>> So maybe 9/12 is just obscuring what was already a BUG, either earlier
>>> in your series or elsewhere in mmotm (I've never seen it on 3.6-rc or
>>> earlier releases, nor without CONFIG_NUMA).  I've not spent any time
>>> looking for it, maybe it's obvious - can you spot and fix it?
>>
>> Hugh,
>>
>> I think I have already found the reason;
> 
> Great, thank you.
> 
>> if I am correct, the bug existed before my patch.
> 
> Before your patchset?  Are you sure of that?

No. :)

I have told Andrew in 0/3 that the fix patch need not be backported.
Sorry again for my mistake.



end of thread, other threads:[~2012-09-13  9:26 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-13 11:12 [PATCH 00/12] thp: optimize use of khugepaged_mutex and dependence of CONFIG_NUMA Xiao Guangrong
2012-08-13 11:13 ` [PATCH 01/12] thp: fix the count of THP_COLLAPSE_ALLOC Xiao Guangrong
2012-08-13 11:19   ` Kirill A. Shutemov
2012-08-13 11:13 ` [PATCH 02/12] thp: remove unnecessary check in start_khugepaged Xiao Guangrong
2012-08-13 11:14 ` [PATCH 03/12] thp: move khugepaged_mutex out of khugepaged Xiao Guangrong
2012-08-13 11:14 ` [PATCH 04/12] thp: remove unnecessary khugepaged_thread check Xiao Guangrong
2012-08-13 11:14 ` [PATCH 05/12] thp: remove wake_up_interruptible in the exit path Xiao Guangrong
2012-08-13 11:15 ` [PATCH 06/12] thp: remove some code depend on CONFIG_NUMA Xiao Guangrong
2012-08-13 11:15 ` [PATCH 07/12] thp: merge page pre-alloc in khugepaged_loop into khugepaged_do_scan Xiao Guangrong
2012-08-13 11:16 ` [PATCH 08/12] thp: release page in page pre-alloc path Xiao Guangrong
2012-08-13 11:16 ` [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page Xiao Guangrong
2012-09-12  2:03   ` Hugh Dickins
2012-09-12  2:35     ` Xiao Guangrong
2012-09-12  3:37     ` Xiao Guangrong
2012-09-13  6:27       ` Hugh Dickins
2012-09-13  6:33         ` Hugh Dickins
2012-09-13  9:26         ` Xiao Guangrong
2012-08-13 11:16 ` [PATCH 10/12] thp: remove khugepaged_loop Xiao Guangrong
2012-08-13 11:17 ` [PATCH 11/12] thp: use khugepaged_enabled to remove duplicate code Xiao Guangrong
2012-08-13 11:17 ` [PATCH 12/12] thp: remove unnecessary set_recommended_min_free_kbytes Xiao Guangrong
