All of lore.kernel.org
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>, Tejun Heo <tj@kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Jiri Kosina <jkosina@suse.cz>, Borislav Petkov <bp@suse.de>,
	Michal Hocko <mhocko@suse.cz>,
	linux-mm@kvack.org, Vlastimil Babka <vbabka@suse.cz>,
	linux-api@vger.kernel.org, linux-kernel@vger.kernel.org,
	Petr Mladek <pmladek@suse.com>
Subject: [PATCH v4 13/22] mm/huge_page: Convert khugepaged() into kthread worker API
Date: Mon, 25 Jan 2016 16:45:02 +0100	[thread overview]
Message-ID: <1453736711-6703-14-git-send-email-pmladek@suse.com> (raw)
In-Reply-To: <1453736711-6703-1-git-send-email-pmladek@suse.com>

Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing,
awakening. In many cases it is unclear to say in which state
it is and sometimes it is done a wrong way.

The plan is to convert kthreads into kthread_worker or workqueues
API. It allows to split the functionality into separate operations.
It helps to make a better structure. Also it defines a clean state
where no locks are taken, IRQs blocked, the kthread might sleep
or even be safely migrated.

The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that it is
available when needed. Also it allows a better control, e.g.
define a scheduling priority.

This patch converts khugepaged() in kthread worker API
because it modifies the scheduling.

It keeps the functionality except that we do not wakeup the worker
when it is already created and someone calls start() once again.

Note that kthread works get associated with a single kthread worker.
They must be initialized if we want to use them with another worker.
This is needed also when the worker is restarted.

set_freezable() is not needed because the kthread worker is
created as freezable.

set_user_nice() is called from start_stop_khugepaged(). It need
not be done from within the kthread.

The scan work must be queued only when the worker is available.
We have to use "khugepaged_mm_lock" to avoid a race between the check
and queuing. I admit that this was a bit easier before because wake_up()
was a nope when the kthread did not exist.

Also the scan work is queued only when the list of scanned pages is
not empty. It adds one check but it is cleaner.

They delay between scans is done using a delayed work.

Note that @khugepaged_wait waitqueue had two purposes. It was used
to wait between scans and when an allocation failed. It is still used
for the second purpose. Therefore it was renamed to better describe
the current use.

Also note that we could not longer check for kthread_should_stop()
in the works. The kthread used by the worker has to stay alive
until all queued works are finished. Instead, we use the existing
check khugepaged_enabled() that returns false when we are going down.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 mm/huge_memory.c | 138 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 76 insertions(+), 62 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fd3a07b3e6f4..828d741ed242 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -89,10 +89,16 @@ static unsigned int khugepaged_full_scans;
 static unsigned int khugepaged_scan_sleep_millisecs __read_mostly = 10000;
 /* during fragmentation poll the hugepage allocator once every minute */
 static unsigned int khugepaged_alloc_sleep_millisecs __read_mostly = 60000;
-static struct task_struct *khugepaged_thread __read_mostly;
+
+static void khugepaged_do_scan_func(struct kthread_work *dummy);
+static void khugepaged_cleanup_func(struct kthread_work *dummy);
+static struct kthread_worker *khugepaged_worker;
+static struct delayed_kthread_work khugepaged_do_scan_work;
+static struct kthread_work khugepaged_cleanup_work;
+
 static DEFINE_MUTEX(khugepaged_mutex);
 static DEFINE_SPINLOCK(khugepaged_mm_lock);
-static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
+static DECLARE_WAIT_QUEUE_HEAD(khugepaged_alloc_wait);
 /*
  * default collapse hugepages if there is at least one pte mapped like
  * it would have happened if the vma was large enough during page
@@ -100,7 +106,6 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
  */
 static unsigned int khugepaged_max_ptes_none __read_mostly = HPAGE_PMD_NR-1;
 
-static int khugepaged(void *none);
 static int khugepaged_slab_init(void);
 static void khugepaged_slab_exit(void);
 
@@ -180,29 +185,55 @@ static void set_recommended_min_free_kbytes(void)
 	setup_per_zone_wmarks();
 }
 
+static int khugepaged_has_work(void)
+{
+	return !list_empty(&khugepaged_scan.mm_head);
+}
+
 static int start_stop_khugepaged(void)
 {
+	struct kthread_worker *worker;
 	int err = 0;
+
 	if (khugepaged_enabled()) {
-		if (!khugepaged_thread)
-			khugepaged_thread = kthread_run(khugepaged, NULL,
-							"khugepaged");
-		if (IS_ERR(khugepaged_thread)) {
-			pr_err("khugepaged: kthread_run(khugepaged) failed\n");
-			err = PTR_ERR(khugepaged_thread);
-			khugepaged_thread = NULL;
-			goto fail;
+		if (khugepaged_worker)
+			goto out;
+
+		worker = create_kthread_worker(KTW_FREEZABLE, "khugepaged");
+		if (IS_ERR(worker)) {
+			pr_err("khugepaged: failed to create kthread worker\n");
+			goto out;
 		}
+		set_user_nice(worker->task, MAX_NICE);
 
-		if (!list_empty(&khugepaged_scan.mm_head))
-			wake_up_interruptible(&khugepaged_wait);
+		/* Always initialize the works when the worker is started. */
+		init_delayed_kthread_work(&khugepaged_do_scan_work,
+					  khugepaged_do_scan_func);
+		init_kthread_work(&khugepaged_cleanup_work,
+				  khugepaged_cleanup_func);
+
+		/* Make the worker public and check for work synchronously. */
+		spin_lock(&khugepaged_mm_lock);
+		khugepaged_worker = worker;
+		if (khugepaged_has_work())
+			queue_delayed_kthread_work(worker,
+						   &khugepaged_do_scan_work,
+						   0);
+		spin_unlock(&khugepaged_mm_lock);
 
 		set_recommended_min_free_kbytes();
-	} else if (khugepaged_thread) {
-		kthread_stop(khugepaged_thread);
-		khugepaged_thread = NULL;
+	} else if (khugepaged_worker) {
+		/* First, stop others from using the worker. */
+		spin_lock(&khugepaged_mm_lock);
+		worker = khugepaged_worker;
+		khugepaged_worker = NULL;
+		spin_unlock(&khugepaged_mm_lock);
+
+		cancel_delayed_kthread_work_sync(&khugepaged_do_scan_work);
+		queue_kthread_work(worker, &khugepaged_cleanup_work);
+		destroy_kthread_worker(worker);
 	}
-fail:
+out:
 	return err;
 }
 
@@ -461,7 +492,13 @@ static ssize_t scan_sleep_millisecs_store(struct kobject *kobj,
 		return -EINVAL;
 
 	khugepaged_scan_sleep_millisecs = msecs;
-	wake_up_interruptible(&khugepaged_wait);
+
+	spin_lock(&khugepaged_mm_lock);
+	if (khugepaged_worker && khugepaged_has_work())
+		mod_delayed_kthread_work(khugepaged_worker,
+					 &khugepaged_do_scan_work,
+					 0);
+	spin_unlock(&khugepaged_mm_lock);
 
 	return count;
 }
@@ -488,7 +525,7 @@ static ssize_t alloc_sleep_millisecs_store(struct kobject *kobj,
 		return -EINVAL;
 
 	khugepaged_alloc_sleep_millisecs = msecs;
-	wake_up_interruptible(&khugepaged_wait);
+	wake_up_interruptible(&khugepaged_alloc_wait);
 
 	return count;
 }
@@ -1878,7 +1915,7 @@ static inline int khugepaged_test_exit(struct mm_struct *mm)
 int __khugepaged_enter(struct mm_struct *mm)
 {
 	struct mm_slot *mm_slot;
-	int wakeup;
+	int has_work;
 
 	mm_slot = alloc_mm_slot();
 	if (!mm_slot)
@@ -1897,13 +1934,15 @@ int __khugepaged_enter(struct mm_struct *mm)
 	 * Insert just behind the scanning cursor, to let the area settle
 	 * down a little.
 	 */
-	wakeup = list_empty(&khugepaged_scan.mm_head);
+	has_work = khugepaged_has_work();
 	list_add_tail(&mm_slot->mm_node, &khugepaged_scan.mm_head);
-	spin_unlock(&khugepaged_mm_lock);
 
 	atomic_inc(&mm->mm_count);
-	if (wakeup)
-		wake_up_interruptible(&khugepaged_wait);
+	if (khugepaged_worker && has_work)
+		mod_delayed_kthread_work(khugepaged_worker,
+					 &khugepaged_do_scan_work,
+					 0);
+	spin_unlock(&khugepaged_mm_lock);
 
 	return 0;
 }
@@ -2142,10 +2181,10 @@ static void khugepaged_alloc_sleep(void)
 {
 	DEFINE_WAIT(wait);
 
-	add_wait_queue(&khugepaged_wait, &wait);
+	add_wait_queue(&khugepaged_alloc_wait, &wait);
 	freezable_schedule_timeout_interruptible(
 		msecs_to_jiffies(khugepaged_alloc_sleep_millisecs));
-	remove_wait_queue(&khugepaged_wait, &wait);
+	remove_wait_queue(&khugepaged_alloc_wait, &wait);
 }
 
 static int khugepaged_node_load[MAX_NUMNODES];
@@ -2716,19 +2755,7 @@ breakouterloop_mmap_sem:
 	return progress;
 }
 
-static int khugepaged_has_work(void)
-{
-	return !list_empty(&khugepaged_scan.mm_head) &&
-		khugepaged_enabled();
-}
-
-static int khugepaged_wait_event(void)
-{
-	return !list_empty(&khugepaged_scan.mm_head) ||
-		kthread_should_stop();
-}
-
-static void khugepaged_do_scan(void)
+static void khugepaged_do_scan_func(struct kthread_work *dummy)
 {
 	struct page *hpage = NULL;
 	unsigned int progress = 0, pass_through_head = 0;
@@ -2743,7 +2770,7 @@ static void khugepaged_do_scan(void)
 
 		cond_resched();
 
-		if (unlikely(kthread_should_stop() || try_to_freeze()))
+		if (unlikely(!khugepaged_enabled() || try_to_freeze()))
 			break;
 
 		spin_lock(&khugepaged_mm_lock);
@@ -2760,43 +2787,30 @@ static void khugepaged_do_scan(void)
 
 	if (!IS_ERR_OR_NULL(hpage))
 		put_page(hpage);
-}
 
-static void khugepaged_wait_work(void)
-{
 	if (khugepaged_has_work()) {
-		if (!khugepaged_scan_sleep_millisecs)
-			return;
 
-		wait_event_freezable_timeout(khugepaged_wait,
-					     kthread_should_stop(),
-			msecs_to_jiffies(khugepaged_scan_sleep_millisecs));
-		return;
-	}
+		unsigned long delay = 0;
 
-	if (khugepaged_enabled())
-		wait_event_freezable(khugepaged_wait, khugepaged_wait_event());
+		if (khugepaged_scan_sleep_millisecs)
+			delay = msecs_to_jiffies(khugepaged_scan_sleep_millisecs);
+
+		queue_delayed_kthread_work(khugepaged_worker,
+					   &khugepaged_do_scan_work,
+					   delay);
+	}
 }
 
-static int khugepaged(void *none)
+static void khugepaged_cleanup_func(struct kthread_work *dummy)
 {
 	struct mm_slot *mm_slot;
 
-	set_freezable();
-	set_user_nice(current, MAX_NICE);
-
-	while (!kthread_should_stop()) {
-		khugepaged_do_scan();
-		khugepaged_wait_work();
-	}
-
 	spin_lock(&khugepaged_mm_lock);
 	mm_slot = khugepaged_scan.mm_slot;
 	khugepaged_scan.mm_slot = NULL;
 	if (mm_slot)
 		collect_mm_slot(mm_slot);
 	spin_unlock(&khugepaged_mm_lock);
-	return 0;
 }
 
 static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
-- 
1.8.5.6

WARNING: multiple messages have this Message-ID (diff)
From: Petr Mladek <pmladek-IBi9RG/b67k@public.gmane.org>
To: Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Cc: Steven Rostedt <rostedt-nx8X9YLhiw1AfugRpC6u6w@public.gmane.org>,
	"Paul E. McKenney"
	<paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>,
	Josh Triplett <josh-iaAMLnmF4UmaiuxdJuQwMA@public.gmane.org>,
	Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>,
	Linus Torvalds
	<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Jiri Kosina <jkosina-AlSwsSmVLrQ@public.gmane.org>,
	Borislav Petkov <bp-l3A5Bk7waGM@public.gmane.org>,
	Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	Vlastimil Babka <vbabka-AlSwsSmVLrQ@public.gmane.org>,
	linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Petr Mladek <pmladek-IBi9RG/b67k@public.gmane.org>
Subject: [PATCH v4 13/22] mm/huge_page: Convert khugepaged() into kthread worker API
Date: Mon, 25 Jan 2016 16:45:02 +0100	[thread overview]
Message-ID: <1453736711-6703-14-git-send-email-pmladek@suse.com> (raw)
In-Reply-To: <1453736711-6703-1-git-send-email-pmladek-IBi9RG/b67k@public.gmane.org>

Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing,
awakening. In many cases it is unclear to say in which state
it is and sometimes it is done a wrong way.

The plan is to convert kthreads into kthread_worker or workqueues
API. It allows to split the functionality into separate operations.
It helps to make a better structure. Also it defines a clean state
where no locks are taken, IRQs blocked, the kthread might sleep
or even be safely migrated.

The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that it is
available when needed. Also it allows a better control, e.g.
define a scheduling priority.

This patch converts khugepaged() in kthread worker API
because it modifies the scheduling.

It keeps the functionality except that we do not wakeup the worker
when it is already created and someone calls start() once again.

Note that kthread works get associated with a single kthread worker.
They must be initialized if we want to use them with another worker.
This is needed also when the worker is restarted.

set_freezable() is not needed because the kthread worker is
created as freezable.

set_user_nice() is called from start_stop_khugepaged(). It need
not be done from within the kthread.

The scan work must be queued only when the worker is available.
We have to use "khugepaged_mm_lock" to avoid a race between the check
and queuing. I admit that this was a bit easier before because wake_up()
was a nope when the kthread did not exist.

Also the scan work is queued only when the list of scanned pages is
not empty. It adds one check but it is cleaner.

They delay between scans is done using a delayed work.

Note that @khugepaged_wait waitqueue had two purposes. It was used
to wait between scans and when an allocation failed. It is still used
for the second purpose. Therefore it was renamed to better describe
the current use.

Also note that we could not longer check for kthread_should_stop()
in the works. The kthread used by the worker has to stay alive
until all queued works are finished. Instead, we use the existing
check khugepaged_enabled() that returns false when we are going down.

Signed-off-by: Petr Mladek <pmladek-IBi9RG/b67k@public.gmane.org>
---
 mm/huge_memory.c | 138 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 76 insertions(+), 62 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fd3a07b3e6f4..828d741ed242 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -89,10 +89,16 @@ static unsigned int khugepaged_full_scans;
 static unsigned int khugepaged_scan_sleep_millisecs __read_mostly = 10000;
 /* during fragmentation poll the hugepage allocator once every minute */
 static unsigned int khugepaged_alloc_sleep_millisecs __read_mostly = 60000;
-static struct task_struct *khugepaged_thread __read_mostly;
+
+static void khugepaged_do_scan_func(struct kthread_work *dummy);
+static void khugepaged_cleanup_func(struct kthread_work *dummy);
+static struct kthread_worker *khugepaged_worker;
+static struct delayed_kthread_work khugepaged_do_scan_work;
+static struct kthread_work khugepaged_cleanup_work;
+
 static DEFINE_MUTEX(khugepaged_mutex);
 static DEFINE_SPINLOCK(khugepaged_mm_lock);
-static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
+static DECLARE_WAIT_QUEUE_HEAD(khugepaged_alloc_wait);
 /*
  * default collapse hugepages if there is at least one pte mapped like
  * it would have happened if the vma was large enough during page
@@ -100,7 +106,6 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
  */
 static unsigned int khugepaged_max_ptes_none __read_mostly = HPAGE_PMD_NR-1;
 
-static int khugepaged(void *none);
 static int khugepaged_slab_init(void);
 static void khugepaged_slab_exit(void);
 
@@ -180,29 +185,55 @@ static void set_recommended_min_free_kbytes(void)
 	setup_per_zone_wmarks();
 }
 
+static int khugepaged_has_work(void)
+{
+	return !list_empty(&khugepaged_scan.mm_head);
+}
+
 static int start_stop_khugepaged(void)
 {
+	struct kthread_worker *worker;
 	int err = 0;
+
 	if (khugepaged_enabled()) {
-		if (!khugepaged_thread)
-			khugepaged_thread = kthread_run(khugepaged, NULL,
-							"khugepaged");
-		if (IS_ERR(khugepaged_thread)) {
-			pr_err("khugepaged: kthread_run(khugepaged) failed\n");
-			err = PTR_ERR(khugepaged_thread);
-			khugepaged_thread = NULL;
-			goto fail;
+		if (khugepaged_worker)
+			goto out;
+
+		worker = create_kthread_worker(KTW_FREEZABLE, "khugepaged");
+		if (IS_ERR(worker)) {
+			pr_err("khugepaged: failed to create kthread worker\n");
+			goto out;
 		}
+		set_user_nice(worker->task, MAX_NICE);
 
-		if (!list_empty(&khugepaged_scan.mm_head))
-			wake_up_interruptible(&khugepaged_wait);
+		/* Always initialize the works when the worker is started. */
+		init_delayed_kthread_work(&khugepaged_do_scan_work,
+					  khugepaged_do_scan_func);
+		init_kthread_work(&khugepaged_cleanup_work,
+				  khugepaged_cleanup_func);
+
+		/* Make the worker public and check for work synchronously. */
+		spin_lock(&khugepaged_mm_lock);
+		khugepaged_worker = worker;
+		if (khugepaged_has_work())
+			queue_delayed_kthread_work(worker,
+						   &khugepaged_do_scan_work,
+						   0);
+		spin_unlock(&khugepaged_mm_lock);
 
 		set_recommended_min_free_kbytes();
-	} else if (khugepaged_thread) {
-		kthread_stop(khugepaged_thread);
-		khugepaged_thread = NULL;
+	} else if (khugepaged_worker) {
+		/* First, stop others from using the worker. */
+		spin_lock(&khugepaged_mm_lock);
+		worker = khugepaged_worker;
+		khugepaged_worker = NULL;
+		spin_unlock(&khugepaged_mm_lock);
+
+		cancel_delayed_kthread_work_sync(&khugepaged_do_scan_work);
+		queue_kthread_work(worker, &khugepaged_cleanup_work);
+		destroy_kthread_worker(worker);
 	}
-fail:
+out:
 	return err;
 }
 
@@ -461,7 +492,13 @@ static ssize_t scan_sleep_millisecs_store(struct kobject *kobj,
 		return -EINVAL;
 
 	khugepaged_scan_sleep_millisecs = msecs;
-	wake_up_interruptible(&khugepaged_wait);
+
+	spin_lock(&khugepaged_mm_lock);
+	if (khugepaged_worker && khugepaged_has_work())
+		mod_delayed_kthread_work(khugepaged_worker,
+					 &khugepaged_do_scan_work,
+					 0);
+	spin_unlock(&khugepaged_mm_lock);
 
 	return count;
 }
@@ -488,7 +525,7 @@ static ssize_t alloc_sleep_millisecs_store(struct kobject *kobj,
 		return -EINVAL;
 
 	khugepaged_alloc_sleep_millisecs = msecs;
-	wake_up_interruptible(&khugepaged_wait);
+	wake_up_interruptible(&khugepaged_alloc_wait);
 
 	return count;
 }
@@ -1878,7 +1915,7 @@ static inline int khugepaged_test_exit(struct mm_struct *mm)
 int __khugepaged_enter(struct mm_struct *mm)
 {
 	struct mm_slot *mm_slot;
-	int wakeup;
+	int has_work;
 
 	mm_slot = alloc_mm_slot();
 	if (!mm_slot)
@@ -1897,13 +1934,15 @@ int __khugepaged_enter(struct mm_struct *mm)
 	 * Insert just behind the scanning cursor, to let the area settle
 	 * down a little.
 	 */
-	wakeup = list_empty(&khugepaged_scan.mm_head);
+	has_work = khugepaged_has_work();
 	list_add_tail(&mm_slot->mm_node, &khugepaged_scan.mm_head);
-	spin_unlock(&khugepaged_mm_lock);
 
 	atomic_inc(&mm->mm_count);
-	if (wakeup)
-		wake_up_interruptible(&khugepaged_wait);
+	if (khugepaged_worker && has_work)
+		mod_delayed_kthread_work(khugepaged_worker,
+					 &khugepaged_do_scan_work,
+					 0);
+	spin_unlock(&khugepaged_mm_lock);
 
 	return 0;
 }
@@ -2142,10 +2181,10 @@ static void khugepaged_alloc_sleep(void)
 {
 	DEFINE_WAIT(wait);
 
-	add_wait_queue(&khugepaged_wait, &wait);
+	add_wait_queue(&khugepaged_alloc_wait, &wait);
 	freezable_schedule_timeout_interruptible(
 		msecs_to_jiffies(khugepaged_alloc_sleep_millisecs));
-	remove_wait_queue(&khugepaged_wait, &wait);
+	remove_wait_queue(&khugepaged_alloc_wait, &wait);
 }
 
 static int khugepaged_node_load[MAX_NUMNODES];
@@ -2716,19 +2755,7 @@ breakouterloop_mmap_sem:
 	return progress;
 }
 
-static int khugepaged_has_work(void)
-{
-	return !list_empty(&khugepaged_scan.mm_head) &&
-		khugepaged_enabled();
-}
-
-static int khugepaged_wait_event(void)
-{
-	return !list_empty(&khugepaged_scan.mm_head) ||
-		kthread_should_stop();
-}
-
-static void khugepaged_do_scan(void)
+static void khugepaged_do_scan_func(struct kthread_work *dummy)
 {
 	struct page *hpage = NULL;
 	unsigned int progress = 0, pass_through_head = 0;
@@ -2743,7 +2770,7 @@ static void khugepaged_do_scan(void)
 
 		cond_resched();
 
-		if (unlikely(kthread_should_stop() || try_to_freeze()))
+		if (unlikely(!khugepaged_enabled() || try_to_freeze()))
 			break;
 
 		spin_lock(&khugepaged_mm_lock);
@@ -2760,43 +2787,30 @@ static void khugepaged_do_scan(void)
 
 	if (!IS_ERR_OR_NULL(hpage))
 		put_page(hpage);
-}
 
-static void khugepaged_wait_work(void)
-{
 	if (khugepaged_has_work()) {
-		if (!khugepaged_scan_sleep_millisecs)
-			return;
 
-		wait_event_freezable_timeout(khugepaged_wait,
-					     kthread_should_stop(),
-			msecs_to_jiffies(khugepaged_scan_sleep_millisecs));
-		return;
-	}
+		unsigned long delay = 0;
 
-	if (khugepaged_enabled())
-		wait_event_freezable(khugepaged_wait, khugepaged_wait_event());
+		if (khugepaged_scan_sleep_millisecs)
+			delay = msecs_to_jiffies(khugepaged_scan_sleep_millisecs);
+
+		queue_delayed_kthread_work(khugepaged_worker,
+					   &khugepaged_do_scan_work,
+					   delay);
+	}
 }
 
-static int khugepaged(void *none)
+static void khugepaged_cleanup_func(struct kthread_work *dummy)
 {
 	struct mm_slot *mm_slot;
 
-	set_freezable();
-	set_user_nice(current, MAX_NICE);
-
-	while (!kthread_should_stop()) {
-		khugepaged_do_scan();
-		khugepaged_wait_work();
-	}
-
 	spin_lock(&khugepaged_mm_lock);
 	mm_slot = khugepaged_scan.mm_slot;
 	khugepaged_scan.mm_slot = NULL;
 	if (mm_slot)
 		collect_mm_slot(mm_slot);
 	spin_unlock(&khugepaged_mm_lock);
-	return 0;
 }
 
 static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
-- 
1.8.5.6

WARNING: multiple messages have this Message-ID (diff)
From: Petr Mladek <pmladek@suse.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>, Tejun Heo <tj@kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Jiri Kosina <jkosina@suse.cz>, Borislav Petkov <bp@suse.de>,
	Michal Hocko <mhocko@suse.cz>,
	linux-mm@kvack.org, Vlastimil Babka <vbabka@suse.cz>,
	linux-api@vger.kernel.org, linux-kernel@vger.kernel.org,
	Petr Mladek <pmladek@suse.com>
Subject: [PATCH v4 13/22] mm/huge_page: Convert khugepaged() into kthread worker API
Date: Mon, 25 Jan 2016 16:45:02 +0100	[thread overview]
Message-ID: <1453736711-6703-14-git-send-email-pmladek@suse.com> (raw)
In-Reply-To: <1453736711-6703-1-git-send-email-pmladek@suse.com>

Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing,
awakening. In many cases it is unclear to say in which state
it is and sometimes it is done a wrong way.

The plan is to convert kthreads into kthread_worker or workqueues
API. It allows to split the functionality into separate operations.
It helps to make a better structure. Also it defines a clean state
where no locks are taken, IRQs blocked, the kthread might sleep
or even be safely migrated.

The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that it is
available when needed. Also it allows a better control, e.g.
define a scheduling priority.

This patch converts khugepaged() in kthread worker API
because it modifies the scheduling.

It keeps the functionality except that we do not wakeup the worker
when it is already created and someone calls start() once again.

Note that kthread works get associated with a single kthread worker.
They must be initialized if we want to use them with another worker.
This is needed also when the worker is restarted.

set_freezable() is not needed because the kthread worker is
created as freezable.

set_user_nice() is called from start_stop_khugepaged(). It need
not be done from within the kthread.

The scan work must be queued only when the worker is available.
We have to use "khugepaged_mm_lock" to avoid a race between the check
and queuing. I admit that this was a bit easier before because wake_up()
was a nope when the kthread did not exist.

Also the scan work is queued only when the list of scanned pages is
not empty. It adds one check but it is cleaner.

They delay between scans is done using a delayed work.

Note that @khugepaged_wait waitqueue had two purposes. It was used
to wait between scans and when an allocation failed. It is still used
for the second purpose. Therefore it was renamed to better describe
the current use.

Also note that we could not longer check for kthread_should_stop()
in the works. The kthread used by the worker has to stay alive
until all queued works are finished. Instead, we use the existing
check khugepaged_enabled() that returns false when we are going down.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 mm/huge_memory.c | 138 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 76 insertions(+), 62 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fd3a07b3e6f4..828d741ed242 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -89,10 +89,16 @@ static unsigned int khugepaged_full_scans;
 static unsigned int khugepaged_scan_sleep_millisecs __read_mostly = 10000;
 /* during fragmentation poll the hugepage allocator once every minute */
 static unsigned int khugepaged_alloc_sleep_millisecs __read_mostly = 60000;
-static struct task_struct *khugepaged_thread __read_mostly;
+
+static void khugepaged_do_scan_func(struct kthread_work *dummy);
+static void khugepaged_cleanup_func(struct kthread_work *dummy);
+static struct kthread_worker *khugepaged_worker;
+static struct delayed_kthread_work khugepaged_do_scan_work;
+static struct kthread_work khugepaged_cleanup_work;
+
 static DEFINE_MUTEX(khugepaged_mutex);
 static DEFINE_SPINLOCK(khugepaged_mm_lock);
-static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
+static DECLARE_WAIT_QUEUE_HEAD(khugepaged_alloc_wait);
 /*
  * default collapse hugepages if there is at least one pte mapped like
  * it would have happened if the vma was large enough during page
@@ -100,7 +106,6 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
  */
 static unsigned int khugepaged_max_ptes_none __read_mostly = HPAGE_PMD_NR-1;
 
-static int khugepaged(void *none);
 static int khugepaged_slab_init(void);
 static void khugepaged_slab_exit(void);
 
@@ -180,29 +185,55 @@ static void set_recommended_min_free_kbytes(void)
 	setup_per_zone_wmarks();
 }
 
+static int khugepaged_has_work(void)
+{
+	return !list_empty(&khugepaged_scan.mm_head);
+}
+
 static int start_stop_khugepaged(void)
 {
+	struct kthread_worker *worker;
 	int err = 0;
+
 	if (khugepaged_enabled()) {
-		if (!khugepaged_thread)
-			khugepaged_thread = kthread_run(khugepaged, NULL,
-							"khugepaged");
-		if (IS_ERR(khugepaged_thread)) {
-			pr_err("khugepaged: kthread_run(khugepaged) failed\n");
-			err = PTR_ERR(khugepaged_thread);
-			khugepaged_thread = NULL;
-			goto fail;
+		if (khugepaged_worker)
+			goto out;
+
+		worker = create_kthread_worker(KTW_FREEZABLE, "khugepaged");
+		if (IS_ERR(worker)) {
+			pr_err("khugepaged: failed to create kthread worker\n");
+			goto out;
 		}
+		set_user_nice(worker->task, MAX_NICE);
 
-		if (!list_empty(&khugepaged_scan.mm_head))
-			wake_up_interruptible(&khugepaged_wait);
+		/* Always initialize the works when the worker is started. */
+		init_delayed_kthread_work(&khugepaged_do_scan_work,
+					  khugepaged_do_scan_func);
+		init_kthread_work(&khugepaged_cleanup_work,
+				  khugepaged_cleanup_func);
+
+		/* Make the worker public and check for work synchronously. */
+		spin_lock(&khugepaged_mm_lock);
+		khugepaged_worker = worker;
+		if (khugepaged_has_work())
+			queue_delayed_kthread_work(worker,
+						   &khugepaged_do_scan_work,
+						   0);
+		spin_unlock(&khugepaged_mm_lock);
 
 		set_recommended_min_free_kbytes();
-	} else if (khugepaged_thread) {
-		kthread_stop(khugepaged_thread);
-		khugepaged_thread = NULL;
+	} else if (khugepaged_worker) {
+		/* First, stop others from using the worker. */
+		spin_lock(&khugepaged_mm_lock);
+		worker = khugepaged_worker;
+		khugepaged_worker = NULL;
+		spin_unlock(&khugepaged_mm_lock);
+
+		cancel_delayed_kthread_work_sync(&khugepaged_do_scan_work);
+		queue_kthread_work(worker, &khugepaged_cleanup_work);
+		destroy_kthread_worker(worker);
 	}
-fail:
+out:
 	return err;
 }
 
@@ -461,7 +492,13 @@ static ssize_t scan_sleep_millisecs_store(struct kobject *kobj,
 		return -EINVAL;
 
 	khugepaged_scan_sleep_millisecs = msecs;
-	wake_up_interruptible(&khugepaged_wait);
+
+	spin_lock(&khugepaged_mm_lock);
+	if (khugepaged_worker && khugepaged_has_work())
+		mod_delayed_kthread_work(khugepaged_worker,
+					 &khugepaged_do_scan_work,
+					 0);
+	spin_unlock(&khugepaged_mm_lock);
 
 	return count;
 }
@@ -488,7 +525,7 @@ static ssize_t alloc_sleep_millisecs_store(struct kobject *kobj,
 		return -EINVAL;
 
 	khugepaged_alloc_sleep_millisecs = msecs;
-	wake_up_interruptible(&khugepaged_wait);
+	wake_up_interruptible(&khugepaged_alloc_wait);
 
 	return count;
 }
@@ -1878,7 +1915,7 @@ static inline int khugepaged_test_exit(struct mm_struct *mm)
 int __khugepaged_enter(struct mm_struct *mm)
 {
 	struct mm_slot *mm_slot;
-	int wakeup;
+	int has_work;
 
 	mm_slot = alloc_mm_slot();
 	if (!mm_slot)
@@ -1897,13 +1934,15 @@ int __khugepaged_enter(struct mm_struct *mm)
 	 * Insert just behind the scanning cursor, to let the area settle
 	 * down a little.
 	 */
-	wakeup = list_empty(&khugepaged_scan.mm_head);
+	has_work = khugepaged_has_work();
 	list_add_tail(&mm_slot->mm_node, &khugepaged_scan.mm_head);
-	spin_unlock(&khugepaged_mm_lock);
 
 	atomic_inc(&mm->mm_count);
-	if (wakeup)
-		wake_up_interruptible(&khugepaged_wait);
+	if (khugepaged_worker && has_work)
+		mod_delayed_kthread_work(khugepaged_worker,
+					 &khugepaged_do_scan_work,
+					 0);
+	spin_unlock(&khugepaged_mm_lock);
 
 	return 0;
 }
@@ -2142,10 +2181,10 @@ static void khugepaged_alloc_sleep(void)
 {
 	DEFINE_WAIT(wait);
 
-	add_wait_queue(&khugepaged_wait, &wait);
+	add_wait_queue(&khugepaged_alloc_wait, &wait);
 	freezable_schedule_timeout_interruptible(
 		msecs_to_jiffies(khugepaged_alloc_sleep_millisecs));
-	remove_wait_queue(&khugepaged_wait, &wait);
+	remove_wait_queue(&khugepaged_alloc_wait, &wait);
 }
 
 static int khugepaged_node_load[MAX_NUMNODES];
@@ -2716,19 +2755,7 @@ breakouterloop_mmap_sem:
 	return progress;
 }
 
-static int khugepaged_has_work(void)
-{
-	return !list_empty(&khugepaged_scan.mm_head) &&
-		khugepaged_enabled();
-}
-
-static int khugepaged_wait_event(void)
-{
-	return !list_empty(&khugepaged_scan.mm_head) ||
-		kthread_should_stop();
-}
-
-static void khugepaged_do_scan(void)
+static void khugepaged_do_scan_func(struct kthread_work *dummy)
 {
 	struct page *hpage = NULL;
 	unsigned int progress = 0, pass_through_head = 0;
@@ -2743,7 +2770,7 @@ static void khugepaged_do_scan(void)
 
 		cond_resched();
 
-		if (unlikely(kthread_should_stop() || try_to_freeze()))
+		if (unlikely(!khugepaged_enabled() || try_to_freeze()))
 			break;
 
 		spin_lock(&khugepaged_mm_lock);
@@ -2760,43 +2787,30 @@ static void khugepaged_do_scan(void)
 
 	if (!IS_ERR_OR_NULL(hpage))
 		put_page(hpage);
-}
 
-static void khugepaged_wait_work(void)
-{
 	if (khugepaged_has_work()) {
-		if (!khugepaged_scan_sleep_millisecs)
-			return;
 
-		wait_event_freezable_timeout(khugepaged_wait,
-					     kthread_should_stop(),
-			msecs_to_jiffies(khugepaged_scan_sleep_millisecs));
-		return;
-	}
+		unsigned long delay = 0;
 
-	if (khugepaged_enabled())
-		wait_event_freezable(khugepaged_wait, khugepaged_wait_event());
+		if (khugepaged_scan_sleep_millisecs)
+			delay = msecs_to_jiffies(khugepaged_scan_sleep_millisecs);
+
+		queue_delayed_kthread_work(khugepaged_worker,
+					   &khugepaged_do_scan_work,
+					   delay);
+	}
 }
 
-static int khugepaged(void *none)
+static void khugepaged_cleanup_func(struct kthread_work *dummy)
 {
 	struct mm_slot *mm_slot;
 
-	set_freezable();
-	set_user_nice(current, MAX_NICE);
-
-	while (!kthread_should_stop()) {
-		khugepaged_do_scan();
-		khugepaged_wait_work();
-	}
-
 	spin_lock(&khugepaged_mm_lock);
 	mm_slot = khugepaged_scan.mm_slot;
 	khugepaged_scan.mm_slot = NULL;
 	if (mm_slot)
 		collect_mm_slot(mm_slot);
 	spin_unlock(&khugepaged_mm_lock);
-	return 0;
 }
 
 static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
-- 
1.8.5.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2016-01-25 15:48 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-25 15:44 [PATCH v4 00/22] kthread: Use kthread worker API more widely Petr Mladek
2016-01-25 15:44 ` Petr Mladek
2016-01-25 15:44 ` Petr Mladek
2016-01-25 15:44 ` [PATCH v4 01/22] timer: Allow to check when the timer callback has not finished yet Petr Mladek
2016-01-25 15:44   ` Petr Mladek
2016-01-25 18:44   ` Tejun Heo
2016-01-25 18:44     ` Tejun Heo
2016-01-25 15:44 ` [PATCH v4 02/22] kthread/smpboot: Do not park in kthread_create_on_cpu() Petr Mladek
2016-01-25 15:44   ` Petr Mladek
2016-01-25 15:44 ` [PATCH v4 03/22] kthread: Allow to call __kthread_create_on_node() with va_list args Petr Mladek
2016-01-25 15:44   ` Petr Mladek
2016-01-25 15:44 ` [PATCH v4 04/22] kthread: Add create_kthread_worker*() Petr Mladek
2016-01-25 15:44   ` Petr Mladek
2016-01-25 18:53   ` Tejun Heo
2016-01-25 18:53     ` Tejun Heo
2016-02-16 15:44     ` Petr Mladek
2016-02-16 15:44       ` Petr Mladek
2016-02-16 16:08       ` Tejun Heo
2016-02-16 16:08         ` Tejun Heo
2016-02-16 16:08         ` Tejun Heo
2016-02-16 16:10       ` Petr Mladek
2016-02-16 16:10         ` Petr Mladek
2016-02-16 16:10         ` Petr Mladek
2016-01-25 15:44 ` [PATCH v4 05/22] kthread: Add drain_kthread_worker() Petr Mladek
2016-01-25 15:44   ` Petr Mladek
2016-01-25 15:44 ` [PATCH v4 06/22] kthread: Add destroy_kthread_worker() Petr Mladek
2016-01-25 15:44   ` Petr Mladek
2016-01-25 15:44 ` [PATCH v4 07/22] kthread: Detect when a kthread work is used by more workers Petr Mladek
2016-01-25 15:44   ` Petr Mladek
2016-01-25 18:57   ` Tejun Heo
2016-01-25 18:57     ` Tejun Heo
2016-01-25 18:57     ` Tejun Heo
2016-02-16 16:38     ` Petr Mladek
2016-02-16 16:38       ` Petr Mladek
2016-02-16 16:38       ` Petr Mladek
2016-01-25 15:44 ` [PATCH v4 08/22] kthread: Initial support for delayed kthread work Petr Mladek
2016-01-25 15:44   ` Petr Mladek
2016-01-25 19:04   ` Tejun Heo
2016-01-25 19:04     ` Tejun Heo
2016-01-25 19:04     ` Tejun Heo
2016-01-25 15:44 ` [PATCH v4 09/22] kthread: Allow to cancel " Petr Mladek
2016-01-25 15:44   ` Petr Mladek
2016-01-25 19:17   ` Tejun Heo
2016-01-25 19:17     ` Tejun Heo
2016-02-19 16:22     ` Petr Mladek
2016-02-19 16:22       ` Petr Mladek
2016-01-25 15:44 ` [PATCH v4 10/22] kthread: Allow to modify delayed " Petr Mladek
2016-01-25 15:44   ` Petr Mladek
2016-01-25 19:19   ` Tejun Heo
2016-01-25 19:19     ` Tejun Heo
2016-01-25 15:45 ` [PATCH v4 11/22] kthread: Better support freezable kthread workers Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 19:21   ` Tejun Heo
2016-01-25 19:21     ` Tejun Heo
2016-01-25 15:45 ` [PATCH v4 12/22] kthread: Use try_lock_kthread_work() in flush_kthread_work() Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45 ` Petr Mladek [this message]
2016-01-25 15:45   ` [PATCH v4 13/22] mm/huge_page: Convert khugepaged() into kthread worker API Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45 ` [PATCH v4 14/22] ring_buffer: Convert benchmark kthreads " Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45 ` [PATCH v4 15/22] hung_task: Convert hungtaskd " Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45 ` [PATCH v4 16/22] kmemleak: Convert kmemleak kthread " Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45 ` [PATCH v4 17/22] ipmi: Convert kipmi " Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45 ` [PATCH v4 18/22] IB/fmr_pool: Convert the cleanup thread " Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45 ` [PATCH v4 19/22] memstick/r592: Better synchronize debug messages in r592_io kthread Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45 ` [PATCH v4 20/22] memstick/r592: convert r592_io kthread into kthread worker API Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45 ` [PATCH v4 21/22] thermal/intel_powerclamp: Remove duplicated code that starts the kthread Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 16:23   ` Jacob Pan
2016-01-25 16:23     ` Jacob Pan
2016-01-25 15:45 ` [PATCH v4 22/22] thermal/intel_powerclamp: Convert the kthread to kthread worker API Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 16:28   ` Jacob Pan
2016-01-25 16:28     ` Jacob Pan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1453736711-6703-14-git-send-email-pmladek@suse.com \
    --to=pmladek@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@suse.de \
    --cc=jkosina@suse.cz \
    --cc=josh@joshtriplett.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=mingo@redhat.com \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.