* [RFC PATCH 00/14] kthread: Use kthread worker API more widely
@ 2015-07-28 14:39 ` Petr Mladek
From: Petr Mladek @ 2015-07-28 14:39 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel, Petr Mladek

Kthreads are currently implemented as infinite loops. Each has its own
variant of the checks for termination, freezing, and wakeup. It is
sometimes hard to say whether these checks are done correctly, and the
kthreads are harder to maintain when a generic problem is found in
this area.
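
For illustration, a typical main loop looks roughly like the sketch
below. It is simplified, not taken from any particular kthread, and
do_one_unit_of_work() is a made-up helper; real kthreads add their own
twists (and sometimes subtle races) on top of this pattern:

static int my_kthread_fn(void *data)
{
        set_freezable();

        while (!kthread_should_stop()) {
                try_to_freeze();

                do_one_unit_of_work(data);      /* made-up helper */

                /* sleep until someone wakes us or asks us to stop */
                set_current_state(TASK_INTERRUPTIBLE);
                if (!kthread_should_stop())
                        schedule();
                __set_current_state(TASK_RUNNING);
        }

        return 0;
}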

I have proposed a so-called kthread iterant API to improve the situation,
see https://lkml.org/lkml/2015/6/5/555. The RFC opened and/or answered
several questions.

This RFC is a reaction to Tejun's suggestion to use the existing kthread
worker API instead of adding a new one, see https://lkml.org/lkml/2015/6/9/77.
I wanted to give it a try.


Structure of this patch set:
----------------------------

1st..6th patches: improve the existing kthread worker API

7th, 8th, 11th patches: convert three kthreads into the new API,
     namely: the RCU gp kthreads, khugepaged, and my favorite ring
     buffer benchmark

12th..14th patches: show how we could further improve the API

9th, 10th patches:  clean up the ring buffer benchmark further; they
     make the conversion into the new API easier but might be applied
     independently


TODO:
-----

If people like the kthread worker API, it will need more love.
The following ideas come to mind:

  + allow passing void *data via struct kthread_work; see the sketch
    below for how per-work data has to be passed today
  + hide struct kthread_worker in kthread.c and make the API safer
  + allow canceling work
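
For reference, per-work data is currently passed by embedding
struct kthread_work into a bigger structure and using container_of()
in the work function, roughly like this illustrative sketch (my_job
and my_work_fn are made up):

struct my_job {
        struct kthread_work work;
        void *data;
};

static void my_work_fn(struct kthread_work *work)
{
        struct my_job *job = container_of(work, struct my_job, work);

        pr_info("processing %p\n", job->data);
}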


I have tested these patches against today's Linux tree, aka 4.2.0-rc4+.

Petr Mladek (14):
  kthread: Allow to call __kthread_create_on_node() with va_list args
  kthread: Add create_kthread_worker*()
  kthread: Add drain_kthread_worker()
  kthread: Add destroy_kthread_worker()
  kthread: Add wakeup_and_destroy_kthread_worker()
  kthread: Add kthread_worker_created()
  mm/huge_page: Convert khugepaged() into kthread worker API
  rcu: Convert RCU gp kthreads into kthread worker API
  ring_buffer: Initialize completions statically in the benchmark
  ring_buffer: Fix more races when terminating the producer in the
    benchmark
  ring_buffer: Use kthread worker API for the producer kthread in the
    benchmark
  kthread_worker: Better support freezable kthread workers
  kthread_worker: Add set_kthread_worker_user_nice()
  kthread_worker: Add set_kthread_worker_scheduler*()

 include/linux/kthread.h              |  29 +++
 kernel/kthread.c                     | 359 +++++++++++++++++++++++++++++++----
 kernel/rcu/tree.c                    | 182 +++++++++---------
 kernel/rcu/tree.h                    |   4 +-
 kernel/trace/ring_buffer_benchmark.c | 150 ++++++++-------
 mm/huge_memory.c                     |  83 ++++----
 6 files changed, 584 insertions(+), 223 deletions(-)

-- 
1.8.5.6



* [RFC PATCH 01/14] kthread: Allow to call __kthread_create_on_node() with va_list args
  2015-07-28 14:39 ` Petr Mladek
@ 2015-07-28 14:39   ` Petr Mladek
From: Petr Mladek @ 2015-07-28 14:39 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel, Petr Mladek

kthread_create_on_node() implements a bunch of logic to create
the kthread. It is already called by kthread_create_on_cpu().

We are going to add a new API that will make it possible to standardize
kthreads and define safe points for termination, freezing, parking, and
even signal handling. It will need to call kthread_create_on_node()
with va_list args.

This patch is only a refactoring and does not modify the existing
behavior.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/kthread.c | 71 +++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 42 insertions(+), 29 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 10e489c448fe..fca7cd124512 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -242,32 +242,10 @@ static void create_kthread(struct kthread_create_info *create)
 	}
 }
 
-/**
- * kthread_create_on_node - create a kthread.
- * @threadfn: the function to run until signal_pending(current).
- * @data: data ptr for @threadfn.
- * @node: memory node number.
- * @namefmt: printf-style name for the thread.
- *
- * Description: This helper function creates and names a kernel
- * thread.  The thread will be stopped: use wake_up_process() to start
- * it.  See also kthread_run().
- *
- * If thread is going to be bound on a particular cpu, give its node
- * in @node, to get NUMA affinity for kthread stack, or else give -1.
- * When woken, the thread will run @threadfn() with @data as its
- * argument. @threadfn() can either call do_exit() directly if it is a
- * standalone thread for which no one will call kthread_stop(), or
- * return when 'kthread_should_stop()' is true (which means
- * kthread_stop() has been called).  The return value should be zero
- * or a negative error number; it will be passed to kthread_stop().
- *
- * Returns a task_struct or ERR_PTR(-ENOMEM) or ERR_PTR(-EINTR).
- */
-struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
-					   void *data, int node,
-					   const char namefmt[],
-					   ...)
+static struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
+						    void *data, int node,
+						    const char namefmt[],
+						    va_list args)
 {
 	DECLARE_COMPLETION_ONSTACK(done);
 	struct task_struct *task;
@@ -308,11 +286,8 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
 	task = create->result;
 	if (!IS_ERR(task)) {
 		static const struct sched_param param = { .sched_priority = 0 };
-		va_list args;
 
-		va_start(args, namefmt);
 		vsnprintf(task->comm, sizeof(task->comm), namefmt, args);
-		va_end(args);
 		/*
 		 * root may have changed our (kthreadd's) priority or CPU mask.
 		 * The kernel thread should not inherit these properties.
@@ -323,6 +298,44 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
 	kfree(create);
 	return task;
 }
+
+
+/**
+ * kthread_create_on_node - create a kthread.
+ * @threadfn: the function to run until signal_pending(current).
+ * @data: data ptr for @threadfn.
+ * @node: memory node number.
+ * @namefmt: printf-style name for the thread.
+ *
+ * Description: This helper function creates and names a kernel
+ * thread.  The thread will be stopped: use wake_up_process() to start
+ * it.  See also kthread_run().
+ *
+ * If thread is going to be bound on a particular cpu, give its node
+ * in @node, to get NUMA affinity for kthread stack, or else give -1.
+ * When woken, the thread will run @threadfn() with @data as its
+ * argument. @threadfn() can either call do_exit() directly if it is a
+ * standalone thread for which no one will call kthread_stop(), or
+ * return when 'kthread_should_stop()' is true (which means
+ * kthread_stop() has been called).  The return value should be zero
+ * or a negative error number; it will be passed to kthread_stop().
+ *
+ * Returns a task_struct or ERR_PTR(-ENOMEM) or ERR_PTR(-EINTR).
+ */
+struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
+					   void *data, int node,
+					   const char namefmt[],
+					   ...)
+{
+	struct task_struct *task;
+	va_list args;
+
+	va_start(args, namefmt);
+	task = __kthread_create_on_node(threadfn, data, node, namefmt, args);
+	va_end(args);
+
+	return task;
+}
 EXPORT_SYMBOL(kthread_create_on_node);
 
 static void __kthread_bind(struct task_struct *p, unsigned int cpu, long state)
-- 
1.8.5.6



* [RFC PATCH 02/14] kthread: Add create_kthread_worker*()
  2015-07-28 14:39 ` Petr Mladek
@ 2015-07-28 14:39   ` Petr Mladek
From: Petr Mladek @ 2015-07-28 14:39 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel, Petr Mladek

Kthread workers are currently created using the classic kthread API,
namely kthread_run(). kthread_worker_fn() is passed as the @threadfn
parameter.

This patch defines create_kthread_worker_on_node() and
create_kthread_worker() functions that hide implementation details.

It enforces using kthread_worker_fn() for the main thread. I doubt
that there are any plans to create an alternative anyway. In fact,
I think that we do not want any alternative main thread function
because it would be hard to keep it consistent with the rest of the
kthread worker API.

The naming is inspired by the workqueues API, like the rest of the
kthread worker API.

This patch does _not_ convert existing kthread workers. The kthread worker
API needs more improvements first, e.g. a function to destroy the worker.
We should not need to access @worker->task and other struct kthread_worker
members directly.
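
For illustration, a converted user could look roughly like this minimal
sketch (made-up names, error handling trimmed; it relies only on the API
added here plus the existing DEFINE_KTHREAD_WORKER()/DEFINE_KTHREAD_WORK()
helpers):

static void my_work_fn(struct kthread_work *work)
{
        pr_info("doing one unit of work\n");
}

static DEFINE_KTHREAD_WORKER(my_worker);
static DEFINE_KTHREAD_WORK(my_work, my_work_fn);

static int __init my_init(void)
{
        int err;

        /* create and wake up the dedicated kthread */
        err = create_kthread_worker(&my_worker, "my_worker");
        if (err)
                return err;

        /* ask the worker to process one work item */
        queue_kthread_work(&my_worker, &my_work);
        return 0;
}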

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/kthread.h |  8 ++++++++
 kernel/kthread.c        | 43 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 13d55206ccf6..fc8a7d253c40 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -123,6 +123,14 @@ extern void __init_kthread_worker(struct kthread_worker *worker,
 
 int kthread_worker_fn(void *worker_ptr);
 
+__printf(3, 4)
+int create_kthread_worker_on_node(struct kthread_worker *worker,
+				  int node,
+				  const char namefmt[], ...);
+
+#define create_kthread_worker(worker, namefmt, arg...)			\
+	create_kthread_worker_on_node(worker, -1, namefmt, ##arg)
+
 bool queue_kthread_work(struct kthread_worker *worker,
 			struct kthread_work *work);
 void flush_kthread_work(struct kthread_work *work);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index fca7cd124512..fe9421728f76 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -561,7 +561,11 @@ int kthread_worker_fn(void *worker_ptr)
 	struct kthread_worker *worker = worker_ptr;
 	struct kthread_work *work;
 
-	WARN_ON(worker->task);
+	/*
+	 * FIXME: Update the check and remove the assignment when all kthread
+	 * worker users are created using create_kthread_worker*() functions.
+	 */
+	WARN_ON(worker->task && worker->task != current);
 	worker->task = current;
 repeat:
 	set_current_state(TASK_INTERRUPTIBLE);	/* mb paired w/ kthread_stop */
@@ -595,6 +599,43 @@ repeat:
 }
 EXPORT_SYMBOL_GPL(kthread_worker_fn);
 
+/**
+ * create_kthread_worker_on_node - create a kthread worker.
+ * @worker: initialized kthread worker struct.
+ * @node: memory node number.
+ * @namefmt: printf-style name for the kthread worker (task).
+ *
+ * If the worker is going to be bound on a particular CPU, give its node
+ * in @node, to get NUMA affinity for kthread stack, or else give -1.
+ */
+int create_kthread_worker_on_node(struct kthread_worker *worker,
+				  int node,
+				  const char namefmt[], ...)
+{
+	struct task_struct *task;
+	va_list args;
+
+	if (worker->task)
+		return -EINVAL;
+
+	va_start(args, namefmt);
+	task = __kthread_create_on_node(kthread_worker_fn, worker, node,
+					namefmt, args);
+	va_end(args);
+
+	if (IS_ERR(task))
+		return PTR_ERR(task);
+
+	spin_lock_irq(&worker->lock);
+	worker->task = task;
+	spin_unlock_irq(&worker->lock);
+
+	wake_up_process(task);
+
+	return 0;
+}
+EXPORT_SYMBOL(create_kthread_worker_on_node);
+
 /* insert @work before @pos in @worker */
 static void insert_kthread_work(struct kthread_worker *worker,
 			       struct kthread_work *work,
-- 
1.8.5.6



* [RFC PATCH 03/14] kthread: Add drain_kthread_worker()
  2015-07-28 14:39 ` Petr Mladek
@ 2015-07-28 14:39   ` Petr Mladek
From: Petr Mladek @ 2015-07-28 14:39 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel, Petr Mladek

flush_kthread_worker() returns when the currently queued works are
processed. But some other works might have been queued in the meantime.

This patch adds drain_kthread_worker(), which is inspired by
drain_workqueue(). It returns when the queue is completely empty. It also
affects the behavior of queue_kthread_work(): only a currently running
work item is allowed to queue another work while the draining is in
progress. A warning is printed when some work is queued from another
context or when the draining takes too long.

Note that drain() will typically be called when the queue should stay
empty, e.g. when the worker is going to be destroyed. In this case,
the caller should block all users from producing more work. This is
why the warning is printed. But some more works might be needed to
process the already existing ones. This is why re-queuing from the
worker itself is allowed.

Callers also have to prevent the existing works from re-queuing
themselves infinitely.
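
For illustration, a chained work item of the allowed kind might look
roughly like this sketch (all my_* names are made up):

static DEFINE_KTHREAD_WORKER(my_worker);

static void my_second_fn(struct kthread_work *work)
{
        /* finish what the first work item started */
}
static DEFINE_KTHREAD_WORK(my_second_work, my_second_fn);

static void my_first_fn(struct kthread_work *work)
{
        /* ... do the first part of the job ... */

        /* queuing from the same worker is allowed even while draining */
        queue_kthread_work(&my_worker, &my_second_work);
}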

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/kthread.h |   1 +
 kernel/kthread.c        | 121 ++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 117 insertions(+), 5 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index fc8a7d253c40..974d70193907 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -68,6 +68,7 @@ struct kthread_worker {
 	struct list_head	work_list;
 	struct task_struct	*task;
 	struct kthread_work	*current_work;
+	int			nr_drainers;
 };
 
 struct kthread_work {
diff --git a/kernel/kthread.c b/kernel/kthread.c
index fe9421728f76..872f17e383c4 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -51,6 +51,7 @@ enum KTHREAD_BITS {
 	KTHREAD_SHOULD_STOP,
 	KTHREAD_SHOULD_PARK,
 	KTHREAD_IS_PARKED,
+	KTHREAD_IS_WORKER,
 };
 
 #define __to_kthread(vfork)	\
@@ -538,6 +539,7 @@ void __init_kthread_worker(struct kthread_worker *worker,
 	lockdep_set_class_and_name(&worker->lock, key, name);
 	INIT_LIST_HEAD(&worker->work_list);
 	worker->task = NULL;
+	worker->nr_drainers = 0;
 }
 EXPORT_SYMBOL_GPL(__init_kthread_worker);
 
@@ -613,6 +615,7 @@ int create_kthread_worker_on_node(struct kthread_worker *worker,
 				  const char namefmt[], ...)
 {
 	struct task_struct *task;
+	struct kthread *kthread;
 	va_list args;
 
 	if (worker->task)
@@ -626,6 +629,9 @@ int create_kthread_worker_on_node(struct kthread_worker *worker,
 	if (IS_ERR(task))
 		return PTR_ERR(task);
 
+	kthread = to_kthread(task);
+	set_bit(KTHREAD_IS_WORKER, &kthread->flags);
+
 	spin_lock_irq(&worker->lock);
 	worker->task = task;
 	spin_unlock_irq(&worker->lock);
@@ -649,6 +655,56 @@ static void insert_kthread_work(struct kthread_worker *worker,
 		wake_up_process(worker->task);
 }
 
+/*
+ * Queue @work without the check for drainers.
+ * Must be called under @worker->lock.
+ */
+static bool __queue_kthread_work(struct kthread_worker *worker,
+			  struct kthread_work *work)
+{
+	lockdep_assert_held(&worker->lock);
+
+	if (list_empty(&work->node)) {
+		insert_kthread_work(worker, work, &worker->work_list);
+		return true;
+	}
+
+	return false;
+}
+
+/* return struct kthread_worker if %current is a kthread worker */
+static struct kthread_worker *current_kthread_worker(void)
+{
+	struct kthread *k;
+
+	if (!(current->flags & PF_KTHREAD))
+		goto fail;
+
+	k = to_kthread(current);
+	if (test_bit(KTHREAD_IS_WORKER, &k->flags))
+		return k->data;
+
+fail:
+	return NULL;
+}
+
+
+/*
+ * Test whether @work is being queued from another work
+ * executing on the same kthread.
+ */
+static bool is_chained_work(struct kthread_worker *worker)
+{
+	struct kthread_worker *current_worker;
+
+	current_worker = current_kthread_worker();
+	/*
+	 * Return %true if I'm a kthread worker executing a work item on
+	 * the given @worker.
+	 */
+	return current_worker && current_worker == worker;
+}
+
 /**
  * queue_kthread_work - queue a kthread_work
  * @worker: target kthread_worker
@@ -665,10 +721,14 @@ bool queue_kthread_work(struct kthread_worker *worker,
 	unsigned long flags;
 
 	spin_lock_irqsave(&worker->lock, flags);
-	if (list_empty(&work->node)) {
-		insert_kthread_work(worker, work, &worker->work_list);
-		ret = true;
-	}
+
+	/* if draining, only works from the same kthread worker are allowed */
+	if (unlikely(worker->nr_drainers) &&
+	    WARN_ON_ONCE(!is_chained_work(worker)))
+		goto fail;
+
+	ret = __queue_kthread_work(worker, work);
+fail:
 	spin_unlock_irqrestore(&worker->lock, flags);
 	return ret;
 }
@@ -740,7 +800,58 @@ void flush_kthread_worker(struct kthread_worker *worker)
 		COMPLETION_INITIALIZER_ONSTACK(fwork.done),
 	};
 
-	queue_kthread_work(worker, &fwork.work);
+	/* flush() is and can be used when draining */
+	spin_lock_irq(&worker->lock);
+	__queue_kthread_work(worker, &fwork.work);
+	spin_unlock_irq(&worker->lock);
+
 	wait_for_completion(&fwork.done);
 }
 EXPORT_SYMBOL_GPL(flush_kthread_worker);
+
+/**
+ * drain_kthread_worker - drain a kthread worker
+ * @worker: worker to be drained
+ *
+ * Wait until there is no work queued for the given kthread worker.
+ * Only currently running work on @worker can queue further work items
+ * on it.  @worker is flushed repeatedly until it becomes empty.
+ * The number of flushes is determined by the depth of chaining
+ * and should be relatively short.  Whine if it takes too long.
+ *
+ * The caller is responsible for blocking all existing works
+ * from an infinite re-queuing!
+ *
+ * Also the caller is responsible for blocking all the kthread
+ * worker users from queuing any new work. It is especially
+ * important if the queue has to stay empty once this function
+ * finishes.
+ */
+void drain_kthread_worker(struct kthread_worker *worker)
+{
+	int flush_cnt = 0;
+
+	spin_lock_irq(&worker->lock);
+	worker->nr_drainers++;
+
+	while (!list_empty(&worker->work_list)) {
+		/*
+		 * Unlock, so we could move forward. Note that queuing
+		 * is limited by @nr_drainers > 0.
+		 */
+		spin_unlock_irq(&worker->lock);
+
+		flush_kthread_worker(worker);
+
+		if (++flush_cnt == 10 ||
+		    (flush_cnt % 100 == 0 && flush_cnt <= 1000))
+			pr_warn("kthread worker %s: drain_kthread_worker() isn't complete after %u tries\n",
+				worker->task->comm, flush_cnt);
+
+		spin_lock_irq(&worker->lock);
+	}
+
+	worker->nr_drainers--;
+	spin_unlock_irq(&worker->lock);
+}
+EXPORT_SYMBOL(drain_kthread_worker);
-- 
1.8.5.6



* [RFC PATCH 04/14] kthread: Add destroy_kthread_worker()
  2015-07-28 14:39 ` Petr Mladek
@ 2015-07-28 14:39   ` Petr Mladek
From: Petr Mladek @ 2015-07-28 14:39 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel, Petr Mladek

The current kthread worker users call flush() and stop() explicitly.
The new function will make this easier and will do it better.

Note that flush() does not guarantee that the queue is empty. drain()
is safer: it returns only when the queue is empty. It also causes
queue() to ignore unexpected works and warn about them.
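
For illustration, the intended teardown sequence is roughly the sketch
below; stop_my_producers() is a made-up helper that prevents any further
queue_kthread_work() calls, and my_worker is a previously created worker:

static void my_exit(void)
{
        /* no new work may be queued from now on */
        stop_my_producers();

        /* drains the queue, then stops the kthread */
        destroy_kthread_worker(&my_worker);
}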

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/kthread.h |  2 ++
 kernel/kthread.c        | 20 ++++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 974d70193907..a0b811c95c75 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -137,4 +137,6 @@ bool queue_kthread_work(struct kthread_worker *worker,
 void flush_kthread_work(struct kthread_work *work);
 void flush_kthread_worker(struct kthread_worker *worker);
 
+void destroy_kthread_worker(struct kthread_worker *worker);
+
 #endif /* _LINUX_KTHREAD_H */
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 872f17e383c4..4f6b20710eb3 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -855,3 +855,23 @@ void drain_kthread_worker(struct kthread_worker *worker)
 	spin_unlock_irq(&worker->lock);
 }
 EXPORT_SYMBOL(drain_kthread_worker);
+
+/**
+ * destroy_kthread_worker - destroy a kthread worker
+ * @worker: worker to be destroyed
+ *
+ * Destroy @worker. It should be idle when this is called.
+ */
+void destroy_kthread_worker(struct kthread_worker *worker)
+{
+	struct task_struct *task;
+
+	task = worker->task;
+	if (WARN_ON(!task))
+		return;
+
+	drain_kthread_worker(worker);
+
+	WARN_ON(kthread_stop(task));
+}
+EXPORT_SYMBOL(destroy_kthread_worker);
-- 
1.8.5.6



* [RFC PATCH 05/14] kthread: Add wakeup_and_destroy_kthread_worker()
  2015-07-28 14:39 ` Petr Mladek
@ 2015-07-28 14:39   ` Petr Mladek
From: Petr Mladek @ 2015-07-28 14:39 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel, Petr Mladek

Most kthreads spend a lot of time sleeping. They do some job either
at regular intervals or when there is an event. Many of them combine
the two approaches.

The job is either a "single" operation, e.g. scanning and collapsing
pages into a huge page, or the kthread serves several requests, e.g.
handling several NFS callbacks.

In any case, the single thread can process only one request at a time
while more requests might be pending. Some kthreads use a more complex
algorithm to prioritize the pending work, e.g. the red-black tree
used by dmcrypt_write().

The point is that only some kthreads can be converted the "ideal" way,
where a work item is queued exactly when it is needed. Instead, many
kthreads will use self-queuing works that monitor the state and wait
for the job inside the work. It means that we will need to wake up
the currently processed work when the worker is going to be destroyed.
This is where this function will be useful.
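
For illustration, a self-queuing work of this kind might look roughly
like the sketch below; the my_* names are made up, and my_enabled()
is supposed to start returning false before the worker is destroyed:

static void my_scan_fn(struct kthread_work *work);
static DEFINE_KTHREAD_WORKER(my_worker);
static DEFINE_KTHREAD_WORK(my_scan_work, my_scan_fn);
static DECLARE_WAIT_QUEUE_HEAD(my_wait_queue);

static void my_scan_fn(struct kthread_work *work)
{
        /* wait for the next job inside the work */
        wait_event_freezable(my_wait_queue,
                             my_have_work() || !my_enabled());

        if (!my_enabled())
                return;         /* going down; stop re-queuing */

        my_scan_one();

        /* self-queuing: handle the next job later */
        queue_kthread_work(&my_worker, &my_scan_work);
}

The caller is supposed to make my_enabled() return false first;
wakeup_and_destroy_kthread_worker() then wakes the work sleeping in the
wait, so it can notice the change and stop re-queuing before the worker
is drained and stopped.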

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/kthread.h |  1 +
 kernel/kthread.c        | 25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index a0b811c95c75..24d72bac27db 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -138,5 +138,6 @@ void flush_kthread_work(struct kthread_work *work);
 void flush_kthread_worker(struct kthread_worker *worker);
 
 void destroy_kthread_worker(struct kthread_worker *worker);
+void wakeup_and_destroy_kthread_worker(struct kthread_worker *worker);
 
 #endif /* _LINUX_KTHREAD_H */
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 4f6b20710eb3..053c9dfa58ac 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -875,3 +875,28 @@ void destroy_kthread_worker(struct kthread_worker *worker)
 	WARN_ON(kthread_stop(task));
 }
 EXPORT_SYMBOL(destroy_kthread_worker);
+
+/**
+ * wakeup_and_destroy_kthread_worker - wake up and destroy a kthread worker
+ * @worker: worker to be destroyed
+ *
+ * Wakeup potentially sleeping work and destroy the @worker. All users should
+ * be aware that they should not produce more work anymore. It is especially
+ * useful for self-queuing works that are waiting for some job inside the work.
+ * They are supposed to wake up, check the situation, and stop re-queuing.
+ */
+void wakeup_and_destroy_kthread_worker(struct kthread_worker *worker)
+{
+	struct task_struct *task = worker->task;
+
+	if (WARN_ON(!task))
+		return;
+
+	spin_lock_irq(&worker->lock);
+	if (worker->current_work)
+		wake_up_process(worker->task);
+	spin_unlock_irq(&worker->lock);
+
+	destroy_kthread_worker(worker);
+}
+EXPORT_SYMBOL(wakeup_and_destroy_kthread_worker);
-- 
1.8.5.6



* [RFC PATCH 06/14] kthread: Add kthread_worker_created()
  2015-07-28 14:39 ` Petr Mladek
@ 2015-07-28 14:39   ` Petr Mladek
From: Petr Mladek @ 2015-07-28 14:39 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel, Petr Mladek

I would like to make the kthread worker API cleaner and hide the
definition of struct kthread_worker. It would prevent custom hacks
and make the API more robust.

This patch provides a helper to check whether the worker has been
created and hides the implementation details.
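
For illustration, a user that creates its worker lazily could then do
something like this sketch (made-up names):

static int my_start(void)
{
        int err = 0;

        /* create the worker only once, on first use */
        if (!kthread_worker_created(&my_worker))
                err = create_kthread_worker(&my_worker, "my_worker");

        return err;
}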

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/kthread.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 24d72bac27db..02d3cc9ad923 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -122,6 +122,11 @@ extern void __init_kthread_worker(struct kthread_worker *worker,
 		(work)->func = (fn);					\
 	} while (0)
 
+static inline bool kthread_worker_created(struct kthread_worker *worker)
+{
+	return (worker && worker->task);
+}
+
 int kthread_worker_fn(void *worker_ptr);
 
 __printf(3, 4)
-- 
1.8.5.6



* [RFC PATCH 07/14] mm/huge_page: Convert khugepaged() into kthread worker API
  2015-07-28 14:39 ` Petr Mladek
@ 2015-07-28 14:39   ` Petr Mladek
  -1 siblings, 0 replies; 86+ messages in thread
From: Petr Mladek @ 2015-07-28 14:39 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel, Petr Mladek

Kthreads are currently implemented as an infinite loop. Each has its
own variant of checks for terminating, freezing, and waking up. In many
cases it is unclear what state the kthread is in, and sometimes the
checks are done the wrong way.

The plan is to convert kthreads to the kthread worker or workqueue API.
It allows the functionality to be split into separate operations and
gives the code a better structure. It also defines a clean state in
which no locks are held, no IRQs are blocked, and the kthread may sleep
or even be safely migrated.

The kthread worker API is useful when we want a dedicated single thread
for the work. It makes sure that the thread is available when needed
and allows better control, e.g. setting a scheduling priority.

This patch converts khugepaged() to the kthread worker API because it
modifies its scheduling settings.

It keeps the existing functionality, except that we no longer wake up
the worker when it has already been created and start() is called
again.

Note that we can no longer check kthread_should_stop() in the work
functions. The kthread used by the worker has to stay alive until all
queued works are finished. Instead, we use the existing check
khugepaged_enabled(), which returns false when we are going down.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 mm/huge_memory.c | 91 +++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 57 insertions(+), 34 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c107094f79ba..55733735a487 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -54,7 +54,17 @@ static unsigned int khugepaged_full_scans;
 static unsigned int khugepaged_scan_sleep_millisecs __read_mostly = 10000;
 /* during fragmentation poll the hugepage allocator once every minute */
 static unsigned int khugepaged_alloc_sleep_millisecs __read_mostly = 60000;
-static struct task_struct *khugepaged_thread __read_mostly;
+
+static void khugepaged_init_func(struct kthread_work *dummy);
+static void khugepaged_do_scan_func(struct kthread_work *dummy);
+static void khugepaged_wait_func(struct kthread_work *dummy);
+static void khugepaged_cleanup_func(struct kthread_work *dummy);
+static DEFINE_KTHREAD_WORKER(khugepaged_worker);
+static DEFINE_KTHREAD_WORK(khugepaged_init_work, khugepaged_init_func);
+static DEFINE_KTHREAD_WORK(khugepaged_do_scan_work, khugepaged_do_scan_func);
+static DEFINE_KTHREAD_WORK(khugepaged_wait_work, khugepaged_wait_func);
+static DEFINE_KTHREAD_WORK(khugepaged_cleanup_work, khugepaged_cleanup_func);
+
 static DEFINE_MUTEX(khugepaged_mutex);
 static DEFINE_SPINLOCK(khugepaged_mm_lock);
 static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
@@ -65,7 +75,6 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
  */
 static unsigned int khugepaged_max_ptes_none __read_mostly = HPAGE_PMD_NR-1;
 
-static int khugepaged(void *none);
 static int khugepaged_slab_init(void);
 static void khugepaged_slab_exit(void);
 
@@ -146,25 +155,34 @@ static int start_stop_khugepaged(void)
 {
 	int err = 0;
 	if (khugepaged_enabled()) {
-		if (!khugepaged_thread)
-			khugepaged_thread = kthread_run(khugepaged, NULL,
-							"khugepaged");
-		if (unlikely(IS_ERR(khugepaged_thread))) {
-			pr_err("khugepaged: kthread_run(khugepaged) failed\n");
-			err = PTR_ERR(khugepaged_thread);
-			khugepaged_thread = NULL;
-			goto fail;
+		if (kthread_worker_created(&khugepaged_worker))
+			goto out;
+
+		err = create_kthread_worker(&khugepaged_worker,
+					    "khugepaged");
+
+		if (unlikely(err)) {
+			pr_err("khugepaged: failed to create kthread worker\n");
+			goto out;
 		}
 
-		if (!list_empty(&khugepaged_scan.mm_head))
-			wake_up_interruptible(&khugepaged_wait);
+		queue_kthread_work(&khugepaged_worker,
+				   &khugepaged_init_work);
+
+		if (list_empty(&khugepaged_scan.mm_head))
+			queue_kthread_work(&khugepaged_worker,
+					   &khugepaged_wait_work);
+		else
+			queue_kthread_work(&khugepaged_worker,
+					   &khugepaged_do_scan_work);
 
 		set_recommended_min_free_kbytes();
-	} else if (khugepaged_thread) {
-		kthread_stop(khugepaged_thread);
-		khugepaged_thread = NULL;
+	} else if (kthread_worker_created(&khugepaged_worker)) {
+		queue_kthread_work(&khugepaged_worker,
+				   &khugepaged_cleanup_work);
+		wakeup_and_destroy_kthread_worker(&khugepaged_worker);
 	}
-fail:
+out:
 	return err;
 }
 
@@ -2780,11 +2798,17 @@ static int khugepaged_has_work(void)
 
 static int khugepaged_wait_event(void)
 {
-	return !list_empty(&khugepaged_scan.mm_head) ||
-		kthread_should_stop();
+	return (!list_empty(&khugepaged_scan.mm_head) ||
+		!khugepaged_enabled());
+}
+
+static void khugepaged_init_func(struct kthread_work *dummy)
+{
+	set_freezable();
+	set_user_nice(current, MAX_NICE);
 }
 
-static void khugepaged_do_scan(void)
+static void khugepaged_do_scan_func(struct kthread_work *dummy)
 {
 	struct page *hpage = NULL;
 	unsigned int progress = 0, pass_through_head = 0;
@@ -2799,7 +2823,7 @@ static void khugepaged_do_scan(void)
 
 		cond_resched();
 
-		if (unlikely(kthread_should_stop() || try_to_freeze()))
+		if (unlikely(!khugepaged_enabled() || try_to_freeze()))
 			break;
 
 		spin_lock(&khugepaged_mm_lock);
@@ -2816,43 +2840,42 @@ static void khugepaged_do_scan(void)
 
 	if (!IS_ERR_OR_NULL(hpage))
 		put_page(hpage);
+
+	if (khugepaged_enabled())
+		queue_kthread_work(&khugepaged_worker, &khugepaged_wait_work);
 }
 
-static void khugepaged_wait_work(void)
+static void khugepaged_wait_func(struct kthread_work *dummy)
 {
 	if (khugepaged_has_work()) {
 		if (!khugepaged_scan_sleep_millisecs)
-			return;
+			goto out;
 
 		wait_event_freezable_timeout(khugepaged_wait,
-					     kthread_should_stop(),
+					     !khugepaged_enabled(),
 			msecs_to_jiffies(khugepaged_scan_sleep_millisecs));
-		return;
+		goto out;
 	}
 
 	if (khugepaged_enabled())
 		wait_event_freezable(khugepaged_wait, khugepaged_wait_event());
+
+out:
+	if (khugepaged_enabled())
+		queue_kthread_work(&khugepaged_worker,
+				   &khugepaged_do_scan_work);
 }
 
-static int khugepaged(void *none)
+static void khugepaged_cleanup_func(struct kthread_work *dummy)
 {
 	struct mm_slot *mm_slot;
 
-	set_freezable();
-	set_user_nice(current, MAX_NICE);
-
-	while (!kthread_should_stop()) {
-		khugepaged_do_scan();
-		khugepaged_wait_work();
-	}
-
 	spin_lock(&khugepaged_mm_lock);
 	mm_slot = khugepaged_scan.mm_slot;
 	khugepaged_scan.mm_slot = NULL;
 	if (mm_slot)
 		collect_mm_slot(mm_slot);
 	spin_unlock(&khugepaged_mm_lock);
-	return 0;
 }
 
 static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
-- 
1.8.5.6
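
To make the control flow above easier to follow, here is a stripped-down
sketch of the work-chaining pattern the conversion uses. It is illustrative
only: the example_* names are made up and the example_enabled flag stands in
for khugepaged_enabled().

static void example_scan_func(struct kthread_work *dummy);
static void example_wait_func(struct kthread_work *dummy);

static DEFINE_KTHREAD_WORKER(example_worker);
static DEFINE_KTHREAD_WORK(example_scan_work, example_scan_func);
static DEFINE_KTHREAD_WORK(example_wait_work, example_wait_func);
static bool example_enabled = true;

/* One bounded scan step, then hand over to the wait work. */
static void example_scan_func(struct kthread_work *dummy)
{
	/* ... scan a limited amount of work ... */
	if (example_enabled)
		queue_kthread_work(&example_worker, &example_wait_work);
}

/* Sleep until there is something to do, then hand over to the scan work. */
static void example_wait_func(struct kthread_work *dummy)
{
	/* ... wait_event_freezable(...) or a timeout sleep ... */
	if (example_enabled)
		queue_kthread_work(&example_worker, &example_scan_work);
}

/*
 * Clearing example_enabled stops the re-queuing; a cleanup work can then be
 * queued and the worker destroyed, as start_stop_khugepaged() does above.
 */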


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 08/14] rcu: Convert RCU gp kthreads into kthread worker API
  2015-07-28 14:39 ` Petr Mladek
@ 2015-07-28 14:39   ` Petr Mladek
  -1 siblings, 0 replies; 86+ messages in thread
From: Petr Mladek @ 2015-07-28 14:39 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel, Petr Mladek

Kthreads are currently implemented as an infinite loop. Each has its
own variant of checks for terminating, freezing, and waking up. In many
cases it is unclear what state the kthread is in, and sometimes the
checks are done the wrong way.

The plan is to convert kthreads to the kthread worker or workqueue API.
It allows the functionality to be split into separate operations and
gives the code a better structure. It also defines a clean state in
which no locks are held, no IRQs are blocked, and the kthread may sleep
or even be safely migrated.

The kthread worker API is useful when we want a dedicated single thread
for the work. It makes sure that the thread is available when needed
and allows better control, e.g. setting a scheduling priority.

This patch converts the RCU gp kthreads to the kthread worker API. They
modify the scheduling, have their own logic to bind the process, and
provide functions that are critical for the system to work, so they
deserve a dedicated kthread. In fact, they most likely could not be
implemented using workqueues because workqueues are themselves
implemented using RCU.

The conversion is rather straightforward. It moves the code from the
main loop into a single work function because the individual steps need
to be done together.

Note that, in the long term, we would like to provide more helper
functions in the kthread worker API and hide access to worker.task.
This is not completely solved in this RFC.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/rcu/tree.c | 175 +++++++++++++++++++++++++++++-------------------------
 kernel/rcu/tree.h |   4 +-
 2 files changed, 96 insertions(+), 83 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 65137bc28b2b..475bd59509ed 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -485,7 +485,7 @@ void show_rcu_gp_kthreads(void)
 
 	for_each_rcu_flavor(rsp) {
 		pr_info("%s: wait state: %d ->state: %#lx\n",
-			rsp->name, rsp->gp_state, rsp->gp_kthread->state);
+			rsp->name, rsp->gp_state, rsp->gp_worker.task->state);
 		/* sched_show_task(rsp->gp_kthread); */
 	}
 }
@@ -1586,9 +1586,9 @@ static int rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
  */
 static void rcu_gp_kthread_wake(struct rcu_state *rsp)
 {
-	if (current == rsp->gp_kthread ||
+	if (current == rsp->gp_worker.task ||
 	    !READ_ONCE(rsp->gp_flags) ||
-	    !rsp->gp_kthread)
+	    !rsp->gp_worker.task)
 		return;
 	wake_up(&rsp->gp_wq);
 }
@@ -2017,101 +2017,109 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 	raw_spin_unlock_irq(&rnp->lock);
 }
 
+static void rcu_gp_kthread_init_func(struct kthread_work *work)
+{
+	struct rcu_state *rsp = container_of(work, struct rcu_state,
+					     gp_init_work);
+
+	rcu_bind_gp_kthread();
+
+	queue_kthread_work(&rsp->gp_worker, &rsp->gp_work);
+}
+
 /*
- * Body of kthread that handles grace periods.
+ * Main work of kthread that handles grace periods.
  */
-static int __noreturn rcu_gp_kthread(void *arg)
+static void rcu_gp_kthread_func(struct kthread_work *work)
 {
 	int fqs_state;
 	int gf;
 	unsigned long j;
 	int ret;
-	struct rcu_state *rsp = arg;
+	struct rcu_state *rsp = container_of(work, struct rcu_state, gp_work);
 	struct rcu_node *rnp = rcu_get_root(rsp);
 
-	rcu_bind_gp_kthread();
+	/* Handle grace-period start. */
 	for (;;) {
+		trace_rcu_grace_period(rsp->name,
+				       READ_ONCE(rsp->gpnum),
+				       TPS("reqwait"));
+		rsp->gp_state = RCU_GP_WAIT_GPS;
+		wait_event_interruptible(rsp->gp_wq,
+					 READ_ONCE(rsp->gp_flags) &
+					 RCU_GP_FLAG_INIT);
+		/* Locking provides needed memory barrier. */
+		if (rcu_gp_init(rsp))
+			break;
+		cond_resched_rcu_qs();
+		WRITE_ONCE(rsp->gp_activity, jiffies);
+		WARN_ON(signal_pending(current));
+		trace_rcu_grace_period(rsp->name,
+				       READ_ONCE(rsp->gpnum),
+				       TPS("reqwaitsig"));
+	}
 
-		/* Handle grace-period start. */
-		for (;;) {
+	/* Handle quiescent-state forcing. */
+	fqs_state = RCU_SAVE_DYNTICK;
+	j = jiffies_till_first_fqs;
+	if (j > HZ) {
+		j = HZ;
+		jiffies_till_first_fqs = HZ;
+	}
+	ret = 0;
+	for (;;) {
+		if (!ret)
+			rsp->jiffies_force_qs = jiffies + j;
+		trace_rcu_grace_period(rsp->name,
+				       READ_ONCE(rsp->gpnum),
+				       TPS("fqswait"));
+		rsp->gp_state = RCU_GP_WAIT_FQS;
+		ret = wait_event_interruptible_timeout(rsp->gp_wq,
+				((gf = READ_ONCE(rsp->gp_flags)) &
+				 RCU_GP_FLAG_FQS) ||
+				(!READ_ONCE(rnp->qsmask) &&
+				 !rcu_preempt_blocked_readers_cgp(rnp)),
+				j);
+		/* Locking provides needed memory barriers. */
+		/* If grace period done, leave loop. */
+		if (!READ_ONCE(rnp->qsmask) &&
+		    !rcu_preempt_blocked_readers_cgp(rnp))
+			break;
+		/* If time for quiescent-state forcing, do it. */
+		if (ULONG_CMP_GE(jiffies, rsp->jiffies_force_qs) ||
+		    (gf & RCU_GP_FLAG_FQS)) {
 			trace_rcu_grace_period(rsp->name,
 					       READ_ONCE(rsp->gpnum),
-					       TPS("reqwait"));
-			rsp->gp_state = RCU_GP_WAIT_GPS;
-			wait_event_interruptible(rsp->gp_wq,
-						 READ_ONCE(rsp->gp_flags) &
-						 RCU_GP_FLAG_INIT);
-			/* Locking provides needed memory barrier. */
-			if (rcu_gp_init(rsp))
-				break;
+					       TPS("fqsstart"));
+			fqs_state = rcu_gp_fqs(rsp, fqs_state);
+			trace_rcu_grace_period(rsp->name,
+					       READ_ONCE(rsp->gpnum),
+					       TPS("fqsend"));
+			cond_resched_rcu_qs();
+			WRITE_ONCE(rsp->gp_activity, jiffies);
+		} else {
+			/* Deal with stray signal. */
 			cond_resched_rcu_qs();
 			WRITE_ONCE(rsp->gp_activity, jiffies);
 			WARN_ON(signal_pending(current));
 			trace_rcu_grace_period(rsp->name,
 					       READ_ONCE(rsp->gpnum),
-					       TPS("reqwaitsig"));
+					       TPS("fqswaitsig"));
 		}
-
-		/* Handle quiescent-state forcing. */
-		fqs_state = RCU_SAVE_DYNTICK;
-		j = jiffies_till_first_fqs;
+		j = jiffies_till_next_fqs;
 		if (j > HZ) {
 			j = HZ;
-			jiffies_till_first_fqs = HZ;
+			jiffies_till_next_fqs = HZ;
+		} else if (j < 1) {
+			j = 1;
+			jiffies_till_next_fqs = 1;
 		}
-		ret = 0;
-		for (;;) {
-			if (!ret)
-				rsp->jiffies_force_qs = jiffies + j;
-			trace_rcu_grace_period(rsp->name,
-					       READ_ONCE(rsp->gpnum),
-					       TPS("fqswait"));
-			rsp->gp_state = RCU_GP_WAIT_FQS;
-			ret = wait_event_interruptible_timeout(rsp->gp_wq,
-					((gf = READ_ONCE(rsp->gp_flags)) &
-					 RCU_GP_FLAG_FQS) ||
-					(!READ_ONCE(rnp->qsmask) &&
-					 !rcu_preempt_blocked_readers_cgp(rnp)),
-					j);
-			/* Locking provides needed memory barriers. */
-			/* If grace period done, leave loop. */
-			if (!READ_ONCE(rnp->qsmask) &&
-			    !rcu_preempt_blocked_readers_cgp(rnp))
-				break;
-			/* If time for quiescent-state forcing, do it. */
-			if (ULONG_CMP_GE(jiffies, rsp->jiffies_force_qs) ||
-			    (gf & RCU_GP_FLAG_FQS)) {
-				trace_rcu_grace_period(rsp->name,
-						       READ_ONCE(rsp->gpnum),
-						       TPS("fqsstart"));
-				fqs_state = rcu_gp_fqs(rsp, fqs_state);
-				trace_rcu_grace_period(rsp->name,
-						       READ_ONCE(rsp->gpnum),
-						       TPS("fqsend"));
-				cond_resched_rcu_qs();
-				WRITE_ONCE(rsp->gp_activity, jiffies);
-			} else {
-				/* Deal with stray signal. */
-				cond_resched_rcu_qs();
-				WRITE_ONCE(rsp->gp_activity, jiffies);
-				WARN_ON(signal_pending(current));
-				trace_rcu_grace_period(rsp->name,
-						       READ_ONCE(rsp->gpnum),
-						       TPS("fqswaitsig"));
-			}
-			j = jiffies_till_next_fqs;
-			if (j > HZ) {
-				j = HZ;
-				jiffies_till_next_fqs = HZ;
-			} else if (j < 1) {
-				j = 1;
-				jiffies_till_next_fqs = 1;
-			}
-		}
-
-		/* Handle grace-period end. */
-		rcu_gp_cleanup(rsp);
 	}
+
+	/* Handle grace-period end. */
+	rcu_gp_cleanup(rsp);
+
+	queue_kthread_work(&rsp->gp_worker, &rsp->gp_work);
 }
 
 /*
@@ -2129,7 +2137,7 @@ static bool
 rcu_start_gp_advanced(struct rcu_state *rsp, struct rcu_node *rnp,
 		      struct rcu_data *rdp)
 {
-	if (!rsp->gp_kthread || !cpu_needs_another_gp(rsp, rdp)) {
+	if (!rsp->gp_worker.task || !cpu_needs_another_gp(rsp, rdp)) {
 		/*
 		 * Either we have not yet spawned the grace-period
 		 * task, this CPU does not need another grace period,
@@ -3909,7 +3917,7 @@ static int __init rcu_spawn_gp_kthread(void)
 	struct rcu_node *rnp;
 	struct rcu_state *rsp;
 	struct sched_param sp;
-	struct task_struct *t;
+	int ret;
 
 	/* Force priority into range. */
 	if (IS_ENABLED(CONFIG_RCU_BOOST) && kthread_prio < 1)
@@ -3924,16 +3932,19 @@ static int __init rcu_spawn_gp_kthread(void)
 
 	rcu_scheduler_fully_active = 1;
 	for_each_rcu_flavor(rsp) {
-		t = kthread_create(rcu_gp_kthread, rsp, "%s", rsp->name);
-		BUG_ON(IS_ERR(t));
+		init_kthread_worker(&rsp->gp_worker);
+		init_kthread_work(&rsp->gp_init_work, rcu_gp_kthread_init_func);
+		init_kthread_work(&rsp->gp_work, rcu_gp_kthread_func);
+		ret = create_kthread_worker(&rsp->gp_worker, "%s", rsp->name);
+		BUG_ON(ret);
 		rnp = rcu_get_root(rsp);
 		raw_spin_lock_irqsave(&rnp->lock, flags);
-		rsp->gp_kthread = t;
 		if (kthread_prio) {
 			sp.sched_priority = kthread_prio;
-			sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
+			sched_setscheduler_nocheck(rsp->gp_worker.task,
+						   SCHED_FIFO, &sp);
 		}
-		wake_up_process(t);
+		queue_kthread_work(&rsp->gp_worker, &rsp->gp_init_work);
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
 	}
 	rcu_spawn_nocb_kthreads();
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4adb7ca0bf47..2f318d406a53 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -457,7 +457,9 @@ struct rcu_state {
 	u8	boost;				/* Subject to priority boost. */
 	unsigned long gpnum;			/* Current gp number. */
 	unsigned long completed;		/* # of last completed gp. */
-	struct task_struct *gp_kthread;		/* Task for grace periods. */
+	struct kthread_worker gp_worker;	/* Worker for grace periods */
+	struct kthread_work gp_init_work;	/* Init work for handling gp */
+	struct kthread_work gp_work;		/* Main work for handling gp */
 	wait_queue_head_t gp_wq;		/* Where GP task waits. */
 	short gp_flags;				/* Commands for GP task. */
 	short gp_state;				/* GP kthread sleep state. */
-- 
1.8.5.6
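
A small sketch of the per-instance pattern used here (illustrative only;
struct example_state and the example_* names are made up): the worker and
its work items are embedded in the state structure, and the work function
recovers the state with container_of() instead of a void *arg.

#include <linux/kernel.h>
#include <linux/kthread.h>

struct example_state {
	struct kthread_worker worker;
	struct kthread_work work;
	/* ... per-instance data ... */
};

static void example_func(struct kthread_work *work)
{
	struct example_state *state =
		container_of(work, struct example_state, work);

	/* ... one pass of the former main loop, operating on *state ... */

	/* re-queue itself to emulate the original endless loop */
	queue_kthread_work(&state->worker, &state->work);
}

static int example_start(struct example_state *state)
{
	init_kthread_worker(&state->worker);
	init_kthread_work(&state->work, example_func);
	return create_kthread_worker(&state->worker, "example");
}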


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 09/14] ring_buffer: Initialize completions statically in the benchmark
  2015-07-28 14:39 ` Petr Mladek
@ 2015-07-28 14:39   ` Petr Mladek
  -1 siblings, 0 replies; 86+ messages in thread
From: Petr Mladek @ 2015-07-28 14:39 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel, Petr Mladek

There is no need to initialize the completions again and again.

This patch uses static initialization. It simplifies the code and even
allows us to get rid of two memory barriers.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/trace/ring_buffer_benchmark.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index a1503a027ee2..ccb1a0b95f64 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -24,8 +24,8 @@ struct rb_page {
 static int wakeup_interval = 100;
 
 static int reader_finish;
-static struct completion read_start;
-static struct completion read_done;
+static DECLARE_COMPLETION(read_start);
+static DECLARE_COMPLETION(read_done);
 
 static struct ring_buffer *buffer;
 static struct task_struct *producer;
@@ -270,11 +270,6 @@ static void ring_buffer_producer(void)
 	trace_printk("End ring buffer hammer\n");
 
 	if (consumer) {
-		/* Init both completions here to avoid races */
-		init_completion(&read_start);
-		init_completion(&read_done);
-		/* the completions must be visible before the finish var */
-		smp_wmb();
 		reader_finish = 1;
 		/* finish var visible before waking up the consumer */
 		smp_wmb();
@@ -389,13 +384,10 @@ static int ring_buffer_consumer_thread(void *arg)
 
 static int ring_buffer_producer_thread(void *arg)
 {
-	init_completion(&read_start);
-
 	while (!kthread_should_stop() && !kill_test) {
 		ring_buffer_reset(buffer);
 
 		if (consumer) {
-			smp_wmb();
 			wake_up_process(consumer);
 			wait_for_completion(&read_start);
 		}
-- 
1.8.5.6
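
For reference, a minimal sketch of the two initialization styles
(illustrative; example_done is a made-up name): a statically declared
completion is ready at compile time, so the repeated run-time
init_completion() calls and the barriers that published them become
unnecessary.

#include <linux/completion.h>

/* Compile-time initialization: usable from the first wait/complete call. */
static DECLARE_COMPLETION(example_done);

static int example_waiter(void *unused)
{
	wait_for_completion(&example_done);
	return 0;
}

static void example_signal(void)
{
	complete(&example_done);
}

/*
 * The dynamic variant would need init_completion(&example_done) to run,
 * and to be visible, before any waiter or completer may touch it.
 */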


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 10/14] ring_buffer: Fix more races when terminating the producer in the benchmark
  2015-07-28 14:39 ` Petr Mladek
@ 2015-07-28 14:39   ` Petr Mladek
  -1 siblings, 0 replies; 86+ messages in thread
From: Petr Mladek @ 2015-07-28 14:39 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel, Petr Mladek

The commit b44754d8262d3aab8 ("ring_buffer: Allow to exit the ring
buffer benchmark immediately") added a hack into ring_buffer_producer()
that sets @kill_test when kthread_should_stop() returns true. It
improved the situation a lot. It stops the kthread in most cases
because the producer spends most of its time in the patched while loop.

But there are still a few possible races when kthread_should_stop()
becomes true outside of that loop. Then @kill_test is not set and some
other checks pass.

This patch adds a better fix. It renames @kill_test/KILL_TEST() to the
more descriptive @test_error/TEST_ERROR(). It also introduces the
break_test() function that checks both @test_error and
kthread_should_stop(). Finally, the new function is used in many
locations where the check for @test_error alone is not enough.

It also adds a missing check into ring_buffer_producer_thread() between
setting TASK_INTERRUPTIBLE and calling schedule_timeout(). Otherwise,
we might miss a wakeup from kthread_stop().

Finally, it adds the same check into ring_buffer_consumer() between
setting TASK_INTERRUPTIBLE and calling schedule(). Well, I added this
one just to be paranoid. If we get here, the producer should already
have been stopped and it should have set @reader_finish. But better
safe than sorry.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/trace/ring_buffer_benchmark.c | 65 ++++++++++++++++++++----------------
 1 file changed, 37 insertions(+), 28 deletions(-)

diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index ccb1a0b95f64..10e0ec9b797f 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -60,12 +60,12 @@ MODULE_PARM_DESC(consumer_fifo, "fifo prio for consumer");
 
 static int read_events;
 
-static int kill_test;
+static int test_error;
 
-#define KILL_TEST()				\
+#define TEST_ERROR()				\
 	do {					\
-		if (!kill_test) {		\
-			kill_test = 1;		\
+		if (!test_error) {		\
+			test_error = 1;		\
 			WARN_ON(1);		\
 		}				\
 	} while (0)
@@ -75,6 +75,11 @@ enum event_status {
 	EVENT_DROPPED,
 };
 
+static bool break_test(void)
+{
+	return test_error || kthread_should_stop();
+}
+
 static enum event_status read_event(int cpu)
 {
 	struct ring_buffer_event *event;
@@ -87,7 +92,7 @@ static enum event_status read_event(int cpu)
 
 	entry = ring_buffer_event_data(event);
 	if (*entry != cpu) {
-		KILL_TEST();
+		TEST_ERROR();
 		return EVENT_DROPPED;
 	}
 
@@ -115,10 +120,13 @@ static enum event_status read_page(int cpu)
 		rpage = bpage;
 		/* The commit may have missed event flags set, clear them */
 		commit = local_read(&rpage->commit) & 0xfffff;
-		for (i = 0; i < commit && !kill_test; i += inc) {
+		for (i = 0; i < commit ; i += inc) {
+
+			if (break_test())
+				break;
 
 			if (i >= (PAGE_SIZE - offsetof(struct rb_page, data))) {
-				KILL_TEST();
+				TEST_ERROR();
 				break;
 			}
 
@@ -128,7 +136,7 @@ static enum event_status read_page(int cpu)
 			case RINGBUF_TYPE_PADDING:
 				/* failed writes may be discarded events */
 				if (!event->time_delta)
-					KILL_TEST();
+					TEST_ERROR();
 				inc = event->array[0] + 4;
 				break;
 			case RINGBUF_TYPE_TIME_EXTEND:
@@ -137,12 +145,12 @@ static enum event_status read_page(int cpu)
 			case 0:
 				entry = ring_buffer_event_data(event);
 				if (*entry != cpu) {
-					KILL_TEST();
+					TEST_ERROR();
 					break;
 				}
 				read++;
 				if (!event->array[0]) {
-					KILL_TEST();
+					TEST_ERROR();
 					break;
 				}
 				inc = event->array[0] + 4;
@@ -150,17 +158,17 @@ static enum event_status read_page(int cpu)
 			default:
 				entry = ring_buffer_event_data(event);
 				if (*entry != cpu) {
-					KILL_TEST();
+					TEST_ERROR();
 					break;
 				}
 				read++;
 				inc = ((event->type_len + 1) * 4);
 			}
-			if (kill_test)
+			if (test_error)
 				break;
 
 			if (inc <= 0) {
-				KILL_TEST();
+				TEST_ERROR();
 				break;
 			}
 		}
@@ -178,7 +186,7 @@ static void ring_buffer_consumer(void)
 	read_events ^= 1;
 
 	read = 0;
-	while (!reader_finish && !kill_test) {
+	while (!reader_finish && !break_test()) {
 		int found;
 
 		do {
@@ -193,17 +201,18 @@ static void ring_buffer_consumer(void)
 				else
 					stat = read_page(cpu);
 
-				if (kill_test)
+				if (break_test())
 					break;
 				if (stat == EVENT_FOUND)
 					found = 1;
 			}
-		} while (found && !kill_test);
+		} while (found && !break_test());
 
 		set_current_state(TASK_INTERRUPTIBLE);
-		if (reader_finish)
+		if (reader_finish || break_test()) {
+			__set_current_state(TASK_RUNNING);
 			break;
-
+		}
 		schedule();
 	}
 	reader_finish = 0;
@@ -263,10 +272,7 @@ static void ring_buffer_producer(void)
 		if (cnt % wakeup_interval)
 			cond_resched();
 #endif
-		if (kthread_should_stop())
-			kill_test = 1;
-
-	} while (ktime_before(end_time, timeout) && !kill_test);
+	} while (ktime_before(end_time, timeout) && !break_test());
 	trace_printk("End ring buffer hammer\n");
 
 	if (consumer) {
@@ -282,7 +288,7 @@ static void ring_buffer_producer(void)
 	entries = ring_buffer_entries(buffer);
 	overruns = ring_buffer_overruns(buffer);
 
-	if (kill_test && !kthread_should_stop())
+	if (test_error)
 		trace_printk("ERROR!\n");
 
 	if (!disable_reader) {
@@ -363,15 +369,14 @@ static void wait_to_die(void)
 
 static int ring_buffer_consumer_thread(void *arg)
 {
-	while (!kthread_should_stop() && !kill_test) {
+	while (!break_test()) {
 		complete(&read_start);
 
 		ring_buffer_consumer();
 
 		set_current_state(TASK_INTERRUPTIBLE);
-		if (kthread_should_stop() || kill_test)
+		if (break_test())
 			break;
-
 		schedule();
 	}
 	__set_current_state(TASK_RUNNING);
@@ -384,7 +389,7 @@ static int ring_buffer_consumer_thread(void *arg)
 
 static int ring_buffer_producer_thread(void *arg)
 {
-	while (!kthread_should_stop() && !kill_test) {
+	while (!break_test()) {
 		ring_buffer_reset(buffer);
 
 		if (consumer) {
@@ -393,11 +398,15 @@ static int ring_buffer_producer_thread(void *arg)
 		}
 
 		ring_buffer_producer();
-		if (kill_test)
+		if (break_test())
 			goto out_kill;
 
 		trace_printk("Sleeping for 10 secs\n");
 		set_current_state(TASK_INTERRUPTIBLE);
+		if (break_test()) {
+			__set_current_state(TASK_RUNNING);
+			goto out_kill;
+		}
 		schedule_timeout(HZ * SLEEP_TIME);
 	}
 
-- 
1.8.5.6
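
The ordering that the added checks enforce can be summarized in a short
sketch. It is illustrative only and reuses break_test() and SLEEP_TIME from
the benchmark file above.

static void example_sleep(void)
{
	/*
	 * Mark the task sleeping *before* the final check. A wakeup from
	 * kthread_stop() that arrives after set_current_state() switches
	 * the task back to TASK_RUNNING, so schedule_timeout() returns
	 * immediately. Checking break_test() only before going to
	 * TASK_INTERRUPTIBLE could miss a wakeup sent in between and then
	 * sleep through the whole timeout.
	 */
	set_current_state(TASK_INTERRUPTIBLE);
	if (break_test()) {
		__set_current_state(TASK_RUNNING);
		return;
	}
	schedule_timeout(HZ * SLEEP_TIME);
}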


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 11/14] ring_buffer: Use kthread worker API for the producer kthread in the benchmark
  2015-07-28 14:39 ` Petr Mladek
@ 2015-07-28 14:39   ` Petr Mladek
  -1 siblings, 0 replies; 86+ messages in thread
From: Petr Mladek @ 2015-07-28 14:39 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel, Petr Mladek

Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing,
and awakening. In many cases it is unclear which state the kthread
is in, and sometimes the checks are done the wrong way.

The plan is to convert kthreads to the kthread worker or workqueue
API. It allows the functionality to be split into separate operations
with a clearer structure. It also defines a clean state where no
locks are held, no IRQs are blocked, and the kthread may sleep or
even be migrated safely.

The kthread worker API is useful when we want a dedicated single
thread for the work. It helps to make sure that the thread is
available when needed. It also allows better control, e.g. setting
a scheduling priority.

This patch converts the ring buffer benchmark producer into a kthread
worker because it modifies the scheduling priority and policy.
Also, it is a benchmark: it makes the CPU very busy and will most
likely run only for a limited time. IMHO, it does not make sense
to clutter the system workqueues with it.

The thread is split into two independent work items. It might look
more complicated, but it helped me to find a race in the sleeping
part that was fixed separately.

kthread_should_stop() can no longer be used inside the work items
because it defines the lifetime of the worker, which has to stay
usable until all work items are done. Instead, a new global variable
@test_end is added. It is set on normal termination, in contrast
to @test_error.
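
For illustration only (not part of the patch), the self-requeueing
idiom that replaces the old infinite loop looks roughly like this;
the step names and the stop_requested() helper are made up:

	static void step_one_func(struct kthread_work *dummy);
	static void step_two_func(struct kthread_work *dummy);
	static DEFINE_KTHREAD_WORKER(example_worker);
	static DEFINE_KTHREAD_WORK(step_one_work, step_one_func);
	static DEFINE_KTHREAD_WORK(step_two_work, step_two_func);

	static void step_one_func(struct kthread_work *dummy)
	{
		if (stop_requested())	/* made-up exit check */
			return;
		/* do the first part of the old loop body here */
		queue_kthread_work(&example_worker, &step_two_work);
	}

	static void step_two_func(struct kthread_work *dummy)
	{
		if (stop_requested())
			return;
		/* do the second part, then chain back to the first stage */
		queue_kthread_work(&example_worker, &step_one_work);
	}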

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/trace/ring_buffer_benchmark.c | 81 ++++++++++++++++++++++--------------
 1 file changed, 49 insertions(+), 32 deletions(-)

diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index 10e0ec9b797f..86514babe07f 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -26,9 +26,14 @@ static int wakeup_interval = 100;
 static int reader_finish;
 static DECLARE_COMPLETION(read_start);
 static DECLARE_COMPLETION(read_done);
-
 static struct ring_buffer *buffer;
-static struct task_struct *producer;
+
+static void rb_producer_hammer_func(struct kthread_work *dummy);
+static void rb_producer_sleep_func(struct kthread_work *dummy);
+static DEFINE_KTHREAD_WORKER(rb_producer_worker);
+static DEFINE_KTHREAD_WORK(rb_producer_hammer_work, rb_producer_hammer_func);
+static DEFINE_KTHREAD_WORK(rb_producer_sleep_work, rb_producer_sleep_func);
+
 static struct task_struct *consumer;
 static unsigned long read;
 
@@ -61,6 +66,7 @@ MODULE_PARM_DESC(consumer_fifo, "fifo prio for consumer");
 static int read_events;
 
 static int test_error;
+static int test_end;
 
 #define TEST_ERROR()				\
 	do {					\
@@ -77,7 +83,11 @@ enum event_status {
 
 static bool break_test(void)
 {
-	return test_error || kthread_should_stop();
+	/*
+	 * FIXME: The test for kthread_should_stop() will become obsolete
+	 * once the consumer is also converted to the kthread worker API.
+	 */
+	return test_error || test_end || kthread_should_stop();
 }
 
 static enum event_status read_event(int cpu)
@@ -387,34 +397,40 @@ static int ring_buffer_consumer_thread(void *arg)
 	return 0;
 }
 
-static int ring_buffer_producer_thread(void *arg)
+static void rb_producer_hammer_func(struct kthread_work *dummy)
 {
-	while (!break_test()) {
-		ring_buffer_reset(buffer);
+	if (break_test())
+		return;
 
-		if (consumer) {
-			wake_up_process(consumer);
-			wait_for_completion(&read_start);
-		}
+	ring_buffer_reset(buffer);
 
-		ring_buffer_producer();
-		if (break_test())
-			goto out_kill;
+	if (consumer) {
+		wake_up_process(consumer);
+		wait_for_completion(&read_start);
+	}
 
-		trace_printk("Sleeping for 10 secs\n");
-		set_current_state(TASK_INTERRUPTIBLE);
-		if (break_test()) {
-			__set_current_state(TASK_RUNNING);
-			goto out_kill;
-		}
-		schedule_timeout(HZ * SLEEP_TIME);
+	ring_buffer_producer();
+
+	if (break_test())
+		return;
+
+	queue_kthread_work(&rb_producer_worker, &rb_producer_sleep_work);
+}
+
+static void rb_producer_sleep_func(struct kthread_work *dummy)
+{
+	trace_printk("Sleeping for 10 secs\n");
+	set_current_state(TASK_INTERRUPTIBLE);
+	if (break_test()) {
+		set_current_state(TASK_RUNNING);
+		return;
 	}
+	schedule_timeout(HZ * SLEEP_TIME);
 
-out_kill:
-	if (!kthread_should_stop())
-		wait_to_die();
+	if (break_test())
+		return;
 
-	return 0;
+	queue_kthread_work(&rb_producer_worker, &rb_producer_hammer_work);
 }
 
 static int __init ring_buffer_benchmark_init(void)
@@ -434,13 +450,12 @@ static int __init ring_buffer_benchmark_init(void)
 			goto out_fail;
 	}
 
-	producer = kthread_run(ring_buffer_producer_thread,
-			       NULL, "rb_producer");
-	ret = PTR_ERR(producer);
-
-	if (IS_ERR(producer))
+	ret = create_kthread_worker(&rb_producer_worker, "rb_producer");
+	if (ret)
 		goto out_kill;
 
+	queue_kthread_work(&rb_producer_worker, &rb_producer_hammer_work);
+
 	/*
 	 * Run them as low-prio background tasks by default:
 	 */
@@ -458,9 +473,10 @@ static int __init ring_buffer_benchmark_init(void)
 		struct sched_param param = {
 			.sched_priority = producer_fifo
 		};
-		sched_setscheduler(producer, SCHED_FIFO, &param);
+		sched_setscheduler(rb_producer_worker.task,
+				   SCHED_FIFO, &param);
 	} else
-		set_user_nice(producer, producer_nice);
+		set_user_nice(rb_producer_worker.task, producer_nice);
 
 	return 0;
 
@@ -475,7 +491,8 @@ static int __init ring_buffer_benchmark_init(void)
 
 static void __exit ring_buffer_benchmark_exit(void)
 {
-	kthread_stop(producer);
+	test_end = 1;
+	wakeup_and_destroy_kthread_worker(&rb_producer_worker);
 	if (consumer)
 		kthread_stop(consumer);
 	ring_buffer_free(buffer);
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 12/14] kthread_worker: Better support freezable kthread workers
  2015-07-28 14:39 ` Petr Mladek
@ 2015-07-28 14:39   ` Petr Mladek
  -1 siblings, 0 replies; 86+ messages in thread
From: Petr Mladek @ 2015-07-28 14:39 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel, Petr Mladek

This patch allows a kthread worker to be made freezable via a new
@flags parameter. It will make it possible to avoid an init work
item in some kthreads.

It currently has only a minimal effect on kthread_worker_fn(),
but it might help with some optimizations or fixes eventually.

I currently do not know of any other use for the @flags parameter,
but I believe that we will want more flags in the future.

Finally, I hope that it will not cause confusion with the @flags
member in struct kthread. We will probably want to rework the basic
kthread implementation once all kthreads are converted to kthread
workers or workqueues. It is possible that the two structures will
be merged then.
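
For illustration only (not part of the patch), a caller that wants
a freezable worker would then do roughly the following; the worker
name is made up:

	/* at file scope */
	static DEFINE_KTHREAD_WORKER(example_worker);

	/* in the init path: create the worker and let it freeze on suspend */
	int err = create_kthread_worker(&example_worker, KTW_FREEZABLE,
					"example_worker");
	if (err)
		return err;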

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/kthread.h              | 13 +++++++++----
 kernel/kthread.c                     |  8 +++++++-
 kernel/rcu/tree.c                    |  3 ++-
 kernel/trace/ring_buffer_benchmark.c |  2 +-
 mm/huge_memory.c                     |  2 +-
 5 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 02d3cc9ad923..d916b024e986 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -63,7 +63,12 @@ extern int tsk_fork_get_node(struct task_struct *tsk);
 struct kthread_work;
 typedef void (*kthread_work_func_t)(struct kthread_work *work);
 
+enum {
+	KTW_FREEZABLE		= 1 << 2,	/* freeze during suspend */
+};
+
 struct kthread_worker {
+	unsigned int		flags;
 	spinlock_t		lock;
 	struct list_head	work_list;
 	struct task_struct	*task;
@@ -129,13 +134,13 @@ static inline bool kthread_worker_created(struct kthread_worker *worker)
 
 int kthread_worker_fn(void *worker_ptr);
 
-__printf(3, 4)
+__printf(4, 5)
 int create_kthread_worker_on_node(struct kthread_worker *worker,
-				  int node,
+				  unsigned int flags, int node,
 				  const char namefmt[], ...);
 
-#define create_kthread_worker(worker, namefmt, arg...)			\
-	create_kthread_worker_on_node(worker, -1, namefmt, ##arg)
+#define create_kthread_worker(worker, flags, namefmt, arg...)		\
+	create_kthread_worker_on_node(worker, flags, -1, namefmt, ##arg)
 
 bool queue_kthread_work(struct kthread_worker *worker,
 			struct kthread_work *work);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 053c9dfa58ac..d02509e17f7e 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -535,6 +535,7 @@ void __init_kthread_worker(struct kthread_worker *worker,
 				const char *name,
 				struct lock_class_key *key)
 {
+	worker->flags = 0;
 	spin_lock_init(&worker->lock);
 	lockdep_set_class_and_name(&worker->lock, key, name);
 	INIT_LIST_HEAD(&worker->work_list);
@@ -569,6 +570,10 @@ int kthread_worker_fn(void *worker_ptr)
 	 */
 	WARN_ON(worker->task && worker->task != current);
 	worker->task = current;
+
+	if (worker->flags & KTW_FREEZABLE)
+		set_freezable();
+
 repeat:
 	set_current_state(TASK_INTERRUPTIBLE);	/* mb paired w/ kthread_stop */
 
@@ -611,7 +616,7 @@ EXPORT_SYMBOL_GPL(kthread_worker_fn);
  * in @node, to get NUMA affinity for kthread stack, or else give -1.
  */
 int create_kthread_worker_on_node(struct kthread_worker *worker,
-				  int node,
+				  unsigned int flags, int node,
 				  const char namefmt[], ...)
 {
 	struct task_struct *task;
@@ -633,6 +638,7 @@ int create_kthread_worker_on_node(struct kthread_worker *worker,
 	set_bit(KTHREAD_IS_WORKER, &kthread->flags);
 
 	spin_lock_irq(&worker->lock);
+	worker->flags = flags;
 	worker->task = task;
 	spin_unlock_irq(&worker->lock);
 
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 475bd59509ed..3a286f3b8b3c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3935,7 +3935,8 @@ static int __init rcu_spawn_gp_kthread(void)
 		init_kthread_worker(&rsp->gp_worker);
 		init_kthread_work(&rsp->gp_init_work, rcu_gp_kthread_init_func);
 		init_kthread_work(&rsp->gp_work, rcu_gp_kthread_func);
-		ret = create_kthread_worker(&rsp->gp_worker, "%s", rsp->name);
+		ret = create_kthread_worker(&rsp->gp_worker, 0,
+					    "%s", rsp->name);
 		BUG_ON(ret);
 		rnp = rcu_get_root(rsp);
 		raw_spin_lock_irqsave(&rnp->lock, flags);
diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index 86514babe07f..5036d284885c 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -450,7 +450,7 @@ static int __init ring_buffer_benchmark_init(void)
 			goto out_fail;
 	}
 
-	ret = create_kthread_worker(&rb_producer_worker, "rb_producer");
+	ret = create_kthread_worker(&rb_producer_worker, 0, "rb_producer");
 	if (ret)
 		goto out_kill;
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 55733735a487..51a514161f2b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -159,6 +159,7 @@ static int start_stop_khugepaged(void)
 			goto out;
 
 		err = create_kthread_worker(&khugepaged_worker,
+					    KTW_FREEZABLE,
 					    "khugepaged");
 
 		if (unlikely(err)) {
@@ -2804,7 +2805,6 @@ static int khugepaged_wait_event(void)
 
 static void khugepaged_init_func(struct kthread_work *dummy)
 {
-	set_freezable();
 	set_user_nice(current, MAX_NICE);
 }
 
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 13/14] kthread_worker: Add set_kthread_worker_user_nice()
  2015-07-28 14:39 ` Petr Mladek
@ 2015-07-28 14:39   ` Petr Mladek
  -1 siblings, 0 replies; 86+ messages in thread
From: Petr Mladek @ 2015-07-28 14:39 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel, Petr Mladek

The kthread worker API will be used for kthreads that need to modify
their scheduling priority.

This patch adds a function that makes this easy and safe to do and
hides the implementation details. It might even help to get rid
of an init work item.
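
For illustration only (not part of the patch), a converted kthread
can then be made low priority from the creator's context instead of
from an init work item, roughly (example_worker is a made-up name):

	/* after create_kthread_worker() has succeeded */
	set_kthread_worker_user_nice(&example_worker, MAX_NICE);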

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/kthread.h              |  2 ++
 kernel/kthread.c                     | 14 ++++++++++++++
 kernel/trace/ring_buffer_benchmark.c |  3 ++-
 mm/huge_memory.c                     | 10 +---------
 4 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index d916b024e986..b75847e1a4c9 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -142,6 +142,8 @@ int create_kthread_worker_on_node(struct kthread_worker *worker,
 #define create_kthread_worker(worker, flags, namefmt, arg...)		\
 	create_kthread_worker_on_node(worker, flags, -1, namefmt, ##arg)
 
+void set_kthread_worker_user_nice(struct kthread_worker *worker, long nice);
+
 bool queue_kthread_work(struct kthread_worker *worker,
 			struct kthread_work *work);
 void flush_kthread_work(struct kthread_work *work);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index d02509e17f7e..ab2e235b6144 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -648,6 +648,20 @@ int create_kthread_worker_on_node(struct kthread_worker *worker,
 }
 EXPORT_SYMBOL(create_kthread_worker_on_node);
 
+/*
+ * set_kthread_worker_user_nice - set scheduling priority for the kthread worker
+ * @worker: target kthread_worker
+ * @nice: niceness value
+ */
+void set_kthread_worker_user_nice(struct kthread_worker *worker, long nice)
+{
+	struct task_struct *task = worker->task;
+
+	WARN_ON(!task);
+	set_user_nice(task, nice);
+}
+EXPORT_SYMBOL(set_kthread_worker_user_nice);
+
 /* insert @work before @pos in @worker */
 static void insert_kthread_work(struct kthread_worker *worker,
 			       struct kthread_work *work,
diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index 5036d284885c..73e4c7f11a2c 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -476,7 +476,8 @@ static int __init ring_buffer_benchmark_init(void)
 		sched_setscheduler(rb_producer_worker.task,
 				   SCHED_FIFO, &param);
 	} else
-		set_user_nice(rb_producer_worker.task, producer_nice);
+		set_kthread_worker_user_nice(&rb_producer_worker,
+					     producer_nice);
 
 	return 0;
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 51a514161f2b..1d5f990c55ab 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -55,12 +55,10 @@ static unsigned int khugepaged_scan_sleep_millisecs __read_mostly = 10000;
 /* during fragmentation poll the hugepage allocator once every minute */
 static unsigned int khugepaged_alloc_sleep_millisecs __read_mostly = 60000;
 
-static void khugepaged_init_func(struct kthread_work *dummy);
 static void khugepaged_do_scan_func(struct kthread_work *dummy);
 static void khugepaged_wait_func(struct kthread_work *dummy);
 static void khugepaged_cleanup_func(struct kthread_work *dummy);
 static DEFINE_KTHREAD_WORKER(khugepaged_worker);
-static DEFINE_KTHREAD_WORK(khugepaged_init_work, khugepaged_init_func);
 static DEFINE_KTHREAD_WORK(khugepaged_do_scan_work, khugepaged_do_scan_func);
 static DEFINE_KTHREAD_WORK(khugepaged_wait_work, khugepaged_wait_func);
 static DEFINE_KTHREAD_WORK(khugepaged_cleanup_work, khugepaged_cleanup_func);
@@ -167,8 +165,7 @@ static int start_stop_khugepaged(void)
 			goto out;
 		}
 
-		queue_kthread_work(&khugepaged_worker,
-				   &khugepaged_init_work);
+		set_kthread_worker_user_nice(&khugepaged_worker, MAX_NICE);
 
 		if (list_empty(&khugepaged_scan.mm_head))
 			queue_kthread_work(&khugepaged_worker,
@@ -2803,11 +2800,6 @@ static int khugepaged_wait_event(void)
 		!khugepaged_enabled());
 }
 
-static void khugepaged_init_func(struct kthread_work *dummy)
-{
-	set_user_nice(current, MAX_NICE);
-}
-
 static void khugepaged_do_scan_func(struct kthread_work *dummy)
 {
 	struct page *hpage = NULL;
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 14/14] kthread_worker: Add set_kthread_worker_scheduler*()
@ 2015-07-28 14:39   ` Petr Mladek
  0 siblings, 0 replies; 86+ messages in thread
From: Petr Mladek @ 2015-07-28 14:39 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel, Petr Mladek

The kthread worker API will be used for kthreads that need to modify
the scheduling policy.

This patch adds a function that makes this easy and safe to do and
hides the implementation details. It might even help to get rid
of an init work item.

It uses @sched_priority as a parameter instead of struct sched_param.
The structure has been there since the initial kernel git commit
(April 2005) and has always included only a single member:
sched_priority. So it looks like an overkill that is better to hide.
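
For illustration only (not part of the patch), switching a worker to
an RT policy then looks roughly like this; example_worker and
example_prio are made-up names:

	/* after create_kthread_worker() has succeeded */
	err = set_kthread_worker_scheduler_nocheck(&example_worker,
						   SCHED_FIFO, example_prio);
	if (err)
		pr_warn("failed to set SCHED_FIFO: %d\n", err);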

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/kthread.h              |  5 +++
 kernel/kthread.c                     | 59 ++++++++++++++++++++++++++++++++++++
 kernel/rcu/tree.c                    | 10 +++---
 kernel/trace/ring_buffer_benchmark.c | 11 +++----
 4 files changed, 72 insertions(+), 13 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index b75847e1a4c9..d503dc16613c 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -144,6 +144,11 @@ int create_kthread_worker_on_node(struct kthread_worker *worker,
 
 void set_kthread_worker_user_nice(struct kthread_worker *worker, long nice);
 
+int set_kthread_worker_scheduler(struct kthread_worker *worker,
+				 int policy, int sched_priority);
+int set_kthread_worker_scheduler_nocheck(struct kthread_worker *worker,
+					 int policy, int sched_priority);
+
 bool queue_kthread_work(struct kthread_worker *worker,
 			struct kthread_work *work);
 void flush_kthread_work(struct kthread_work *work);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index ab2e235b6144..4ab31b914676 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -662,6 +662,65 @@ void set_kthread_worker_user_nice(struct kthread_worker *worker, long nice)
 }
 EXPORT_SYMBOL(set_kthread_worker_user_nice);
 
+static int
+__set_kthread_worker_scheduler(struct kthread_worker *worker,
+			       int policy, int sched_priority, bool check)
+{
+	struct task_struct *task = worker->task;
+	const struct sched_param sp = {
+		.sched_priority = sched_priority
+	};
+	int ret;
+
+	WARN_ON(!task);
+
+	if (check)
+		ret = sched_setscheduler(task, policy, &sp);
+	else
+		ret = sched_setscheduler_nocheck(task, policy, &sp);
+
+	return ret;
+}
+
+/**
+ * set_kthread_worker_scheduler - change the scheduling policy and/or RT
+ *	priority of a kthread worker.
+ * @worker: target kthread_worker
+ * @policy: new policy
+ * @sched_priority: new RT priority
+ *
+ * Return: 0 on success. An error code otherwise.
+ */
+int set_kthread_worker_scheduler(struct kthread_worker *worker,
+				 int policy, int sched_priority)
+{
+	return __set_kthread_worker_scheduler(worker, policy, sched_priority,
+					      true);
+}
+EXPORT_SYMBOL(set_kthread_worker_scheduler);
+
+/**
+ * set_kthread_worker_scheduler_nocheck - change the scheduling policy and/or RT
+ *	priority of a kthread worker.
+ * @worker: target kthread_worker
+ * @policy: new policy
+ * @sched_priority: new RT priority
+ *
+ * Just like set_kthread_worker_scheduler(), only don't bother checking
+ * if the current context has permission. For example, this is needed
+ * in stop_machine(): we create temporary high priority worker threads,
+ * but our caller might not have that capability.
+ *
+ * Return: 0 on success. An error code otherwise.
+ */
+int set_kthread_worker_scheduler_nocheck(struct kthread_worker *worker,
+					 int policy, int sched_priority)
+{
+	return __set_kthread_worker_scheduler(worker, policy, sched_priority,
+					      false);
+}
+EXPORT_SYMBOL(set_kthread_worker_scheduler_nocheck);
+
 /* insert @work before @pos in @worker */
 static void insert_kthread_work(struct kthread_worker *worker,
 			       struct kthread_work *work,
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 3a286f3b8b3c..d882464c71d7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3916,7 +3916,6 @@ static int __init rcu_spawn_gp_kthread(void)
 	int kthread_prio_in = kthread_prio;
 	struct rcu_node *rnp;
 	struct rcu_state *rsp;
-	struct sched_param sp;
 	int ret;
 
 	/* Force priority into range. */
@@ -3940,11 +3939,10 @@ static int __init rcu_spawn_gp_kthread(void)
 		BUG_ON(ret);
 		rnp = rcu_get_root(rsp);
 		raw_spin_lock_irqsave(&rnp->lock, flags);
-		if (kthread_prio) {
-			sp.sched_priority = kthread_prio;
-			sched_setscheduler_nocheck(rsp->gp_worker.task,
-						   SCHED_FIFO, &sp);
-		}
+		if (kthread_prio)
+			set_kthread_worker_scheduler_nocheck(&rsp->gp_worker,
+							     SCHED_FIFO,
+							     kthread_prio);
 		queue_kthread_work(&rsp->gp_worker, &rsp->gp_init_work);
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
 	}
diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index 73e4c7f11a2c..89028165bb22 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -469,13 +469,10 @@ static int __init ring_buffer_benchmark_init(void)
 			set_user_nice(consumer, consumer_nice);
 	}
 
-	if (producer_fifo >= 0) {
-		struct sched_param param = {
-			.sched_priority = producer_fifo
-		};
-		sched_setscheduler(rb_producer_worker.task,
-				   SCHED_FIFO, &param);
-	} else
+	if (producer_fifo >= 0)
+		set_kthread_worker_scheduler(&rb_producer_worker,
+					     SCHED_FIFO, producer_fifo);
+	else
 		set_kthread_worker_user_nice(&rb_producer_worker,
 					     producer_nice);
 
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 03/14] kthread: Add drain_kthread_worker()
@ 2015-07-28 17:18     ` Tejun Heo
  0 siblings, 0 replies; 86+ messages in thread
From: Tejun Heo @ 2015-07-28 17:18 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel

Hello,

On Tue, Jul 28, 2015 at 04:39:20PM +0200, Petr Mladek wrote:
> +/*
> + * Test whether @work is being queued from another work
> + * executing on the same kthread.
> + */
> +static bool is_chained_work(struct kthread_worker *worker)
> +{
> +	struct kthread_worker *current_worker;
> +
> +	current_worker = current_kthread_worker();
> +	/*
> +	 * Return %true if I'm a kthread worker executing a work item on
> +	 * the given @worker.
> +	 */
> +	return current_worker && current_worker == worker;
> +}

I'm not sure full-on chained work detection is necessary here.
kthread worker's usages tend to be significantly simpler and draining
is only gonna be used for destruction.

> +void drain_kthread_worker(struct kthread_worker *worker)
> +{
> +	int flush_cnt = 0;
> +
> +	spin_lock_irq(&worker->lock);
> +	worker->nr_drainers++;
> +
> +	while (!list_empty(&worker->work_list)) {
> +		/*
> +		 * Unlock, so we could move forward. Note that queuing
> +		 * is limited by @nr_drainers > 0.
> +		 */
> +		spin_unlock_irq(&worker->lock);
> +
> +		flush_kthread_worker(worker);
> +
> +		if (++flush_cnt == 10 ||
> +		    (flush_cnt % 100 == 0 && flush_cnt <= 1000))
> +			pr_warn("kthread worker %s: drain_kthread_worker() isn't complete after %u tries\n",
> +				worker->task->comm, flush_cnt);
> +
> +		spin_lock_irq(&worker->lock);
> +	}

I'd just do something like WARN_ONCE(flush_cnt++ > 10, "kthread worker: ...").
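
That is, roughly (a sketch of the suggested simplification, not
verbatim, keeping the existing message text):

	while (!list_empty(&worker->work_list)) {
		spin_unlock_irq(&worker->lock);

		flush_kthread_worker(worker);
		WARN_ONCE(flush_cnt++ > 10,
			  "kthread worker %s: drain_kthread_worker() isn't complete after %u tries\n",
			  worker->task->comm, flush_cnt);

		spin_lock_irq(&worker->lock);
	}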

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 05/14] kthread: Add wakeup_and_destroy_kthread_worker()
@ 2015-07-28 17:23     ` Tejun Heo
  0 siblings, 0 replies; 86+ messages in thread
From: Tejun Heo @ 2015-07-28 17:23 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel

Hello,

On Tue, Jul 28, 2015 at 04:39:22PM +0200, Petr Mladek wrote:
...
> +void wakeup_and_destroy_kthread_worker(struct kthread_worker *worker)
> +{
> +	struct task_struct *task = worker->task;
> +
> +	if (WARN_ON(!task))
> +		return;
> +
> +	spin_lock_irq(&worker->lock);
> +	if (worker->current_work)
> +		wake_up_process(worker->task);
> +	spin_unlock_irq(&worker->lock);
> +
> +	destroy_kthread_worker(worker);
> +}

I don't know.  Wouldn't it make far more sense to convert those wake-up
events into queueings?  It seems backwards to be converting things
to a work-item-based interface and then inserting work items which wait
for external events.  More on this later.
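
A sketch of the caller-side conversion being suggested ("pending_work" is
just an illustrative name for whatever work item the wake-up stood for):

    /* before: external code pokes the worker's task directly */
    wake_up_process(worker->task);

    /* after: external code queues the work item instead; waking the
     * kthread becomes a side effect of queue_kthread_work() */
    queue_kthread_work(worker, &pending_work);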

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 06/14] kthread: Add kthread_worker_created()
@ 2015-07-28 17:26     ` Tejun Heo
  0 siblings, 0 replies; 86+ messages in thread
From: Tejun Heo @ 2015-07-28 17:26 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel

Hello,

On Tue, Jul 28, 2015 at 04:39:23PM +0200, Petr Mladek wrote:
> I would like to make cleaner kthread worker API and hide the definition
> of struct kthread_worker. It will prevent any custom hacks and make
> the API more secure.
> 
> This patch provides an API to check if the worker has been created
> and hides the implementation details.

Maybe it'd be a better idea to make create_kthread_worker() allocate
and return a pointer to struct kthread_worker?  You're adding a
create/destroy interface anyway, so it won't need a separate created
query function, and the synchronization rules would be self-evident.
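
Something along these lines, as a sketch of the shape only (the argument
list, "my_worker" and "some_work" are illustrative, not the actual
signatures from this series):

    struct kthread_worker *worker;

    worker = create_kthread_worker("my_worker");   /* allocates and starts */
    if (IS_ERR(worker))
            return PTR_ERR(worker);

    queue_kthread_work(worker, &some_work);

    destroy_kthread_worker(worker);                /* stops and frees */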

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 07/14] mm/huge_page: Convert khugepaged() into kthread worker API
  2015-07-28 14:39   ` Petr Mladek
@ 2015-07-28 17:36     ` Tejun Heo
  -1 siblings, 0 replies; 86+ messages in thread
From: Tejun Heo @ 2015-07-28 17:36 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel

Hello,

On Tue, Jul 28, 2015 at 04:39:24PM +0200, Petr Mladek wrote:
> -static void khugepaged_wait_work(void)
> +static void khugepaged_wait_func(struct kthread_work *dummy)
>  {
>  	if (khugepaged_has_work()) {
>  		if (!khugepaged_scan_sleep_millisecs)
> -			return;
> +			goto out;
>  
>  		wait_event_freezable_timeout(khugepaged_wait,
> -					     kthread_should_stop(),
> +					     !khugepaged_enabled(),
>  			msecs_to_jiffies(khugepaged_scan_sleep_millisecs));
> -		return;
> +		goto out;
>  	}
>  
>  	if (khugepaged_enabled())
>  		wait_event_freezable(khugepaged_wait, khugepaged_wait_event());
> +
> +out:
> +	if (khugepaged_enabled())
> +		queue_kthread_work(&khugepaged_worker,
> +				   &khugepaged_do_scan_work);
>  }

There's gotta be a better way to do this.  It's outright weird to
convert it over to a work-item-based interface and then handle idle
periods by injecting wait work items.  If there's an external event
which wakes up the worker, convert that to a queueing event.  If it's
a timed event, implement a delayed work and queue it with a delay.
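
Roughly along these lines, as a sketch only: delayed kthread work does not
exist yet, so queue_delayed_kthread_work() and khugepaged_do_scan_dwork
below are hypothetical names modeled on the workqueue API:

    static void khugepaged_do_scan_func(struct kthread_work *dummy)
    {
            khugepaged_do_scan();

            /* re-arm with the scan sleep as a queueing delay instead of
             * queueing a separate "wait" work item; the delayed-work API
             * used here is hypothetical */
            if (khugepaged_enabled())
                    queue_delayed_kthread_work(&khugepaged_worker,
                            &khugepaged_do_scan_dwork,
                            msecs_to_jiffies(khugepaged_scan_sleep_millisecs));
    }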

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 08/14] rcu: Convert RCU gp kthreads into kthread worker API
  2015-07-28 14:39   ` Petr Mladek
@ 2015-07-28 17:37     ` Tejun Heo
  -1 siblings, 0 replies; 86+ messages in thread
From: Tejun Heo @ 2015-07-28 17:37 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel

On Tue, Jul 28, 2015 at 04:39:25PM +0200, Petr Mladek wrote:
...
> -static int __noreturn rcu_gp_kthread(void *arg)
> +static void rcu_gp_kthread_func(struct kthread_work *work)
>  {
>  	int fqs_state;
>  	int gf;
>  	unsigned long j;
>  	int ret;
> -	struct rcu_state *rsp = arg;
> +	struct rcu_state *rsp = container_of(work, struct rcu_state, gp_work);
>  	struct rcu_node *rnp = rcu_get_root(rsp);
>  
> -	rcu_bind_gp_kthread();
> +	/* Handle grace-period start. */
>  	for (;;) {
> +		trace_rcu_grace_period(rsp->name,
> +				       READ_ONCE(rsp->gpnum),
> +				       TPS("reqwait"));
> +		rsp->gp_state = RCU_GP_WAIT_GPS;
> +		wait_event_interruptible(rsp->gp_wq,
> +					 READ_ONCE(rsp->gp_flags) &
> +					 RCU_GP_FLAG_INIT);

Same here.  Why not convert the waker into a queueing event?

> +		/* Locking provides needed memory barrier. */
> +		if (rcu_gp_init(rsp))
> +			break;
> +		cond_resched_rcu_qs();
> +		WRITE_ONCE(rsp->gp_activity, jiffies);
> +		WARN_ON(signal_pending(current));
> +		trace_rcu_grace_period(rsp->name,
> +				       READ_ONCE(rsp->gpnum),
> +				       TPS("reqwaitsig"));
> +	}
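
To make the "convert the waker into a queueing event" idea concrete, a
sketch of the waker side (illustrative only; rsp->gp_worker is an assumed
field for the worker, and the locking around gp_flags is elided):

    /* where the code currently sets RCU_GP_FLAG_INIT and calls
     * rcu_gp_kthread_wake(rsp), it would queue the work item instead */
    queue_kthread_work(&rsp->gp_worker, &rsp->gp_work);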

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 13/14] kthread_worker: Add set_kthread_worker_user_nice()
@ 2015-07-28 17:40     ` Tejun Heo
  0 siblings, 0 replies; 86+ messages in thread
From: Tejun Heo @ 2015-07-28 17:40 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel

On Tue, Jul 28, 2015 at 04:39:30PM +0200, Petr Mladek wrote:
...
> +/*
> + * set_kthread_worker_user_nice - set scheduling priority for the kthread worker
> + * @worker: target kthread_worker
> + * @nice: niceness value
> + */
> +void set_kthread_worker_user_nice(struct kthread_worker *worker, long nice)
> +{
> +	struct task_struct *task = worker->task;
> +
> +	WARN_ON(!task);
> +	set_user_nice(task, nice);
> +}
> +EXPORT_SYMBOL(set_kthread_worker_user_nice);

kthread_worker is explicitly associated with a single kthread.  Why do
we want to create explicit wrappers for kthread operations?  This is
encapsulation for encapsulation's sake.  It doesn't buy us anything at
all.  Just let the user access the associated kthread and operate on
it.
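
I.e., at the call site, simply something like this (a sketch; MAX_NICE
stands in for whatever value the caller actually wants):

    /* worker->task is valid between create and destroy */
    set_user_nice(worker->task, MAX_NICE);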

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 14/14] kthread_worker: Add set_kthread_worker_scheduler*()
  2015-07-28 14:39   ` Petr Mladek
@ 2015-07-28 17:41     ` Tejun Heo
  -1 siblings, 0 replies; 86+ messages in thread
From: Tejun Heo @ 2015-07-28 17:41 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel

On Tue, Jul 28, 2015 at 04:39:31PM +0200, Petr Mladek wrote:
> +/**
> + * set_kthread_worker_scheduler - change the scheduling policy and/or RT
> + *	priority of a kthread worker.
> + * @worker: target kthread_worker
> + * @policy: new policy
> + * @sched_priority: new RT priority
> + *
> + * Return: 0 on success. An error code otherwise.
> + */
> +int set_kthread_worker_scheduler(struct kthread_worker *worker,
> +				 int policy, int sched_priority)
> +{
> +	return __set_kthread_worker_scheduler(worker, policy, sched_priority,
> +					      true);
> +}

Ditto.  I don't get why we would want these thin wrappers.
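
I.e., the caller could operate on the task directly, roughly like this
(a sketch; the policy and priority values are illustrative):

    struct sched_param param = { .sched_priority = 1 };

    sched_setscheduler(worker->task, SCHED_FIFO, &param);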

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 14/14] kthread_worker: Add set_kthread_worker_scheduler*()
@ 2015-07-28 19:48       ` Peter Zijlstra
  0 siblings, 0 replies; 86+ messages in thread
From: Peter Zijlstra @ 2015-07-28 19:48 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Andrew Morton, Oleg Nesterov, Ingo Molnar,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel

On Tue, Jul 28, 2015 at 01:41:54PM -0400, Tejun Heo wrote:
> On Tue, Jul 28, 2015 at 04:39:31PM +0200, Petr Mladek wrote:
> > +/**
> > + * set_kthread_worker_scheduler - change the scheduling policy and/or RT
> > + *	priority of a kthread worker.
> > + * @worker: target kthread_worker
> > + * @policy: new policy
> > + * @sched_priority: new RT priority
> > + *
> > + * Return: 0 on success. An error code otherwise.
> > + */
> > +int set_kthread_worker_scheduler(struct kthread_worker *worker,
> > +				 int policy, int sched_priority)
> > +{
> > +	return __set_kthread_worker_scheduler(worker, policy, sched_priority,
> > +					      true);
> > +}
> 
> Ditto.  I don't get why we would want these thin wrappers.

On top of which this is an obsolete interface :-)

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 03/14] kthread: Add drain_kthread_worker()
@ 2015-07-29 10:04       ` Petr Mladek
  0 siblings, 0 replies; 86+ messages in thread
From: Petr Mladek @ 2015-07-29 10:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel

On Tue 2015-07-28 13:18:22, Tejun Heo wrote:
> Hello,
> 
> On Tue, Jul 28, 2015 at 04:39:20PM +0200, Petr Mladek wrote:
> > +/*
> > + * Test whether @work is being queued from another work
> > + * executing on the same kthread.
> > + */
> > +static bool is_chained_work(struct kthread_worker *worker)
> > +{
> > +	struct kthread_worker *current_worker;
> > +
> > +	current_worker = current_kthread_worker();
> > +	/*
> > +	 * Return %true if I'm a kthread worker executing a work item on
> > +	 * the given @worker.
> > +	 */
> > +	return current_worker && current_worker == worker;
> > +}
> 
> I'm not sure full-on chained work detection is necessary here.
> kthread worker's usages tend to be significantly simpler and draining
> is only gonna be used for destruction.

I think that it might be useful to detect bugs when someone
depends on the worker when it is being destroyed. For example,
I tried to convert "khubd" kthread and there was not easy to
double check that this worked as expected.

I am actually thinking about replacing

    WARN_ON_ONCE(!is_chained_work(worker)))

with

    WARN_ON(!is_chained_work(worker)))

in queue_kthread_work, so that we get the warning for all misused
workers.

> > +void drain_kthread_worker(struct kthread_worker *worker)
> > +{
> > +	int flush_cnt = 0;
> > +
> > +	spin_lock_irq(&worker->lock);
> > +	worker->nr_drainers++;
> > +
> > +	while (!list_empty(&worker->work_list)) {
> > +		/*
> > +		 * Unlock, so we could move forward. Note that queuing
> > +		 * is limited by @nr_drainers > 0.
> > +		 */
> > +		spin_unlock_irq(&worker->lock);
> > +
> > +		flush_kthread_worker(worker);
> > +
> > +		if (++flush_cnt == 10 ||
> > +		    (flush_cnt % 100 == 0 && flush_cnt <= 1000))
> > +			pr_warn("kthread worker %s: drain_kthread_worker() isn't complete after %u tries\n",
> > +				worker->task->comm, flush_cnt);
> > +
> > +		spin_lock_irq(&worker->lock);
> > +	}
> 
> I'd just do something like WARN_ONCE(flush_cnt++ > 10, "kthread worker: ...").

This would print the warning only for the first broken worker. But I do not
have a strong opinion about it.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 06/14] kthread: Add kthread_worker_created()
@ 2015-07-29 10:07       ` Petr Mladek
  0 siblings, 0 replies; 86+ messages in thread
From: Petr Mladek @ 2015-07-29 10:07 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel

On Tue 2015-07-28 13:26:57, Tejun Heo wrote:
> Hello,
> 
> On Tue, Jul 28, 2015 at 04:39:23PM +0200, Petr Mladek wrote:
> > I would like to make cleaner kthread worker API and hide the definition
> > of struct kthread_worker. It will prevent any custom hacks and make
> > the API more secure.
> > 
> > This patch provides an API to check if the worker has been created
> > and hides the implementation details.
> 
> Maybe it'd be a better idea to make create_kthread_worker() allocate
> and return a pointer to struct kthread_worker?  You're adding a
> create/destroy interface anyway, so it won't need a separate created
> query function, and the synchronization rules would be self-evident.

Makes sense. I actually did it this way in one temporary version and reverted
it for some ugly reason.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 13/14] kthread_worker: Add set_kthread_worker_user_nice()
@ 2015-07-29 11:23       ` Petr Mladek
  0 siblings, 0 replies; 86+ messages in thread
From: Petr Mladek @ 2015-07-29 11:23 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel

On Tue 2015-07-28 13:40:58, Tejun Heo wrote:
> On Tue, Jul 28, 2015 at 04:39:30PM +0200, Petr Mladek wrote:
> ...
> > +/*
> > + * set_kthread_worker_user_nice - set scheduling priority for the kthread worker
> > + * @worker: target kthread_worker
> > + * @nice: niceness value
> > + */
> > +void set_kthread_worker_user_nice(struct kthread_worker *worker, long nice)
> > +{
> > +	struct task_struct *task = worker->task;
> > +
> > +	WARN_ON(!task);
> > +	set_user_nice(task, nice);
> > +}
> > +EXPORT_SYMBOL(set_kthread_worker_user_nice);
> 
> kthread_worker is explicitly associated with a single kthread.  Why do
> we want to create explicit wrappers for kthread operations?  This is
> encapsulation for encapsulation's sake.  It doesn't buy us anything at
> all.  Just let the user access the associated kthread and operate on
> it.

My plan is to make the API cleaner and hide the struct kthread_worker
definition in kthread.c. It would prevent anyone from doing any hacks
with it. BTW, we do the same with struct workqueue_struct.

Another possibility would be to add a helper function to get the
associated task_struct, but this might cause inconsistencies when
the worker is restarted.
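
Such a helper would be tiny; a sketch, with kthread_worker_task() as a
made-up name:

    /* hypothetical accessor living next to the hidden struct definition */
    static inline struct task_struct *kthread_worker_task(struct kthread_worker *worker)
    {
            return worker->task;
    }

Callers would then do set_user_nice(kthread_worker_task(worker), nice)
directly instead of going through a wrapper.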

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 07/14] mm/huge_page: Convert khugepaged() into kthread worker API
@ 2015-07-29 11:32       ` Petr Mladek
  0 siblings, 0 replies; 86+ messages in thread
From: Petr Mladek @ 2015-07-29 11:32 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel

On Tue 2015-07-28 13:36:35, Tejun Heo wrote:
> Hello,
> 
> On Tue, Jul 28, 2015 at 04:39:24PM +0200, Petr Mladek wrote:
> > -static void khugepaged_wait_work(void)
> > +static void khugepaged_wait_func(struct kthread_work *dummy)
> >  {
> >  	if (khugepaged_has_work()) {
> >  		if (!khugepaged_scan_sleep_millisecs)
> > -			return;
> > +			goto out;
> >  
> >  		wait_event_freezable_timeout(khugepaged_wait,
> > -					     kthread_should_stop(),
> > +					     !khugepaged_enabled(),
> >  			msecs_to_jiffies(khugepaged_scan_sleep_millisecs));
> > -		return;
> > +		goto out;
> >  	}
> >  
> >  	if (khugepaged_enabled())
> >  		wait_event_freezable(khugepaged_wait, khugepaged_wait_event());
> > +
> > +out:
> > +	if (khugepaged_enabled())
> > +		queue_kthread_work(&khugepaged_worker,
> > +				   &khugepaged_do_scan_work);
> >  }
> 
> There's gotta be a better way to do this.  It's outright weird to
> convert it over to a work-item-based interface and then handle idle
> periods by injecting wait work items.  If there's an external event
> which wakes up the worker, convert that to a queueing event.  If it's
> a timed event, implement a delayed work and queue it with a delay.

I am going to give it a try.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 03/14] kthread: Add drain_kthread_worker()
@ 2015-07-29 15:03         ` Tejun Heo
  0 siblings, 0 replies; 86+ messages in thread
From: Tejun Heo @ 2015-07-29 15:03 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel

Hello, Petr.

On Wed, Jul 29, 2015 at 12:04:57PM +0200, Petr Mladek wrote:
> > I'm not sure full-on chained work detection is necessary here.
> > kthread worker's usages tend to be significantly simpler and draining
> > is only gonna be used for destruction.
> 
> I think that it might be useful to detect bugs when someone
> depends on the worker when it is being destroyed. For example,
> I tried to convert "khubd" kthread and there was not easy to
> double check that this worked as expected.
> 
> I am actually thinking about replacing
> 
>     WARN_ON_ONCE(!is_chained_work(worker)))
> 
> with
> 
>     WARN_ON(!is_chained_work(worker)))
> 
> in queue_kthread_work, so that we get the warning for all misused
> workers.

This is a partial solution no matter what you do, especially for the
destruction path, as there's nothing which prevents draining and
destruction from winning the race and then external queueing coming in
afterwards.  For use-after-free, slab debugging should work pretty well.
I really don't think we need anything special here.

> > > +	while (!list_empty(&worker->work_list)) {
> > > +		/*
> > > +		 * Unlock, so we could move forward. Note that queuing
> > > +		 * is limited by @nr_drainers > 0.
> > > +		 */
> > > +		spin_unlock_irq(&worker->lock);
> > > +
> > > +		flush_kthread_worker(worker);
> > > +
> > > +		if (++flush_cnt == 10 ||
> > > +		    (flush_cnt % 100 == 0 && flush_cnt <= 1000))
> > > +			pr_warn("kthread worker %s: drain_kthread_worker() isn't complete after %u tries\n",
> > > +				worker->task->comm, flush_cnt);
> > > +
> > > +		spin_lock_irq(&worker->lock);
> > > +	}
> > 
> > I'd just do something like WARN_ONCE(flush_cnt++ > 10, "kthread worker: ...").
> 
> This would print the warning only for the first broken worker. But I do not
> have a strong opinion about it.

I really think that'd be a good enough protection here.  It's
indicative of an outright kernel bug, and things tend to go awry and/or
be badly reported after the initial failure anyway.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 13/14] kthread_worker: Add set_kthread_worker_user_nice()
@ 2015-07-29 15:12         ` Tejun Heo
  0 siblings, 0 replies; 86+ messages in thread
From: Tejun Heo @ 2015-07-29 15:12 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel

Hello,

On Wed, Jul 29, 2015 at 01:23:54PM +0200, Petr Mladek wrote:
> My plan is to make the API cleaner and hide struct kthread_worker
> definition into kthread.c. It would prevent anyone doing any hacks
> with it. BTW, we do the same with struct workqueue_struct.

I think an obsessive attachment to cleanliness tends to worsen code in
general, e.g. simple several-line wrappers which don't do anything
other than increase the interface surface and obscure what's going on.
Let's please take a reasonable trade-off.  It shouldn't be nasty, but
we don't want to pay unnecessary complexity for perfect purity
either.

> Another possibility would be to add helper function to get the
> associated task struct but this might cause inconsistencies when
> the worker is restarted.

A kthread_worker would be instantiated on the create call and released
on destroy, and the caller is naturally expected to synchronize
creation and destruction against all other operations.  Nothing seems
complicated or subtle to me.
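
As a rough illustration of that lifecycle (not from the thread): with
the 4.2 API the caller open-codes creation and destruction, and the
proposed create_kthread_worker()/destroy_kthread_worker() helpers would
wrap the first and last steps.  The my_* names below are made up for
this sketch.

static void my_work_fn(struct kthread_work *work)
{
	/* process one queued item */
}

static DEFINE_KTHREAD_WORKER(my_worker);
static DEFINE_KTHREAD_WORK(my_work, my_work_fn);
static struct task_struct *my_worker_task;

static int my_start(void)
{
	/* "create": instantiate the worker and its thread */
	my_worker_task = kthread_run(kthread_worker_fn, &my_worker,
				     "my_worker");
	if (IS_ERR(my_worker_task))
		return PTR_ERR(my_worker_task);

	queue_kthread_work(&my_worker, &my_work);
	return 0;
}

static void my_stop(void)
{
	/* caller guarantees no new work is queued past this point */
	flush_kthread_worker(&my_worker);

	/* "destroy": stop the worker thread */
	kthread_stop(my_worker_task);
}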

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 09/14] ring_buffer: Initialize completions statically in the benchmark
  2015-07-28 14:39   ` Petr Mladek
@ 2015-08-03 18:31     ` Steven Rostedt
  -1 siblings, 0 replies; 86+ messages in thread
From: Steven Rostedt @ 2015-08-03 18:31 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar,
	Peter Zijlstra, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel

On Tue, 28 Jul 2015 16:39:26 +0200
Petr Mladek <pmladek@suse.com> wrote:

> It looks strange to initialize the completions repeatedly.
> 
> This patch uses static initialization. It simplifies the code
> and even helps to get rid of two memory barriers.

There was a reason I did it this way and did not use static
initializers. But I can't recall why I did that. :-/

I'll have to think about this some more.

-- Steve


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 10/14] ring_buffer: Fix more races when terminating the producer in the benchmark
  2015-07-28 14:39   ` Petr Mladek
@ 2015-08-03 18:33     ` Steven Rostedt
  -1 siblings, 0 replies; 86+ messages in thread
From: Steven Rostedt @ 2015-08-03 18:33 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar,
	Peter Zijlstra, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel

On Tue, 28 Jul 2015 16:39:27 +0200
Petr Mladek <pmladek@suse.com> wrote:

> @@ -384,7 +389,7 @@ static int ring_buffer_consumer_thread(void *arg)
>  
>  static int ring_buffer_producer_thread(void *arg)
>  {
> -	while (!kthread_should_stop() && !kill_test) {
> +	while (!break_test()) {
>  		ring_buffer_reset(buffer);
>  
>  		if (consumer) {
> @@ -393,11 +398,15 @@ static int ring_buffer_producer_thread(void *arg)
>  		}
>  
>  		ring_buffer_producer();
> -		if (kill_test)
> +		if (break_test())
>  			goto out_kill;
>  
>  		trace_printk("Sleeping for 10 secs\n");
>  		set_current_state(TASK_INTERRUPTIBLE);
> +		if (break_test()) {
> +			__set_current_state(TASK_RUNNING);

Move the setting of the current state to after the out_kill label.

-- Steve

> +			goto out_kill;
> +		}
>  		schedule_timeout(HZ * SLEEP_TIME);
>  	}
>  
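
For readers following the archive: the shape Steve is asking for here,
and what Petr's v2 patch later in this thread ends up doing, is roughly
the following (sketch only, reconstructed from the posted hunks; names
like break_test() and wait_to_die() come from the benchmark module):

static int ring_buffer_producer_thread(void *arg)
{
	while (!break_test()) {
		ring_buffer_reset(buffer);

		if (consumer) {
			wake_up_process(consumer);
			wait_for_completion(&read_start);
		}

		ring_buffer_producer();
		if (break_test())
			goto out_kill;

		trace_printk("Sleeping for 10 secs\n");
		set_current_state(TASK_INTERRUPTIBLE);
		if (break_test())
			goto out_kill;
		schedule_timeout(HZ * SLEEP_TIME);
	}

out_kill:
	/* leave TASK_INTERRUPTIBLE no matter how the loop was left */
	__set_current_state(TASK_RUNNING);
	if (!kthread_should_stop())
		wait_to_die();

	return 0;
}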


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 09/14] ring_buffer: Initialize completions statically in the benchmark
@ 2015-09-04  9:31       ` Petr Mladek
  0 siblings, 0 replies; 86+ messages in thread
From: Petr Mladek @ 2015-09-04  9:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar,
	Peter Zijlstra, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel

On Mon 2015-08-03 14:31:09, Steven Rostedt wrote:
> On Tue, 28 Jul 2015 16:39:26 +0200
> Petr Mladek <pmladek@suse.com> wrote:
> 
> > It looks strange to initialize the completions repeatedly.
> > 
> > This patch uses static initialization. It simplifies the code
> > and even helps to get rid of two memory barriers.
> 
> There was a reason I did it this way and did not use static
> initializers. But I can't recall why I did that. :-/
> 
> I'll have to think about this some more.

Heh, parallel programming is real fun. I tried to understand
the code in more detail and sometimes felt like Duane Dibbley.

Anyway, I found a few possible races related to the completions.
One scenario was opened by my previous fix b44754d8262d3aab8429
("ring_buffer: Allow to exit the ring buffer benchmark immediately").

The races can be fixed by the patch below. I still do not see any
scenario where the extra initialization of the two completions
is needed, but I am not brave enough to remove it after all ;-)


>From ad75428b1e5e5127bf7dc6062f880ece11dbdbbf Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Fri, 28 Aug 2015 15:59:00 +0200
Subject: [PATCH 1/2] ring_buffer: Do no not complete benchmark reader too
 early

It seems that complete(&read_done) might be called too early
in some situations.

1st scenario:
-------------

CPU0					CPU1

ring_buffer_producer_thread()
  wake_up_process(consumer);
  wait_for_completion(&read_start);

					ring_buffer_consumer_thread()
					  complete(&read_start);

  ring_buffer_producer()
    # producing data in
    # the do-while cycle

					  ring_buffer_consumer();
					    # reading data
					    # got error
					    # set kill_test = 1;
					    set_current_state(
						TASK_INTERRUPTIBLE);
					    if (reader_finish)  # false
					    schedule();

    # producer still in the middle of
    # do-while cycle
    if (consumer && !(cnt % wakeup_interval))
      wake_up_process(consumer);

					    # spurious wakeup
					    while (!reader_finish &&
						   !kill_test)
					    # leaving because
					    # kill_test == 1
					    reader_finish = 0;
					    complete(&read_done);

1st BANG: We might access uninitialized "read_done" if this is the
	  the first round.

    # producer finally leaving
    # the do-while cycle because kill_test == 1;

    if (consumer) {
      reader_finish = 1;
      wake_up_process(consumer);
      wait_for_completion(&read_done);

2nd BANG: This will never complete because consumer already did
	  the completion.

2nd scenario:
-------------

CPU0					CPU1

ring_buffer_producer_thread()
  wake_up_process(consumer);
  wait_for_completion(&read_start);

					ring_buffer_consumer_thread()
					  complete(&read_start);

  ring_buffer_producer()
    # CPU3 removes the module	  <--- difference from
    # and stops producer          <--- the 1st scenario
    if (kthread_should_stop())
      kill_test = 1;

					  ring_buffer_consumer();
					    while (!reader_finish &&
						   !kill_test)
					    # kill_test == 1 => we never go
					    # into the top level while()
					    reader_finish = 0;
					    complete(&read_done);

    # producer still in the middle of
    # do-while cycle
    if (consumer && !(cnt % wakeup_interval))
      wake_up_process(consumer);

					    # spurious wakeup
					    while (!reader_finish &&
						   !kill_test)
					    # leaving because kill_test == 1
					    reader_finish = 0;
					    complete(&read_done);

BANG: We are in the same "bang" situations as in the 1st scenario.

Root of the problem:
--------------------

ring_buffer_consumer() must complete "read_done" only when "reader_finish"
variable is set. It must not be skipped because of other conditions.

Note that we still must keep the check for "reader_finish" in a loop
because there might be the spurious wakeup as described in the
above scenarios..

Solution:
----------

The top level cycle in ring_buffer_consumer() will finish only when
"reader_finish" is set. The data will be read in "while-do" cycle
so that they are not read after an error (kill_test == 1) and
the spurious wake up.

In addition, "reader_finish" is manipulated by the producer thread.
Therefore we add READ_ONCE() to make sure that the fresh value is
read in each cycle. Also we add the corresponding barrier
to synchronize the sleep check.

Next we set back TASK_RUNNING state for the situation when we
did not sleep.

Just from paranoid reasons, we initialize both completions statically.
It should be more safe if there is other race that we do not know of.

As a side effect we could remove the memory barrier from
ring_buffer_producer_thread(). IMHO, this was the reason of
the barrier. ring_buffer_reset() uses spin locks that should
provide the needed memory barrier for using the buffer.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/trace/ring_buffer_benchmark.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index a1503a027ee2..045e0a24c2a0 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -24,8 +24,8 @@ struct rb_page {
 static int wakeup_interval = 100;
 
 static int reader_finish;
-static struct completion read_start;
-static struct completion read_done;
+static DECLARE_COMPLETION(read_start);
+static DECLARE_COMPLETION(read_done);
 
 static struct ring_buffer *buffer;
 static struct task_struct *producer;
@@ -178,10 +178,14 @@ static void ring_buffer_consumer(void)
 	read_events ^= 1;
 
 	read = 0;
-	while (!reader_finish && !kill_test) {
-		int found;
+	/*
+	 * Always wait until we are asked to finish and the producer
+	 * is ready to wait for the completion.
+	 */
+	while (!READ_ONCE(reader_finish)) {
+		int found = 1;
 
-		do {
+		while (found && !kill_test) {
 			int cpu;
 
 			found = 0;
@@ -195,17 +199,29 @@ static void ring_buffer_consumer(void)
 
 				if (kill_test)
 					break;
+
 				if (stat == EVENT_FOUND)
 					found = 1;
+
 			}
-		} while (found && !kill_test);
+		}
 
+		/*
+		 * Sleep a bit. Producer with wake up us when some more data
+		 * are available or when we should finish reading.
+		 */
 		set_current_state(TASK_INTERRUPTIBLE);
+		/*
+		 * Make sure that we read the updated finish variable
+		 * before producer tries to wakeup us.
+		 */
+		smp_rmb();
 		if (reader_finish)
 			break;
 
 		schedule();
 	}
+	__set_current_state(TASK_RUNNING);
 	reader_finish = 0;
 	complete(&read_done);
 }
@@ -389,13 +405,10 @@ static int ring_buffer_consumer_thread(void *arg)
 
 static int ring_buffer_producer_thread(void *arg)
 {
-	init_completion(&read_start);
-
 	while (!kthread_should_stop() && !kill_test) {
 		ring_buffer_reset(buffer);
 
 		if (consumer) {
-			smp_wmb();
 			wake_up_process(consumer);
 			wait_for_completion(&read_start);
 		}
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 10/14] ring_buffer: Fix more races when terminating the producer in the benchmark
  2015-08-03 18:33     ` Steven Rostedt
  (?)
@ 2015-09-04  9:38       ` Petr Mladek
  -1 siblings, 0 replies; 86+ messages in thread
From: Petr Mladek @ 2015-09-04  9:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar,
	Peter Zijlstra, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel

On Mon 2015-08-03 14:33:23, Steven Rostedt wrote:
> On Tue, 28 Jul 2015 16:39:27 +0200
> Petr Mladek <pmladek@suse.com> wrote:
> 
> > @@ -384,7 +389,7 @@ static int ring_buffer_consumer_thread(void *arg)
> >  
> >  static int ring_buffer_producer_thread(void *arg)
> >  {
> > -	while (!kthread_should_stop() && !kill_test) {
> > +	while (!break_test()) {
> >  		ring_buffer_reset(buffer);
> >  
> >  		if (consumer) {
> > @@ -393,11 +398,15 @@ static int ring_buffer_producer_thread(void *arg)
> >  		}
> >  
> >  		ring_buffer_producer();
> > -		if (kill_test)
> > +		if (break_test())
> >  			goto out_kill;
> >  
> >  		trace_printk("Sleeping for 10 secs\n");
> >  		set_current_state(TASK_INTERRUPTIBLE);
> > +		if (break_test()) {
> > +			__set_current_state(TASK_RUNNING);
> 
> Move the setting of the current state to after the out_kill label.

Please find below the updated version of this patch.

I also reverted some changes in the consumer code. It never stays
in a loop for too long, and it must stay in ring_buffer_producer()
until the "reader_finish" variable is set.


>From 7f5b1a5b8cf8248245897b55ffc51a6d74c8e15b Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Fri, 19 Jun 2015 14:38:36 +0200
Subject: [PATCH 2/2] ring_buffer: Fix more races when terminating the producer
 in the benchmark

The commit b44754d8262d3aab8 ("ring_buffer: Allow to exit the ring
buffer benchmark immediately") added a hack into ring_buffer_producer()
that set @kill_test when kthread_should_stop() returned true. It improved
the situation a lot. It stopped the kthread in most cases because
the producer spent most of the time in the patched while cycle.

But there are still few possible races when kthread_should_stop()
is set outside of the cycle. Then we do not set @kill_test and
some other checks pass.

This patch adds a better fix. It renames @test_kill/TEST_KILL() into
a better descriptive @test_error/TEST_ERROR(). Also it introduces
break_test() function that checks for both @test_error and
kthread_should_stop().

The new function is used in the producer when the check for @test_error
is not enough. It is not used in the consumer because its state
is manipulated by the producer via the "reader_finish" variable.

Also we add a missing check into ring_buffer_producer_thread()
between setting TASK_INTERRUPTIBLE and calling schedule_timeout().
Otherwise, we might miss a wakeup from kthread_stop().

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/trace/ring_buffer_benchmark.c | 54 +++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 25 deletions(-)

diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index 045e0a24c2a0..d1bfe4399e96 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -60,12 +60,12 @@ MODULE_PARM_DESC(consumer_fifo, "fifo prio for consumer");
 
 static int read_events;
 
-static int kill_test;
+static int test_error;
 
-#define KILL_TEST()				\
+#define TEST_ERROR()				\
 	do {					\
-		if (!kill_test) {		\
-			kill_test = 1;		\
+		if (!test_error) {		\
+			test_error = 1;		\
 			WARN_ON(1);		\
 		}				\
 	} while (0)
@@ -75,6 +75,11 @@ enum event_status {
 	EVENT_DROPPED,
 };
 
+static bool break_test(void)
+{
+	return test_error || kthread_should_stop();
+}
+
 static enum event_status read_event(int cpu)
 {
 	struct ring_buffer_event *event;
@@ -87,7 +92,7 @@ static enum event_status read_event(int cpu)
 
 	entry = ring_buffer_event_data(event);
 	if (*entry != cpu) {
-		KILL_TEST();
+		TEST_ERROR();
 		return EVENT_DROPPED;
 	}
 
@@ -115,10 +120,10 @@ static enum event_status read_page(int cpu)
 		rpage = bpage;
 		/* The commit may have missed event flags set, clear them */
 		commit = local_read(&rpage->commit) & 0xfffff;
-		for (i = 0; i < commit && !kill_test; i += inc) {
+		for (i = 0; i < commit && !test_error ; i += inc) {
 
 			if (i >= (PAGE_SIZE - offsetof(struct rb_page, data))) {
-				KILL_TEST();
+				TEST_ERROR();
 				break;
 			}
 
@@ -128,7 +133,7 @@ static enum event_status read_page(int cpu)
 			case RINGBUF_TYPE_PADDING:
 				/* failed writes may be discarded events */
 				if (!event->time_delta)
-					KILL_TEST();
+					TEST_ERROR();
 				inc = event->array[0] + 4;
 				break;
 			case RINGBUF_TYPE_TIME_EXTEND:
@@ -137,12 +142,12 @@ static enum event_status read_page(int cpu)
 			case 0:
 				entry = ring_buffer_event_data(event);
 				if (*entry != cpu) {
-					KILL_TEST();
+					TEST_ERROR();
 					break;
 				}
 				read++;
 				if (!event->array[0]) {
-					KILL_TEST();
+					TEST_ERROR();
 					break;
 				}
 				inc = event->array[0] + 4;
@@ -150,17 +155,17 @@ static enum event_status read_page(int cpu)
 			default:
 				entry = ring_buffer_event_data(event);
 				if (*entry != cpu) {
-					KILL_TEST();
+					TEST_ERROR();
 					break;
 				}
 				read++;
 				inc = ((event->type_len + 1) * 4);
 			}
-			if (kill_test)
+			if (test_error)
 				break;
 
 			if (inc <= 0) {
-				KILL_TEST();
+				TEST_ERROR();
 				break;
 			}
 		}
@@ -185,7 +190,7 @@ static void ring_buffer_consumer(void)
 	while (!READ_ONCE(reader_finish)) {
 		int found = 1;
 
-		while (found && !kill_test) {
+		while (found && !test_error) {
 			int cpu;
 
 			found = 0;
@@ -197,7 +202,7 @@ static void ring_buffer_consumer(void)
 				else
 					stat = read_page(cpu);
 
-				if (kill_test)
+				if (test_error)
 					break;
 
 				if (stat == EVENT_FOUND)
@@ -279,10 +284,7 @@ static void ring_buffer_producer(void)
 		if (cnt % wakeup_interval)
 			cond_resched();
 #endif
-		if (kthread_should_stop())
-			kill_test = 1;
-
-	} while (ktime_before(end_time, timeout) && !kill_test);
+	} while (ktime_before(end_time, timeout) && !break_test());
 	trace_printk("End ring buffer hammer\n");
 
 	if (consumer) {
@@ -303,7 +305,7 @@ static void ring_buffer_producer(void)
 	entries = ring_buffer_entries(buffer);
 	overruns = ring_buffer_overruns(buffer);
 
-	if (kill_test && !kthread_should_stop())
+	if (test_error)
 		trace_printk("ERROR!\n");
 
 	if (!disable_reader) {
@@ -384,15 +386,14 @@ static void wait_to_die(void)
 
 static int ring_buffer_consumer_thread(void *arg)
 {
-	while (!kthread_should_stop() && !kill_test) {
+	while (!break_test()) {
 		complete(&read_start);
 
 		ring_buffer_consumer();
 
 		set_current_state(TASK_INTERRUPTIBLE);
-		if (kthread_should_stop() || kill_test)
+		if (break_test())
 			break;
-
 		schedule();
 	}
 	__set_current_state(TASK_RUNNING);
@@ -405,7 +406,7 @@ static int ring_buffer_consumer_thread(void *arg)
 
 static int ring_buffer_producer_thread(void *arg)
 {
-	while (!kthread_should_stop() && !kill_test) {
+	while (!break_test()) {
 		ring_buffer_reset(buffer);
 
 		if (consumer) {
@@ -414,15 +415,18 @@ static int ring_buffer_producer_thread(void *arg)
 		}
 
 		ring_buffer_producer();
-		if (kill_test)
+		if (break_test())
 			goto out_kill;
 
 		trace_printk("Sleeping for 10 secs\n");
 		set_current_state(TASK_INTERRUPTIBLE);
+		if (break_test())
+			goto out_kill;
 		schedule_timeout(HZ * SLEEP_TIME);
 	}
 
 out_kill:
+	__set_current_state(TASK_RUNNING);
 	if (!kthread_should_stop())
 		wait_to_die();
 
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 09/14] ring_buffer: Initialize completions statically in the benchmark
@ 2015-09-04 13:15         ` Steven Rostedt
  0 siblings, 0 replies; 86+ messages in thread
From: Steven Rostedt @ 2015-09-04 13:15 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar,
	Peter Zijlstra, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel

On Fri, 4 Sep 2015 11:31:26 +0200
Petr Mladek <pmladek@suse.com> wrote:

> 1st scenario:
> -------------
> 
> CPU0					CPU1
> 
> ring_buffer_producer_thread()
>   wake_up_process(consumer);
>   wait_for_completion(&read_start);
> 
> 					ring_buffer_consumer_thread()
> 					  complete(&read_start);
> 
>   ring_buffer_producer()
>     # producing data in
>     # the do-while cycle
> 
> 					  ring_buffer_consumer();
> 					    # reading data
> 					    # got error

So you're saying the error condition can cause this race? OK, I'll
admit that. Although I don't think it's that big of a bug, because the
error condition will also trigger a WARN_ON() and it means the ring
buffer code is broken, which also means the kernel is broken. Things
that go wrong after that are just tough luck.

But I'm not saying we couldn't fix it either.

> 					    # set kill_test = 1;
> 					    set_current_state(
> 						TASK_INTERRUPTIBLE);
> 					    if (reader_finish)  # false
> 					    schedule();
> 
>     # producer still in the middle of
>     # do-while cycle
>     if (consumer && !(cnt % wakeup_interval))
>       wake_up_process(consumer);
> 
> 					    # spurious wakeup
> 					    while (!reader_finish &&
> 						   !kill_test)
> 					    # leaving because
> 					    # kill_test == 1
> 					    reader_finish = 0;
> 					    complete(&read_done);
> 
> 1st BANG: We might access uninitialized "read_done" if this is the
> 	  the first round.
> 
>     # producer finally leaving
>     # the do-while cycle because kill_test == 1;
> 
>     if (consumer) {
>       reader_finish = 1;
>       wake_up_process(consumer);
>       wait_for_completion(&read_done);
> 
> 2nd BANG: This will never complete because consumer already did
> 	  the completion.
> 
> 2nd scenario:
> -------------
> 
> CPU0					CPU1
> 
> ring_buffer_producer_thread()
>   wake_up_process(consumer);
>   wait_for_completion(&read_start);
> 
> 					ring_buffer_consumer_thread()
> 					  complete(&read_start);
> 
>   ring_buffer_producer()
>     # CPU3 removes the module	  <--- difference from
>     # and stops producer          <--- the 1st scenario
>     if (kthread_should_stop())
>       kill_test = 1;
> 
> 					  ring_buffer_consumer();
> 					    while (!reader_finish &&
> 						   !kill_test)
> 					    # kill_test == 1 => we never go
> 					    # into the top level while()
> 					    reader_finish = 0;
> 					    complete(&read_done);
> 
>     # producer still in the middle of
>     # do-while cycle
>     if (consumer && !(cnt % wakeup_interval))
>       wake_up_process(consumer);
> 
> 					    # spurious wakeup
> 					    while (!reader_finish &&
> 						   !kill_test)
> 					    # leaving because kill_test == 1
> 					    reader_finish = 0;
> 					    complete(&read_done);
> 
> BANG: We are in the same "bang" situations as in the 1st scenario.

This scenario I believe is a true bug, because it can happen on a
kernel that is not broken.

> 
> Root of the problem:
> --------------------
> 
> ring_buffer_consumer() must complete "read_done" only when "reader_finish"
> variable is set. It must not be skipped because of other conditions.

"It must not be skipped due to other conditions."

> 
> Note that we still must keep the check for "reader_finish" in a loop
> because there might be the spurious wakeup as described in the

"might be spurious wakeups"

> above scenarios..
> 
> Solution:
> ----------
> 
> The top level cycle in ring_buffer_consumer() will finish only when
> "reader_finish" is set. The data will be read in "while-do" cycle
> so that they are not read after an error (kill_test == 1) and
> the spurious wake up.

"or a spurious wake up"

> 
> In addition, "reader_finish" is manipulated by the producer thread.
> Therefore we add READ_ONCE() to make sure that the fresh value is
> read in each cycle. Also we add the corresponding barrier
> to synchronize the sleep check.
> 
> Next we set back TASK_RUNNING state for the situation when we
> did not sleep.

"Next we set the state back to TASK_RUNNING for the situation where we
did not sleep"

> 
> Just from paranoid reasons, we initialize both completions statically.
> It should be more safe if there is other race that we do not know of.

"This is safer, in case there are other races that we are unaware of."


> 
> As a side effect we could remove the memory barrier from
> ring_buffer_producer_thread(). IMHO, this was the reason of

"the reason for"

> the barrier. ring_buffer_reset() uses spin locks that should
> provide the needed memory barrier for using the buffer.
> 
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> ---
>  kernel/trace/ring_buffer_benchmark.c | 31 ++++++++++++++++++++++---------
>  1 file changed, 22 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
> index a1503a027ee2..045e0a24c2a0 100644
> --- a/kernel/trace/ring_buffer_benchmark.c
> +++ b/kernel/trace/ring_buffer_benchmark.c
> @@ -24,8 +24,8 @@ struct rb_page {
>  static int wakeup_interval = 100;
>  
>  static int reader_finish;
> -static struct completion read_start;
> -static struct completion read_done;
> +static DECLARE_COMPLETION(read_start);
> +static DECLARE_COMPLETION(read_done);
>  
>  static struct ring_buffer *buffer;
>  static struct task_struct *producer;
> @@ -178,10 +178,14 @@ static void ring_buffer_consumer(void)
>  	read_events ^= 1;
>  
>  	read = 0;
> -	while (!reader_finish && !kill_test) {
> -		int found;
> +	/*
> +	 * Always wait until we are asked to finish and the producer
> +	 * is ready to wait for the completion.

"Continue running until the producer specifically asks to stop and is
ready for the completion."

> +	 */
> +	while (!READ_ONCE(reader_finish)) {
> +		int found = 1;
>  
> -		do {
> +		while (found && !kill_test) {
>  			int cpu;
>  
>  			found = 0;
> @@ -195,17 +199,29 @@ static void ring_buffer_consumer(void)
>  
>  				if (kill_test)
>  					break;
> +
>  				if (stat == EVENT_FOUND)
>  					found = 1;
> +
>  			}
> -		} while (found && !kill_test);
> +		}
>  
> +		/*
> +		 * Sleep a bit. Producer with wake up us when some more data
> +		 * are available or when we should finish reading.

"Wait till the producer wakes us up when there is more data available or
when the producer wants us to finish reading"

> +		 */
>  		set_current_state(TASK_INTERRUPTIBLE);
> +		/*
> +		 * Make sure that we read the updated finish variable
> +		 * before producer tries to wakeup us.
> +		 */
> +		smp_rmb();

The above is unneeded. Look at the definition of set_current_state().

>  		if (reader_finish)
>  			break;
>  
>  		schedule();
>  	}
> +	__set_current_state(TASK_RUNNING);
>  	reader_finish = 0;
>  	complete(&read_done);
>  }
> @@ -389,13 +405,10 @@ static int ring_buffer_consumer_thread(void *arg)
>  
>  static int ring_buffer_producer_thread(void *arg)
>  {
> -	init_completion(&read_start);
> -
>  	while (!kthread_should_stop() && !kill_test) {
>  		ring_buffer_reset(buffer);
>  
>  		if (consumer) {
> -			smp_wmb();
>  			wake_up_process(consumer);
>  			wait_for_completion(&read_start);
>  		}

Please make the above changes and submit the patch again as a separate
patch.

Thanks!

-- Steve

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 10/14] ring_buffer: Fix more races when terminating the producer in the benchmark
@ 2015-09-07 17:49         ` Oleg Nesterov
  0 siblings, 0 replies; 86+ messages in thread
From: Oleg Nesterov @ 2015-09-07 17:49 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Steven Rostedt, Andrew Morton, Tejun Heo, Ingo Molnar,
	Peter Zijlstra, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, live-patching, linux-api,
	linux-kernel

Sorry, I didn't read these emails, and I never looked at this code...
I can't understand what you are talking about, but here is a minor nit anyway ;)

On 09/04, Petr Mladek wrote:
>
> +	__set_current_state(TASK_RUNNING);
>  	if (!kthread_should_stop())
>  		wait_to_die();

I bet this wait_to_die() can die; the consumer/producer can simply exit.

You just need get_task_struct() after kthread_create() and
put_task_struct() after kthread_stop().
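
Something like this (a completely untested sketch of the idea, shown only
for the benchmark's producer; error handling is omitted and the
"rb_producer" name is just illustrative):

	/* module init side: pin the task_struct so it stays valid even
	 * if the thread exits on its own before kthread_stop() */
	producer = kthread_create(ring_buffer_producer_thread, NULL,
				  "rb_producer");
	if (!IS_ERR(producer)) {
		get_task_struct(producer);
		wake_up_process(producer);
	}

	/* the thread function then simply returns when it is finished,
	 * instead of looping in wait_to_die() */

	/* module exit side: safe even if the thread has already exited,
	 * because we still hold a reference to its task_struct */
	kthread_stop(producer);
	put_task_struct(producer);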

Oleg.


^ permalink raw reply	[flat|nested] 86+ messages in thread

end of thread, other threads:[~2015-09-07 17:52 UTC | newest]

Thread overview: 86+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-28 14:39 [RFC PATCH 00/14] kthread: Use kthread worker API more widely Petr Mladek
2015-07-28 14:39 ` Petr Mladek
2015-07-28 14:39 ` Petr Mladek
2015-07-28 14:39 ` [RFC PATCH 01/14] kthread: Allow to call __kthread_create_on_node() with va_list args Petr Mladek
2015-07-28 14:39   ` Petr Mladek
2015-07-28 14:39 ` [RFC PATCH 02/14] kthread: Add create_kthread_worker*() Petr Mladek
2015-07-28 14:39   ` Petr Mladek
2015-07-28 14:39 ` [RFC PATCH 03/14] kthread: Add drain_kthread_worker() Petr Mladek
2015-07-28 14:39   ` Petr Mladek
2015-07-28 17:18   ` Tejun Heo
2015-07-28 17:18     ` Tejun Heo
2015-07-28 17:18     ` Tejun Heo
2015-07-29 10:04     ` Petr Mladek
2015-07-29 10:04       ` Petr Mladek
2015-07-29 10:04       ` Petr Mladek
2015-07-29 15:03       ` Tejun Heo
2015-07-29 15:03         ` Tejun Heo
2015-07-29 15:03         ` Tejun Heo
2015-07-28 14:39 ` [RFC PATCH 04/14] kthread: Add destroy_kthread_worker() Petr Mladek
2015-07-28 14:39   ` Petr Mladek
2015-07-28 14:39 ` [RFC PATCH 05/14] kthread: Add wakeup_and_destroy_kthread_worker() Petr Mladek
2015-07-28 14:39   ` Petr Mladek
2015-07-28 17:23   ` Tejun Heo
2015-07-28 17:23     ` Tejun Heo
2015-07-28 17:23     ` Tejun Heo
2015-07-28 14:39 ` [RFC PATCH 06/14] kthread: Add kthread_worker_created() Petr Mladek
2015-07-28 14:39   ` Petr Mladek
2015-07-28 17:26   ` Tejun Heo
2015-07-28 17:26     ` Tejun Heo
2015-07-28 17:26     ` Tejun Heo
2015-07-29 10:07     ` Petr Mladek
2015-07-29 10:07       ` Petr Mladek
2015-07-29 10:07       ` Petr Mladek
2015-07-28 14:39 ` [RFC PATCH 07/14] mm/huge_page: Convert khugepaged() into kthread worker API Petr Mladek
2015-07-28 14:39   ` Petr Mladek
2015-07-28 17:36   ` Tejun Heo
2015-07-28 17:36     ` Tejun Heo
2015-07-29 11:32     ` Petr Mladek
2015-07-29 11:32       ` Petr Mladek
2015-07-29 11:32       ` Petr Mladek
2015-07-28 14:39 ` [RFC PATCH 08/14] rcu: Convert RCU gp kthreads " Petr Mladek
2015-07-28 14:39   ` Petr Mladek
2015-07-28 17:37   ` Tejun Heo
2015-07-28 17:37     ` Tejun Heo
2015-07-28 14:39 ` [RFC PATCH 09/14] ring_buffer: Initialize completions statically in the benchmark Petr Mladek
2015-07-28 14:39   ` Petr Mladek
2015-08-03 18:31   ` Steven Rostedt
2015-08-03 18:31     ` Steven Rostedt
2015-09-04  9:31     ` Petr Mladek
2015-09-04  9:31       ` Petr Mladek
2015-09-04  9:31       ` Petr Mladek
2015-09-04 13:15       ` Steven Rostedt
2015-09-04 13:15         ` Steven Rostedt
2015-07-28 14:39 ` [RFC PATCH 10/14] ring_buffer: Fix more races when terminating the producer " Petr Mladek
2015-07-28 14:39   ` Petr Mladek
2015-08-03 18:33   ` Steven Rostedt
2015-08-03 18:33     ` Steven Rostedt
2015-09-04  9:38     ` Petr Mladek
2015-09-04  9:38       ` Petr Mladek
2015-09-04  9:38       ` Petr Mladek
2015-09-07 17:49       ` Oleg Nesterov
2015-09-07 17:49         ` Oleg Nesterov
2015-09-07 17:49         ` Oleg Nesterov
2015-07-28 14:39 ` [RFC PATCH 11/14] ring_buffer: Use kthread worker API for the producer kthread " Petr Mladek
2015-07-28 14:39   ` Petr Mladek
2015-07-28 14:39 ` [RFC PATCH 12/14] kthread_worker: Better support freezable kthread workers Petr Mladek
2015-07-28 14:39   ` Petr Mladek
2015-07-28 14:39 ` [RFC PATCH 13/14] kthread_worker: Add set_kthread_worker_user_nice() Petr Mladek
2015-07-28 14:39   ` Petr Mladek
2015-07-28 17:40   ` Tejun Heo
2015-07-28 17:40     ` Tejun Heo
2015-07-28 17:40     ` Tejun Heo
2015-07-29 11:23     ` Petr Mladek
2015-07-29 11:23       ` Petr Mladek
2015-07-29 11:23       ` Petr Mladek
2015-07-29 15:12       ` Tejun Heo
2015-07-29 15:12         ` Tejun Heo
2015-07-29 15:12         ` Tejun Heo
2015-07-28 14:39 ` [RFC PATCH 14/14] kthread_worker: Add set_kthread_worker_scheduler*() Petr Mladek
2015-07-28 14:39   ` Petr Mladek
2015-07-28 14:39   ` Petr Mladek
2015-07-28 17:41   ` Tejun Heo
2015-07-28 17:41     ` Tejun Heo
2015-07-28 19:48     ` Peter Zijlstra
2015-07-28 19:48       ` Peter Zijlstra
2015-07-28 19:48       ` Peter Zijlstra
