* [PATCH v4 00/22] kthread: Use kthread worker API more widely
From: Petr Mladek @ 2016-01-25 15:44 UTC
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek,
	Catalin Marinas, linux-watchdog, Corey Minyard,
	openipmi-developer, Doug Ledford, Sean Hefty, Hal Rosenstock,
	linux-rdma, Maxim Levitsky, Zhang Rui, Eduardo Valentin,
	Jacob Pan

My intention is to make it easier to manipulate and maintain kthreads.
In particular, I want to replace all the custom main loops with a
generic one. I also want to make the kthreads sleep in a consistent
state, in a common place, when there is no work.
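
For illustration, this is roughly the shape that a converted kthread
takes with the (new and existing) kthread worker API. A minimal sketch;
the names my_worker, my_work, my_work_fn, and my_init are made up:

  #include <linux/kthread.h>

  static struct kthread_worker *my_worker;
  static struct kthread_work my_work;

  /*
   * The subsystem-specific logic lives in the work function; the
   * generic kthread_worker_fn() main loop does the sleeping.
   */
  static void my_work_fn(struct kthread_work *work)
  {
          /* process one unit of work */
  }

  static int my_init(void)
  {
          my_worker = create_kthread_worker("my_worker");
          if (IS_ERR(my_worker))
                  return PTR_ERR(my_worker);

          init_kthread_work(&my_work, my_work_fn);
          queue_kthread_work(my_worker, &my_work);
          return 0;
  }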

My first attempt was a brand new API (iterant kthread), see
http://thread.gmane.org/gmane.linux.kernel.api/11892 . I was directed
to improve the existing kthread worker API instead. This is the 4th
iteration of the new direction.


1st patch: add support for checking whether a timer callback is still running

2nd..12th patches: improve the existing kthread worker API

13th..18th, 20th, and 22nd patches: convert several kthreads to
      the kthread worker API, namely: khugepaged, ring buffer
      benchmark, hung_task, kmemleak, ipmi, IB/fmr_pool,
      memstick/r592, intel_powerclamp

19th and 21st patches: preparation steps; they do some clean up
      that makes sense even without the conversion.


Changes against v3:

  + allow freeing struct kthread_work from its callback; the worker no
    longer touches the struct post-mortem; as a side effect, the structure
    must be reinitialized when the worker gets restarted; updated
    khugepaged and kmemleak accordingly

  + call del_timer_sync() with worker->lock held; detect canceling in
    the timer callback instead and give up the attempt to take the lock
    there; busy-wait with spin_is_locked() to reduce cache bouncing

  + renamed ipmi+func() -> ipmi_kthread_worker_func() as suggested
    by Corey

  + added the Reviewed-by tags collected so far

  
Changes against v2:

  + used worker->lock to synchronize the operations on the work
    instead of the PENDING bit, as suggested by Tejun Heo; it simplified
    the implementation in several ways

  + added timer_active(); used it together with del_timer_sync()
    to cancel the work in a less tricky way

  + removed the controversial conversion of the RCU kthreads

  + added several other examples: hung_task, kmemleak, ipmi,
    IB/fmr_pool, memstick/r592, intel_powerclamp

  + the helper fixes for the ring buffer benchmark have been improved
    as suggested by Steven; they are already in Linus' tree now

  + fixed a possible race between the check for existing khugepaged
    worker and queuing the work
 

Changes against v1:

  + remove wrappers to manipulate the scheduling policy and priority

  + remove questionable wakeup_and_destroy_kthread_worker() variant

  + do not check for chained work when draining the queue

  + allocate struct kthread_worker in create_kthread_worker() and
    use simpler checks for a running worker

  + add support for delayed kthread works and use them instead
    of waiting inside the works

  + rework the "unrelated" fixes for the ring buffer benchmark
    as discussed in the 1st RFC; also sent separately

  + convert also the consumer in the ring buffer benchmark


I have tested this patch set against the stable Linus tree
for 4.5-rc1.

Comments against v3 can be found at
http://thread.gmane.org/gmane.linux.kernel.mm/141493


Petr Mladek (22):
  timer: Allow to check when the timer callback has not finished yet
  kthread/smpboot: Do not park in kthread_create_on_cpu()
  kthread: Allow to call __kthread_create_on_node() with va_list args
  kthread: Add create_kthread_worker*()
  kthread: Add drain_kthread_worker()
  kthread: Add destroy_kthread_worker()
  kthread: Detect when a kthread work is used by more workers
  kthread: Initial support for delayed kthread work
  kthread: Allow to cancel kthread work
  kthread: Allow to modify delayed kthread work
  kthread: Better support freezable kthread workers
  kthread: Use try_lock_kthread_work() in flush_kthread_work()
  mm/huge_page: Convert khugepaged() into kthread worker API
  ring_buffer: Convert benchmark kthreads into kthread worker API
  hung_task: Convert hungtaskd into kthread worker API
  kmemleak: Convert kmemleak kthread into kthread worker API
  ipmi: Convert kipmi kthread into kthread worker API
  IB/fmr_pool: Convert the cleanup thread into kthread worker API
  memstick/r592: Better synchronize debug messages in r592_io kthread
  memstick/r592: convert r592_io kthread into kthread worker API
  thermal/intel_powerclamp: Remove duplicated code that starts the
    kthread
  thermal/intel_powerclamp: Convert the kthread to kthread worker API

 drivers/char/ipmi/ipmi_si_intf.c     | 121 ++++---
 drivers/infiniband/core/fmr_pool.c   |  54 ++-
 drivers/memstick/host/r592.c         |  61 ++--
 drivers/memstick/host/r592.h         |   5 +-
 drivers/thermal/intel_powerclamp.c   | 302 +++++++++--------
 include/linux/kthread.h              |  56 ++++
 include/linux/timer.h                |   2 +
 kernel/hung_task.c                   |  41 ++-
 kernel/kthread.c                     | 619 +++++++++++++++++++++++++++++++----
 kernel/smpboot.c                     |   5 +
 kernel/time/timer.c                  |  24 ++
 kernel/trace/ring_buffer_benchmark.c | 133 ++++----
 mm/huge_memory.c                     | 138 ++++----
 mm/kmemleak.c                        |  87 +++--
 14 files changed, 1151 insertions(+), 497 deletions(-)

CC: Catalin Marinas <catalin.marinas@arm.com>
CC: linux-watchdog@vger.kernel.org
CC: Corey Minyard <minyard@acm.org>
CC: openipmi-developer@lists.sourceforge.net
CC: Doug Ledford <dledford@redhat.com>
CC: Sean Hefty <sean.hefty@intel.com>
CC: Hal Rosenstock <hal.rosenstock@gmail.com>
CC: linux-rdma@vger.kernel.org
CC: Maxim Levitsky <maximlevitsky@gmail.com>
CC: Zhang Rui <rui.zhang@intel.com>
CC: Eduardo Valentin <edubezval@gmail.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
CC: linux-pm@vger.kernel.org

-- 
1.8.5.6


* [PATCH v4 01/22] timer: Allow to check when the timer callback has not finished yet
From: Petr Mladek @ 2016-01-25 15:44 UTC
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek

timer_pending() only checks whether the timer is linked in the list of
pending timers. Each timer is removed from this list before its callback
is called, see call_timer_fn() in __run_timers().

Sometimes we need to make sure that the callback has finished, for
example, when we want to free resources that are accessed by the
callback.

For this purpose, this patch adds timer_active(). It checks both the
list of pending timers and base->running_timer. It takes the base lock
to see a consistent state.

I plan to use it to implement delayed works in kthread worker.
But I guess that it will have wider use. In fact, I wonder if
timer_pending() is misused in some situations.
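
For illustration, a minimal (hypothetical) teardown sequence where
timer_pending() alone would not be enough; struct my_obj and
obj_teardown() are made-up names and the sketch assumes that nothing
re-arms the timer concurrently:

  struct my_obj {
          struct timer_list timer;
          /* ... data accessed by the timer callback ... */
  };

  static void obj_teardown(struct my_obj *obj)
  {
          del_timer(&obj->timer);  /* removes a pending timer, if any */

          /*
           * timer_pending() may already return 0 here while the callback
           * is still running on another CPU; timer_active() covers both.
           */
          while (timer_active(&obj->timer))
                  cpu_relax();

          kfree(obj);              /* the callback can no longer touch obj */
  }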

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/timer.h |  2 ++
 kernel/time/timer.c   | 24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 61aa61dc410c..237b7c3e2b4e 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -165,6 +165,8 @@ static inline int timer_pending(const struct timer_list * timer)
 	return timer->entry.pprev != NULL;
 }
 
+extern int timer_active(struct timer_list *timer);
+
 extern void add_timer_on(struct timer_list *timer, int cpu);
 extern int del_timer(struct timer_list * timer);
 extern int mod_timer(struct timer_list *timer, unsigned long expires);
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index bbc5d1114583..1c16f3230771 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -778,6 +778,30 @@ static struct tvec_base *lock_timer_base(struct timer_list *timer,
 	}
 }
 
+/**
+ * timer_active - is a timer still in use?
+ * @timer: the timer in question
+ *
+ * timer_active() tells whether the timer is pending or whether its callback
+ * is currently running.
+ *
+ * Use this function if you want to make sure that some resources
+ * will no longer be accessed by the timer callback. timer_pending()
+ * is not safe in this case.
+ */
+int timer_active(struct timer_list *timer)
+{
+	struct tvec_base *base;
+	unsigned long flags;
+	int ret;
+
+	base = lock_timer_base(timer, &flags);
+	ret = timer_pending(timer) || base->running_timer == timer;
+	spin_unlock_irqrestore(&base->lock, flags);
+
+	return ret;
+}
+
 static inline int
 __mod_timer(struct timer_list *timer, unsigned long expires,
 	    bool pending_only, int pinned)
-- 
1.8.5.6


* [PATCH v4 02/22] kthread/smpboot: Do not park in kthread_create_on_cpu()
From: Petr Mladek @ 2016-01-25 15:44 UTC
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek

kthread_create_on_cpu() was added by the commit 2a1d446019f9a5983e
("kthread: Implement park/unpark facility"). It is currently used
only when enabling a new CPU. For this purpose, the newly created
kthread has to be parked.

The CPU binding is a bit tricky. The kthread is parked while the CPU
is not yet online, and the binding is set when the kthread is unparked.

The function would be useful for more per-CPU kthreads, e.g.
bnx2fc_thread, fcoethread. For these users, the newly created kthread
should simply stay in the uninterruptible state instead of being parked.

This patch moves the parking into smpboot. It binds the thread already
at creation time. The function can then be used universally, and the
behavior is consistent with kthread_create() and
kthread_create_on_node().
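
For illustration, a hypothetical use outside of smpboot after this
change (my_thread_fn and the "my_thread/%u" name are made up):

  static int my_start_cpu_thread(unsigned int cpu, void *data)
  {
          struct task_struct *tsk;

          tsk = kthread_create_on_cpu(my_thread_fn, data, cpu,
                                      "my_thread/%u");
          if (IS_ERR(tsk))
                  return PTR_ERR(tsk);

          /*
           * The thread is already bound to @cpu and sits in the
           * uninterruptible state (not parked); just wake it up.
           */
          wake_up_process(tsk);
          return 0;
  }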

Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/kthread.c | 8 ++++++--
 kernel/smpboot.c | 5 +++++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 9ff173dca1ae..1ffc11ec5546 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -390,10 +390,10 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
 				   cpu);
 	if (IS_ERR(p))
 		return p;
+	kthread_bind(p, cpu);
+	/* CPU hotplug needs to bind once again when unparking the thread. */
 	set_bit(KTHREAD_IS_PER_CPU, &to_kthread(p)->flags);
 	to_kthread(p)->cpu = cpu;
-	/* Park the thread to get it out of TASK_UNINTERRUPTIBLE state */
-	kthread_park(p);
 	return p;
 }
 
@@ -407,6 +407,10 @@ static void __kthread_unpark(struct task_struct *k, struct kthread *kthread)
 	 * which might be about to be cleared.
 	 */
 	if (test_and_clear_bit(KTHREAD_IS_PARKED, &kthread->flags)) {
+		/*
+		 * Newly created kthread was parked when the CPU was offline.
+		 * The binding was lost and we need to set it again.
+		 */
 		if (test_bit(KTHREAD_IS_PER_CPU, &kthread->flags))
 			__kthread_bind(k, kthread->cpu, TASK_PARKED);
 		wake_up_state(k, TASK_PARKED);
diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index d264f59bff56..79f07014be6e 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -186,6 +186,11 @@ __smpboot_create_thread(struct smp_hotplug_thread *ht, unsigned int cpu)
 		kfree(td);
 		return PTR_ERR(tsk);
 	}
+	/*
+	 * Park the thread so that it can start right on the CPU
+	 * when it becomes available.
+	 */
+	kthread_park(tsk);
 	get_task_struct(tsk);
 	*per_cpu_ptr(ht->store, cpu) = tsk;
 	if (ht->create) {
-- 
1.8.5.6


* [PATCH v4 03/22] kthread: Allow to call __kthread_create_on_node() with va_list args
From: Petr Mladek @ 2016-01-25 15:44 UTC
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek

kthread_create_on_node() implements a fair amount of logic to create
the kthread. It is already called by kthread_create_on_cpu().

We are going to extend the kthread worker API and will need to call
this logic with va_list args there.

This patch is only a refactoring; it does not modify the existing
behavior.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/kthread.c | 72 +++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 42 insertions(+), 30 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 1ffc11ec5546..bfe8742c4217 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -244,33 +244,10 @@ static void create_kthread(struct kthread_create_info *create)
 	}
 }
 
-/**
- * kthread_create_on_node - create a kthread.
- * @threadfn: the function to run until signal_pending(current).
- * @data: data ptr for @threadfn.
- * @node: task and thread structures for the thread are allocated on this node
- * @namefmt: printf-style name for the thread.
- *
- * Description: This helper function creates and names a kernel
- * thread.  The thread will be stopped: use wake_up_process() to start
- * it.  See also kthread_run().  The new thread has SCHED_NORMAL policy and
- * is affine to all CPUs.
- *
- * If thread is going to be bound on a particular cpu, give its node
- * in @node, to get NUMA affinity for kthread stack, or else give NUMA_NO_NODE.
- * When woken, the thread will run @threadfn() with @data as its
- * argument. @threadfn() can either call do_exit() directly if it is a
- * standalone thread for which no one will call kthread_stop(), or
- * return when 'kthread_should_stop()' is true (which means
- * kthread_stop() has been called).  The return value should be zero
- * or a negative error number; it will be passed to kthread_stop().
- *
- * Returns a task_struct or ERR_PTR(-ENOMEM) or ERR_PTR(-EINTR).
- */
-struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
-					   void *data, int node,
-					   const char namefmt[],
-					   ...)
+static struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
+						    void *data, int node,
+						    const char namefmt[],
+						    va_list args)
 {
 	DECLARE_COMPLETION_ONSTACK(done);
 	struct task_struct *task;
@@ -311,11 +288,8 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
 	task = create->result;
 	if (!IS_ERR(task)) {
 		static const struct sched_param param = { .sched_priority = 0 };
-		va_list args;
 
-		va_start(args, namefmt);
 		vsnprintf(task->comm, sizeof(task->comm), namefmt, args);
-		va_end(args);
 		/*
 		 * root may have changed our (kthreadd's) priority or CPU mask.
 		 * The kernel thread should not inherit these properties.
@@ -326,6 +300,44 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
 	kfree(create);
 	return task;
 }
+
+/**
+ * kthread_create_on_node - create a kthread.
+ * @threadfn: the function to run until signal_pending(current).
+ * @data: data ptr for @threadfn.
+ * @node: task and thread structures for the thread are allocated on this node
+ * @namefmt: printf-style name for the thread.
+ *
+ * Description: This helper function creates and names a kernel
+ * thread.  The thread will be stopped: use wake_up_process() to start
+ * it.  See also kthread_run().  The new thread has SCHED_NORMAL policy and
+ * is affine to all CPUs.
+ *
+ * If thread is going to be bound on a particular cpu, give its node
+ * in @node, to get NUMA affinity for kthread stack, or else give NUMA_NO_NODE.
+ * When woken, the thread will run @threadfn() with @data as its
+ * argument. @threadfn() can either call do_exit() directly if it is a
+ * standalone thread for which no one will call kthread_stop(), or
+ * return when 'kthread_should_stop()' is true (which means
+ * kthread_stop() has been called).  The return value should be zero
+ * or a negative error number; it will be passed to kthread_stop().
+ *
+ * Returns a task_struct or ERR_PTR(-ENOMEM) or ERR_PTR(-EINTR).
+ */
+struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
+					   void *data, int node,
+					   const char namefmt[],
+					   ...)
+{
+	struct task_struct *task;
+	va_list args;
+
+	va_start(args, namefmt);
+	task = __kthread_create_on_node(threadfn, data, node, namefmt, args);
+	va_end(args);
+
+	return task;
+}
 EXPORT_SYMBOL(kthread_create_on_node);
 
 static void __kthread_bind_mask(struct task_struct *p, const struct cpumask *mask, long state)
-- 
1.8.5.6


* [PATCH v4 04/22] kthread: Add create_kthread_worker*()
From: Petr Mladek @ 2016-01-25 15:44 UTC
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek

Kthread workers are currently created using the classic kthread API,
namely kthread_run(). kthread_worker_fn() is passed as the @threadfn
parameter.

This patch defines create_kthread_worker() and
create_kthread_worker_on_cpu() functions that hide implementation details.

They enforce using kthread_worker_fn() for the main thread. But I doubt
that there are any plans to create an alternative. In fact, I think
that we do not want any alternative main thread function because it
would be hard to keep it consistent with the rest of the kthread worker API.

The naming and functionality are inspired by the workqueue API, like
the rest of the kthread worker API.

Note that we need to bind per-CPU kthread workers already when they are
created. It makes life easier; kthread_bind() cannot be used later on
an already running worker.

This patch does _not_ convert existing kthread workers. The kthread
worker API needs more improvements first, e.g. a function to destroy
the worker.
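
For illustration, a minimal usage sketch of the per-CPU variant
(hypothetical caller; the my_workers array and my_create_worker()
are made up):

  static struct kthread_worker *my_workers[NR_CPUS];

  static int my_create_worker(int cpu)
  {
          struct kthread_worker *worker;

          worker = create_kthread_worker_on_cpu(cpu, "my_worker/%d");
          if (IS_ERR(worker))
                  return PTR_ERR(worker);

          my_workers[cpu] = worker;
          return 0;
  }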

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/kthread.h |  7 ++++
 kernel/kthread.c        | 99 ++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 96 insertions(+), 10 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index e691b6a23f72..943900c7ce35 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -124,6 +124,13 @@ extern void __init_kthread_worker(struct kthread_worker *worker,
 
 int kthread_worker_fn(void *worker_ptr);
 
+__printf(1, 2)
+struct kthread_worker *
+create_kthread_worker(const char namefmt[], ...);
+
+struct kthread_worker *
+create_kthread_worker_on_cpu(int cpu, const char namefmt[]);
+
 bool queue_kthread_work(struct kthread_worker *worker,
 			struct kthread_work *work);
 void flush_kthread_work(struct kthread_work *work);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index bfe8742c4217..df402e18bb5a 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -567,23 +567,24 @@ EXPORT_SYMBOL_GPL(__init_kthread_worker);
  * kthread_worker_fn - kthread function to process kthread_worker
  * @worker_ptr: pointer to initialized kthread_worker
  *
- * This function can be used as @threadfn to kthread_create() or
- * kthread_run() with @worker_ptr argument pointing to an initialized
- * kthread_worker.  The started kthread will process work_list until
- * the it is stopped with kthread_stop().  A kthread can also call
- * this function directly after extra initialization.
+ * This function implements the main loop of the kthread worker. It processes
+ * work_list until the worker is stopped with kthread_stop(). It sleeps when
+ * the queue is empty.
  *
- * Different kthreads can be used for the same kthread_worker as long
- * as there's only one kthread attached to it at any given time.  A
- * kthread_worker without an attached kthread simply collects queued
- * kthread_works.
+ * The works must not hold any locks or have preemption or interrupts disabled
+ * when they finish. There is a safe point for freezing after one work
+ * finishes and before a new one is started.
  */
 int kthread_worker_fn(void *worker_ptr)
 {
 	struct kthread_worker *worker = worker_ptr;
 	struct kthread_work *work;
 
-	WARN_ON(worker->task);
+	/*
+	 * FIXME: Update the check and remove the assignment when all kthread
+	 * worker users are created using create_kthread_worker*() functions.
+	 */
+	WARN_ON(worker->task && worker->task != current);
 	worker->task = current;
 repeat:
 	set_current_state(TASK_INTERRUPTIBLE);	/* mb paired w/ kthread_stop */
@@ -617,6 +618,84 @@ repeat:
 }
 EXPORT_SYMBOL_GPL(kthread_worker_fn);
 
+static struct kthread_worker *
+__create_kthread_worker(int cpu, const char namefmt[], va_list args)
+{
+	struct kthread_worker *worker;
+	struct task_struct *task;
+
+	worker = kzalloc(sizeof(*worker), GFP_KERNEL);
+	if (!worker)
+		return ERR_PTR(-ENOMEM);
+
+	init_kthread_worker(worker);
+
+	if (cpu >= 0)
+		task = kthread_create_on_cpu(kthread_worker_fn, worker,
+					     cpu, namefmt);
+	else
+		task = __kthread_create_on_node(kthread_worker_fn, worker,
+						-1, namefmt, args);
+	if (IS_ERR(task))
+		goto fail_task;
+
+	worker->task = task;
+	wake_up_process(task);
+	return worker;
+
+fail_task:
+	kfree(worker);
+	return ERR_CAST(task);
+}
+
+/**
+ * create_kthread_worker - create a kthread worker
+ * @namefmt: printf-style name for the kthread worker (task).
+ *
+ * Returns pointer to an allocated worker on success, ERR_PTR(-ENOMEM) when
+ * the needed structures could not get allocated, and ERR_PTR(-EINTR) when
+ * the worker was SIGKILLed.
+ */
+struct kthread_worker *
+create_kthread_worker(const char namefmt[], ...)
+{
+	struct kthread_worker *worker;
+	va_list args;
+
+	va_start(args, namefmt);
+	worker = __create_kthread_worker(-1, namefmt, args);
+	va_end(args);
+
+	return worker;
+}
+EXPORT_SYMBOL(create_kthread_worker);
+
+/**
+ * create_kthread_worker_on_cpu - create a kthread worker and bind it
+ *	it to a given CPU and the associated NUMA node.
+ * @cpu: CPU number
+ * @namefmt: printf-style name for the kthread worker (task).
+ *
+ * Use a valid CPU number if you want to bind the kthread worker
+ * to the given CPU and the associated NUMA node.
+ *
+ * @namefmt might include one "%d" that will get replaced by CPU number.
+ *
+ * Returns a pointer to the allocated worker on success, ERR_PTR(-EINVAL)
+ * when the CPU number is not valid, ERR_PTR(-ENOMEM) when the needed
+ * structures could not get allocated, and ERR_PTR(-EINTR) when the
+ * worker was SIGKILLed.
+ */
+struct kthread_worker *
+create_kthread_worker_on_cpu(int cpu, const char namefmt[])
+{
+	if (cpu < 0 || cpu > num_possible_cpus())
+		return ERR_PTR(-EINVAL);
+
+	return __create_kthread_worker(cpu, namefmt, NULL);
+}
+EXPORT_SYMBOL(create_kthread_worker_on_cpu);
+
 /* insert @work before @pos in @worker */
 static void insert_kthread_work(struct kthread_worker *worker,
 			       struct kthread_work *work,
-- 
1.8.5.6


* [PATCH v4 05/22] kthread: Add drain_kthread_worker()
From: Petr Mladek @ 2016-01-25 15:44 UTC
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek

flush_kthread_worker() returns once the currently queued works have
been processed. But some other works might have been queued in the
meantime.

This patch adds drain_kthread_worker(), which is inspired by
drain_workqueue(). It returns when the queue is completely empty and
warns when draining takes too long.

The initial implementation does not block queuing new works when draining.
It makes things much easier. The blocking would be useful to debug
potential problems but it is not clear if it is worth the complication
at the moment.
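
For illustration, a sketch of the intended calling pattern; stopping the
producers is the caller's job and stop_foo_producers() is a made-up
placeholder for it:

	/* No new works may be queued from this point on. */
	stop_foo_producers();

	/* Returns once worker->work_list is empty; warns after ~10 flushes. */
	drain_kthread_worker(worker);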

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/kthread.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index df402e18bb5a..a18ad3b58f61 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -804,3 +804,37 @@ void flush_kthread_worker(struct kthread_worker *worker)
 	wait_for_completion(&fwork.done);
 }
 EXPORT_SYMBOL_GPL(flush_kthread_worker);
+
+/**
+ * drain_kthread_worker - drain a kthread worker
+ * @worker: worker to be drained
+ *
+ * Wait until there is no work queued for the given kthread worker.
+ * @worker is flushed repeatedly until it becomes empty.  The number
+ * of flushing is determined by the depth of chaining and should
+ * be relatively short.  Whine if it takes too long.
+ *
+ * The caller is responsible for blocking all users of this kthread
+ * worker from queuing new works. Also it is responsible for blocking
+ * the already queued works from an infinite re-queuing!
+ */
+void drain_kthread_worker(struct kthread_worker *worker)
+{
+	int flush_cnt = 0;
+
+	spin_lock_irq(&worker->lock);
+
+	while (!list_empty(&worker->work_list)) {
+		spin_unlock_irq(&worker->lock);
+
+		flush_kthread_worker(worker);
+		WARN_ONCE(flush_cnt++ > 10,
+			  "kthread worker %s: drain_kthread_worker() isn't complete after %u tries\n",
+			  worker->task->comm, flush_cnt);
+
+		spin_lock_irq(&worker->lock);
+	}
+
+	spin_unlock_irq(&worker->lock);
+}
+EXPORT_SYMBOL(drain_kthread_worker);
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v4 06/22] kthread: Add destroy_kthread_worker()
  2016-01-25 15:44 ` Petr Mladek
@ 2016-01-25 15:44   ` Petr Mladek
  -1 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-01-25 15:44 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek

The current kthread worker users call flush() and stop() explicitly.
The new destroy_kthread_worker() function drains the worker, stops it,
and frees the kthread_worker struct in one call.

It is supposed to be used together with create_kthread_worker*() that
allocates struct kthread_worker.

Also note that drain(), unlike flush(), correctly handles self-queuing
works.
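
For illustration, a minimal lifecycle sketch built on the functions from
this series; foo_work_fn() and the "foo" names are made up:

	static void foo_work_fn(struct kthread_work *work)
	{
		/* process one request */
	}

	static DEFINE_KTHREAD_WORK(foo_work, foo_work_fn);

	/* in the init path */
	struct kthread_worker *worker;

	worker = create_kthread_worker("foo_worker");
	if (IS_ERR(worker))
		return PTR_ERR(worker);
	queue_kthread_work(worker, &foo_work);

	/* in the exit path: drain, stop the kthread, and free the struct */
	destroy_kthread_worker(worker);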

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/kthread.h |  2 ++
 kernel/kthread.c        | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 943900c7ce35..c4a95a3ba500 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -136,4 +136,6 @@ bool queue_kthread_work(struct kthread_worker *worker,
 void flush_kthread_work(struct kthread_work *work);
 void flush_kthread_worker(struct kthread_worker *worker);
 
+void destroy_kthread_worker(struct kthread_worker *worker);
+
 #endif /* _LINUX_KTHREAD_H */
diff --git a/kernel/kthread.c b/kernel/kthread.c
index a18ad3b58f61..1d41e0faef2d 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -838,3 +838,24 @@ void drain_kthread_worker(struct kthread_worker *worker)
 	spin_unlock_irq(&worker->lock);
 }
 EXPORT_SYMBOL(drain_kthread_worker);
+
+/**
+ * destroy_kthread_worker - destroy a kthread worker
+ * @worker: worker to be destroyed
+ *
+ * Drain and destroy @worker.  It has the same conditions
+ * for use as drain_kthread_worker(), see above.
+ */
+void destroy_kthread_worker(struct kthread_worker *worker)
+{
+	struct task_struct *task;
+
+	task = worker->task;
+	if (WARN_ON(!task))
+		return;
+
+	drain_kthread_worker(worker);
+	kthread_stop(task);
+	kfree(worker);
+}
+EXPORT_SYMBOL(destroy_kthread_worker);
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v4 07/22] kthread: Detect when a kthread work is used by more workers
  2016-01-25 15:44 ` Petr Mladek
@ 2016-01-25 15:44   ` Petr Mladek
  -1 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-01-25 15:44 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek

Nothing currently prevents a work from being queued to a kthread worker
while it is already used by another one. This means that the work
might run in parallel on more workers. Also some operations, e.g.
flush or drain, are not reliable.

This problem will be even more visible after we add the
cancel_kthread_work() function. It will only have "work" as the parameter
and will use worker->lock to synchronize with others.

Well, normally this is not a problem because the API users are sane.
But bugs might happen and users also might be crazy.

This patch adds a warning when we try to insert the work for another
worker. It does not fully prevent the misuse because it would make the
code much more complicated without a big benefit.

A side effect is that one needs to explicitly reinitialize the work
if it must be queued into another worker. This is needed, for example,
when the worker is stopped and started again. It is a bit inconvenient.
But it looks like a good compromise between stability and complexity.

Just for completeness, the patch adds a check for disabled interrupts
and an empty queue.

The patch also puts all the checks into a separate function. It will
be reused when implementing delayed works.
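
For illustration, the reinitialization a caller now needs when reusing
a work with a newly created worker (all names are made up):

	destroy_kthread_worker(old_worker);

	/*
	 * foo_work still points to the old worker; reinitialize it or
	 * the new sanity check will warn on the next queuing.
	 */
	init_kthread_work(&foo_work, foo_work_fn);
	queue_kthread_work(new_worker, &foo_work);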

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/kthread.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 1d41e0faef2d..e12576bc0e39 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -574,6 +574,9 @@ EXPORT_SYMBOL_GPL(__init_kthread_worker);
  * The works are not allowed to keep any locks, disable preemption or interrupts
  * when they finish. There is defined a safe point for freezing when one work
  * finishes and before a new one is started.
+ *
+ * Also the works must not be handled by more workers at the same time, see also
+ * queue_kthread_work().
  */
 int kthread_worker_fn(void *worker_ptr)
 {
@@ -696,12 +699,22 @@ create_kthread_worker_on_cpu(int cpu, const char namefmt[])
 }
 EXPORT_SYMBOL(create_kthread_worker_on_cpu);
 
+static void insert_kthread_work_sanity_check(struct kthread_worker *worker,
+					       struct kthread_work *work)
+{
+	lockdep_assert_held(&worker->lock);
+	WARN_ON_ONCE(!irqs_disabled());
+	WARN_ON_ONCE(!list_empty(&work->node));
+	/* Do not use a work with more workers, see queue_kthread_work() */
+	WARN_ON_ONCE(work->worker && work->worker != worker);
+}
+
 /* insert @work before @pos in @worker */
 static void insert_kthread_work(struct kthread_worker *worker,
-			       struct kthread_work *work,
-			       struct list_head *pos)
+				struct kthread_work *work,
+				struct list_head *pos)
 {
-	lockdep_assert_held(&worker->lock);
+	insert_kthread_work_sanity_check(worker, work);
 
 	list_add_tail(&work->node, pos);
 	work->worker = worker;
@@ -717,6 +730,15 @@ static void insert_kthread_work(struct kthread_worker *worker,
  * Queue @work to work processor @task for async execution.  @task
  * must have been created with kthread_worker_create().  Returns %true
  * if @work was successfully queued, %false if it was already pending.
+ *
+ * Never queue a work into a worker when it is being processed by another
+ * one. Otherwise, some operations, e.g. cancel or flush, will not work
+ * correctly or the work might run in parallel. This is not enforced
+ * because it would make the code too complex. There are only warnings
+ * printed when such a situation is detected.
+ *
+ * Reinitialize the work if it needs to be used by another worker.
+ * For example, when the worker was stopped and started again.
  */
 bool queue_kthread_work(struct kthread_worker *worker,
 			struct kthread_work *work)
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v4 08/22] kthread: Initial support for delayed kthread work
  2016-01-25 15:44 ` Petr Mladek
@ 2016-01-25 15:44   ` Petr Mladek
  -1 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-01-25 15:44 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek

We are going to use kthread_worker more widely and delayed works
will be pretty useful.

The implementation is inspired by workqueues. It uses a timer to
queue the work after the requested delay. If the delay is zero,
the work is queued immediately.

Compared with workqueues, each work is associated with a single
worker (kthread). Therefore the implementation can be much simpler.
In particular, we use the worker->lock to synchronize all the
operations with the work. We do not need any atomic operations
on a flags variable.

We do not even need to maintain any state variable. Instead, we
add a pointer[*] to the timer into the struct kthread_work. Then
we can detect the pending status using a rather simple check,
see the new kthread_work_pending() function.

Finally, the timer callback knows only about the struct work.
It is better to be paranoid and try to take the worker->lock carefully.
The try_lock_kthread_work() function will later be useful also when
canceling the work.

[*] I also considered adding the entire struct timer_list into
    struct kthread_work. But it would increase the size from
    40 to 120 bytes on x86_64 with often unused stuff.

    Another alternative was to add a flags variable. But this
    would add extra code to synchronize it with the state
    of the timer.
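
An illustrative sketch of the delayed work API added here; the polling use
case and the "foo" names are made up:

	static struct kthread_worker *foo_worker;	/* created elsewhere */

	static void foo_poll_fn(struct kthread_work *work);
	static DEFINE_DELAYED_KTHREAD_WORK(foo_poll_work, foo_poll_fn);

	static void foo_poll_fn(struct kthread_work *work)
	{
		/* poll the hardware, then re-arm one second later */
		queue_delayed_kthread_work(foo_worker, &foo_poll_work, HZ);
	}

	/* in the init path: first poll after one second; 0 would queue now */
	queue_delayed_kthread_work(foo_worker, &foo_poll_work, HZ);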

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/kthread.h |  34 ++++++++++++
 kernel/kthread.c        | 142 +++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 174 insertions(+), 2 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index c4a95a3ba500..1a5738dcdf8d 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -63,6 +63,7 @@ extern int tsk_fork_get_node(struct task_struct *tsk);
  */
 struct kthread_work;
 typedef void (*kthread_work_func_t)(struct kthread_work *work);
+void delayed_kthread_work_timer_fn(unsigned long __data);
 
 struct kthread_worker {
 	spinlock_t		lock;
@@ -75,6 +76,12 @@ struct kthread_work {
 	struct list_head	node;
 	kthread_work_func_t	func;
 	struct kthread_worker	*worker;
+	struct timer_list	*timer;
+};
+
+struct delayed_kthread_work {
+	struct kthread_work work;
+	struct timer_list timer;
 };
 
 #define KTHREAD_WORKER_INIT(worker)	{				\
@@ -87,12 +94,24 @@ struct kthread_work {
 	.func = (fn),							\
 	}
 
+#define DELAYED_KTHREAD_WORK_INIT(dwork, fn) {				\
+	.work = KTHREAD_WORK_INIT((dwork).work, (fn)),			\
+	.timer = __TIMER_INITIALIZER(delayed_kthread_work_timer_fn,	\
+				     0, (unsigned long)&(dwork),	\
+				     TIMER_IRQSAFE),			\
+	.work.timer = &(dwork).timer,			\
+	}
+
 #define DEFINE_KTHREAD_WORKER(worker)					\
 	struct kthread_worker worker = KTHREAD_WORKER_INIT(worker)
 
 #define DEFINE_KTHREAD_WORK(work, fn)					\
 	struct kthread_work work = KTHREAD_WORK_INIT(work, fn)
 
+#define DEFINE_DELAYED_KTHREAD_WORK(dwork, fn)				\
+	struct delayed_kthread_work dwork =				\
+		DELAYED_KTHREAD_WORK_INIT(dwork, fn)
+
 /*
  * kthread_worker.lock needs its own lockdep class key when defined on
  * stack with lockdep enabled.  Use the following macros in such cases.
@@ -122,6 +141,16 @@ extern void __init_kthread_worker(struct kthread_worker *worker,
 		(work)->func = (fn);					\
 	} while (0)
 
+#define init_delayed_kthread_work(dwork, fn)				\
+	do {								\
+		init_kthread_work(&(dwork)->work, (fn));		\
+		__setup_timer(&(dwork)->timer,				\
+			      delayed_kthread_work_timer_fn,		\
+			      (unsigned long)(dwork),			\
+			      TIMER_IRQSAFE);				\
+		(dwork)->work.timer = &(dwork)->timer;			\
+	} while (0)
+
 int kthread_worker_fn(void *worker_ptr);
 
 __printf(1, 2)
@@ -133,6 +162,11 @@ create_kthread_worker_on_cpu(int cpu, const char namefmt[]);
 
 bool queue_kthread_work(struct kthread_worker *worker,
 			struct kthread_work *work);
+
+bool queue_delayed_kthread_work(struct kthread_worker *worker,
+				struct delayed_kthread_work *dwork,
+				unsigned long delay);
+
 void flush_kthread_work(struct kthread_work *work);
 void flush_kthread_worker(struct kthread_worker *worker);
 
diff --git a/kernel/kthread.c b/kernel/kthread.c
index e12576bc0e39..6933d90c0fec 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -563,6 +563,20 @@ void __init_kthread_worker(struct kthread_worker *worker,
 }
 EXPORT_SYMBOL_GPL(__init_kthread_worker);
 
+/*
+ * Returns true when there is a pending operation for this work.
+ * In particular, it checks if the work is:
+ *	- queued
+ *	- a timer is running to queue this delayed work
+ *
+ * This function must be called with locked work.
+ */
+static inline bool kthread_work_pending(const struct kthread_work *work)
+{
+	return !list_empty(&work->node) ||
+	       (work->timer && timer_active(work->timer));
+}
+
 /**
  * kthread_worker_fn - kthread function to process kthread_worker
  * @worker_ptr: pointer to initialized kthread_worker
@@ -722,6 +736,15 @@ static void insert_kthread_work(struct kthread_worker *worker,
 		wake_up_process(worker->task);
 }
 
+/*
+ * Queue @work right into the worker queue.
+ */
+static void __queue_kthread_work(struct kthread_worker *worker,
+			  struct kthread_work *work)
+{
+	insert_kthread_work(worker, work, &worker->work_list);
+}
+
 /**
  * queue_kthread_work - queue a kthread_work
  * @worker: target kthread_worker
@@ -747,8 +770,8 @@ bool queue_kthread_work(struct kthread_worker *worker,
 	unsigned long flags;
 
 	spin_lock_irqsave(&worker->lock, flags);
-	if (list_empty(&work->node)) {
-		insert_kthread_work(worker, work, &worker->work_list);
+	if (!kthread_work_pending(work)) {
+		__queue_kthread_work(worker, work);
 		ret = true;
 	}
 	spin_unlock_irqrestore(&worker->lock, flags);
@@ -756,6 +779,121 @@ bool queue_kthread_work(struct kthread_worker *worker,
 }
 EXPORT_SYMBOL_GPL(queue_kthread_work);
 
+static bool try_lock_kthread_work(struct kthread_work *work)
+{
+	struct kthread_worker *worker;
+	int ret = false;
+
+try_again:
+	worker = work->worker;
+
+	if (!worker)
+		goto out;
+
+	spin_lock(&worker->lock);
+	if (worker != work->worker) {
+		spin_unlock(&worker->lock);
+		goto try_again;
+	}
+	ret = true;
+
+out:
+	return ret;
+}
+
+static inline void unlock_kthread_work(struct kthread_work *work)
+{
+	spin_unlock(&work->worker->lock);
+}
+
+/**
+ * delayed_kthread_work_timer_fn - callback that queues the associated delayed
+ *	kthread work when the timer expires.
+ * @__data: pointer to the data associated with the timer
+ *
+ * The format of the function is defined by struct timer_list.
+ * It should have been called from irqsafe timer with irq already off.
+ */
+void delayed_kthread_work_timer_fn(unsigned long __data)
+{
+	struct delayed_kthread_work *dwork =
+		(struct delayed_kthread_work *)__data;
+	struct kthread_work *work = &dwork->work;
+
+	if (!try_lock_kthread_work(work))
+		return;
+
+	__queue_kthread_work(work->worker, work);
+	unlock_kthread_work(work);
+}
+EXPORT_SYMBOL(delayed_kthread_work_timer_fn);
+
+void __queue_delayed_kthread_work(struct kthread_worker *worker,
+				struct delayed_kthread_work *dwork,
+				unsigned long delay)
+{
+	struct timer_list *timer = &dwork->timer;
+	struct kthread_work *work = &dwork->work;
+
+	WARN_ON_ONCE(timer->function != delayed_kthread_work_timer_fn ||
+		     timer->data != (unsigned long)dwork);
+	WARN_ON_ONCE(timer_pending(timer));
+
+	/*
+	 * If @delay is 0, queue @dwork->work immediately.  This is for
+	 * both optimization and correctness.  The earliest @timer can
+	 * expire is on the closest next tick and delayed_work users depend
+	 * on that there's no such delay when @delay is 0.
+	 */
+	if (!delay) {
+		__queue_kthread_work(worker, work);
+		return;
+	}
+
+	/* Be paranoid and try to detect possible races already now. */
+	insert_kthread_work_sanity_check(worker, work);
+
+	work->worker = worker;
+	timer_stats_timer_set_start_info(&dwork->timer);
+	timer->expires = jiffies + delay;
+	add_timer(timer);
+}
+
+/**
+ * queue_delayed_kthread_work - queue the associated kthread work
+ *	after a delay.
+ * @worker: target kthread_worker
+ * @dwork: delayed kthread_work to queue
+ * @delay: number of jiffies to wait before queuing
+ *
+ * If the work has not been pending, it starts a timer that will queue
+ * the work after the given @delay. If @delay is zero, it queues the
+ * work immediately.
+ *
+ * Return: %false if @work was already pending, which means that either
+ * the timer was running or the work was already queued. Returns %true
+ * otherwise.
+ */
+bool queue_delayed_kthread_work(struct kthread_worker *worker,
+				struct delayed_kthread_work *dwork,
+				unsigned long delay)
+{
+	struct kthread_work *work = &dwork->work;
+	unsigned long flags;
+	bool ret = false;
+
+	spin_lock_irqsave(&worker->lock, flags);
+
+	if (!kthread_work_pending(work)) {
+		__queue_delayed_kthread_work(worker, dwork, delay);
+		ret = true;
+	}
+
+	spin_unlock_irqrestore(&worker->lock, flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(queue_delayed_kthread_work);
+
 struct kthread_flush_work {
 	struct kthread_work	work;
 	struct completion	done;
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v4 09/22] kthread: Allow to cancel kthread work
  2016-01-25 15:44 ` Petr Mladek
@ 2016-01-25 15:44   ` Petr Mladek
  -1 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-01-25 15:44 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek

We are going to use kthread workers more widely and sometimes we will need
to make sure that the work is neither pending nor running.

This patch implements cancel_*_sync() operations as inspired by
workqueues. Here, we are synchronized against the other operations
via the worker lock, we use del_timer_sync(), and we count parallel
cancel operations with a counter. Therefore the implementation can be
simpler.

First, we try to lock the work. If this fails, it means that
no worker is assigned and that we are done.

Second, we try to cancel the timer when it exists. The timer is deleted
synchronously to make sure that the timer callback is not running.
We hold the worker->lock at this point. To avoid a possible race,
the timer callback needs to use a modified variant of try_lock_kthread_work()
and give up spinning when the canceling flag is set.

Third, we try to remove the work from the worker list.

Fourth, if the work is running, we call flush_kthread_work(). It might
take an arbitrary time. In the meantime, queuing of the work is blocked
by the new canceling counter.

As already mentioned, the check for a pending kthread work is done under
a lock. Compared with workqueues, we do not need to fight for a single
PENDING bit to block other operations. Therefore we do not suffer from
the thundering herd problem and all parallel canceling jobs might use
flush_kthread_work(). Any queuing is blocked until the counter is zero.
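
For illustration, a teardown sketch built on the previous examples (the
"foo" names are made up):

	/* Guarantee that the work is neither pending nor running. */
	if (cancel_delayed_kthread_work_sync(&foo_poll_work))
		pr_debug("foo: a pending poll was canceled\n");

	destroy_kthread_worker(foo_worker);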

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/kthread.h |   4 ++
 kernel/kthread.c        | 154 ++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 154 insertions(+), 4 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 1a5738dcdf8d..dd2a587a2bd7 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -77,6 +77,7 @@ struct kthread_work {
 	kthread_work_func_t	func;
 	struct kthread_worker	*worker;
 	struct timer_list	*timer;
+	int			canceling;
 };
 
 struct delayed_kthread_work {
@@ -170,6 +171,9 @@ bool queue_delayed_kthread_work(struct kthread_worker *worker,
 void flush_kthread_work(struct kthread_work *work);
 void flush_kthread_worker(struct kthread_worker *worker);
 
+bool cancel_kthread_work_sync(struct kthread_work *work);
+bool cancel_delayed_kthread_work_sync(struct delayed_kthread_work *work);
+
 void destroy_kthread_worker(struct kthread_worker *worker);
 
 #endif /* _LINUX_KTHREAD_H */
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 6933d90c0fec..6e2eeca08d5f 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -567,6 +567,7 @@ EXPORT_SYMBOL_GPL(__init_kthread_worker);
  * Returns true when there is a pending operation for this work.
  * In particular, it checks if the work is:
  *	- queued
+ *	- being cancelled
  *	- a timer is running to queue this delayed work
  *
  * This function must be called with locked work.
@@ -574,6 +575,7 @@ EXPORT_SYMBOL_GPL(__init_kthread_worker);
 static inline bool kthread_work_pending(const struct kthread_work *work)
 {
 	return !list_empty(&work->node) ||
+	       work->canceling ||
 	       (work->timer && timer_active(work->timer));
 }
 
@@ -779,7 +781,13 @@ bool queue_kthread_work(struct kthread_worker *worker,
 }
 EXPORT_SYMBOL_GPL(queue_kthread_work);
 
-static bool try_lock_kthread_work(struct kthread_work *work)
+/*
+ * Get the worker lock if any worker is associated with the work.
+ * Depending on @check_canceling, it might need to give up the busy
+ * wait when work->canceling gets set.
+ */
+static bool try_lock_kthread_work(struct kthread_work *work,
+				  bool check_canceling)
 {
 	struct kthread_worker *worker;
 	int ret = false;
@@ -790,7 +798,24 @@ try_again:
 	if (!worker)
 		goto out;
 
-	spin_lock(&worker->lock);
+	if (check_canceling) {
+		if (!spin_trylock(&worker->lock)) {
+			/*
+			 * Busy wait with spin_is_locked() to avoid
+			 * cache bouncing. Break when canceling
+			 * is set to avoid a deadlock.
+			 */
+			do {
+				if (READ_ONCE(work->canceling))
+					goto out;
+				cpu_relax();
+			} while (spin_is_locked(&worker->lock));
+			goto try_again;
+		}
+	} else {
+		spin_lock(&worker->lock);
+	}
+
 	if (worker != work->worker) {
 		spin_unlock(&worker->lock);
 		goto try_again;
@@ -820,10 +845,13 @@ void delayed_kthread_work_timer_fn(unsigned long __data)
 		(struct delayed_kthread_work *)__data;
 	struct kthread_work *work = &dwork->work;
 
-	if (!try_lock_kthread_work(work))
+	/* Give up when the work is being canceled. */
+	if (!try_lock_kthread_work(work, true))
 		return;
 
-	__queue_kthread_work(work->worker, work);
+	if (!work->canceling)
+		__queue_kthread_work(work->worker, work);
+
 	unlock_kthread_work(work);
 }
 EXPORT_SYMBOL(delayed_kthread_work_timer_fn);
@@ -947,6 +975,124 @@ retry:
 EXPORT_SYMBOL_GPL(flush_kthread_work);
 
 /**
+ * try_to_cancel_kthread_work - Try to cancel kthread work.
+ * @work: work item to cancel
+ * @lock: lock used to protect the work
+ * @flags: flags stored when the lock was taken
+ *
+ * This function tries to cancel the given kthread work by deleting
+ * the timer and by removing the work from the queue.
+ *
+ * If the timer callback is in progress, it waits until it finishes
+ * but it has to drop the lock to avoid a deadlock.
+ *
+ * Return:
+ *  1		if @work was pending and successfully canceled
+ *  0		if @work was not pending
+ */
+static int
+try_to_cancel_kthread_work(struct kthread_work *work,
+				   spinlock_t *lock,
+				   unsigned long *flags)
+{
+	int ret = 0;
+
+	/* Try to cancel the timer if pending. */
+	if (work->timer && del_timer_sync(work->timer)) {
+		ret = 1;
+		goto out;
+	}
+
+	/* Try to remove queued work before it is being executed. */
+	if (!list_empty(&work->node)) {
+		list_del_init(&work->node);
+		ret = 1;
+	}
+
+out:
+	return ret;
+}
+
+static bool __cancel_kthread_work_sync(struct kthread_work *work)
+{
+	struct kthread_worker *worker;
+	unsigned long flags;
+	int ret;
+
+	local_irq_save(flags);
+	if (!try_lock_kthread_work(work, false)) {
+		local_irq_restore(flags);
+		ret = 0;
+		goto out;
+	}
+	worker = work->worker;
+
+	/*
+	 * Block further queueing. It must be set before trying to cancel
+	 * the kthread work. It avoids a possible deadlock between
+	 * del_timer_sync() and the timer callback.
+	 */
+	work->canceling++;
+	ret = try_to_cancel_kthread_work(work, &worker->lock, &flags);
+
+	if (worker->current_work != work)
+		goto out_fast;
+
+	spin_unlock_irqrestore(&worker->lock, flags);
+	flush_kthread_work(work);
+	/*
+	 * Nobody is allowed to switch the worker or queue the work
+	 * when .canceling is set.
+	 */
+	spin_lock_irqsave(&worker->lock, flags);
+
+out_fast:
+	work->canceling--;
+	spin_unlock_irqrestore(&worker->lock, flags);
+out:
+	return ret;
+}
+
+/**
+ * cancel_kthread_work_sync - cancel a kthread work and wait for it to finish
+ * @work: the kthread work to cancel
+ *
+ * Cancel @work and wait for its execution to finish.  This function
+ * can be used even if the work re-queues itself. On return from this
+ * function, @work is guaranteed to be not pending or executing on any CPU.
+ *
+ * The caller must ensure that the worker on which @work was last
+ * queued can't be destroyed before this function returns.
+ *
+ * Return:
+ * %true if @work was pending, %false otherwise.
+ */
+bool cancel_kthread_work_sync(struct kthread_work *work)
+{
+	/* Rather use cancel_delayed_kthread_work_sync() for delayed works. */
+	WARN_ON_ONCE(work->timer);
+
+	return __cancel_kthread_work_sync(work);
+}
+EXPORT_SYMBOL_GPL(cancel_kthread_work_sync);
+
+/**
+ * cancel_delayed_kthread_work_sync - cancel a delayed kthread work and
+ *	wait for it to finish.
+ * @dwork: the delayed kthread work to cancel
+ *
+ * This is cancel_kthread_work_sync() for delayed works.
+ *
+ * Return:
+ * %true if @dwork was pending, %false otherwise.
+ */
+bool cancel_delayed_kthread_work_sync(struct delayed_kthread_work *dwork)
+{
+	return __cancel_kthread_work_sync(&dwork->work);
+}
+EXPORT_SYMBOL_GPL(cancel_delayed_kthread_work_sync);
+
+/**
  * flush_kthread_worker - flush all current works on a kthread_worker
  * @worker: worker to flush
  *
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v4 10/22] kthread: Allow to modify delayed kthread work
  2016-01-25 15:44 ` Petr Mladek
@ 2016-01-25 15:44   ` Petr Mladek
  -1 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-01-25 15:44 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek

There are situations when we need to modify the delay of a delayed kthread
work. For example, when the work depends on an event and the initial delay
means a timeout. Then we want to queue the work immediately when the event
happens.

This patch implements mod_delayed_kthread_work() as inspired by workqueues.
It tries to cancel the pending work and queue it again with the
given delay.

A very special case is when the work is being canceled at the same time.
The cancel_*kthread_work_sync() operation blocks queuing until the running
work finishes. Therefore we do nothing and let cancel() win. This should
not normally happen as the caller is supposed to synchronize these
operations in a reasonable way.
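
An illustrative sketch of the timeout pattern described above; the 5 second
timeout and the "foo" names are made up:

	/* Arm a timeout when the request is submitted. */
	queue_delayed_kthread_work(foo_worker, &foo_timeout_work, 5 * HZ);

	/* The completion interrupt arrived: run the handler right away. */
	mod_delayed_kthread_work(foo_worker, &foo_timeout_work, 0);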

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/kthread.h |  4 ++++
 kernel/kthread.c        | 50 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index dd2a587a2bd7..f501dfeaa0e3 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -168,6 +168,10 @@ bool queue_delayed_kthread_work(struct kthread_worker *worker,
 				struct delayed_kthread_work *dwork,
 				unsigned long delay);
 
+bool mod_delayed_kthread_work(struct kthread_worker *worker,
+			      struct delayed_kthread_work *dwork,
+			      unsigned long delay);
+
 void flush_kthread_work(struct kthread_work *work);
 void flush_kthread_worker(struct kthread_worker *worker);
 
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 6e2eeca08d5f..ebb91848685f 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1013,6 +1013,56 @@ out:
 	return ret;
 }
 
+/**
+ * mod_delayed_kthread_work - modify delay of or queue a delayed kthread work
+ * @worker: kthread worker to use
+ * @dwork: delayed kthread work to queue
+ * @delay: number of jiffies to wait before queuing
+ *
+ * If @dwork is idle, equivalent to queue_delayed_kthread_work(). Otherwise,
+ * modify @dwork's timer so that it expires after @delay. If @delay is zero,
+ * @work is guaranteed to be queued immediately.
+ *
+ * Return: %false if @dwork was idle and queued. Return %true if @dwork was
+ * pending and its timer was modified.
+ *
+ * A special case is when cancel_kthread_work_sync() is running in parallel.
+ * It blocks further queuing. We let the cancel() win and return %false.
+ * The caller is supposed to synchronize these operations a reasonable way.
+ *
+ * This function is safe to call from any context including IRQ handler.
+ * See try_to_cancel_kthread_work() for details.
+ */
+bool mod_delayed_kthread_work(struct kthread_worker *worker,
+			      struct delayed_kthread_work *dwork,
+			      unsigned long delay)
+{
+	struct kthread_work *work = &dwork->work;
+	unsigned long flags;
+	int ret = 0;
+
+try_again:
+	spin_lock_irqsave(&worker->lock, flags);
+	WARN_ON_ONCE(work->worker && work->worker != worker);
+
+	if (work->canceling)
+		goto out;
+
+	ret = try_to_cancel_kthread_work(work, &worker->lock, &flags);
+	if (ret == -EAGAIN)
+		goto try_again;
+
+	if (work->canceling)
+		ret = 0;
+	else
+		__queue_delayed_kthread_work(worker, dwork, delay);
+
+out:
+	spin_unlock_irqrestore(&worker->lock, flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(mod_delayed_kthread_work);
+
 static bool __cancel_kthread_work_sync(struct kthread_work *work)
 {
 	struct kthread_worker *worker;
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v4 11/22] kthread: Better support freezable kthread workers
  2016-01-25 15:44 ` Petr Mladek
@ 2016-01-25 15:45   ` Petr Mladek
  -1 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-01-25 15:45 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek

This patch allows making a kthread worker freezable via a new @flags
parameter. It will make it possible to avoid an init work in some kthreads.

It currently does not affect the function of kthread_worker_fn(),
but it might help with some optimizations or fixes eventually.

I currently do not know about any other use for the @flags
parameter, but I believe that we will want more flags
in the future.

Finally, I hope that it will not cause confusion with the @flags member
in struct kthread. Well, I guess that we will want to rework the
basic kthread implementation once all kthreads are converted into
kthread workers or workqueues. It is possible that we will merge
the two structures.
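
As a usage sketch (assuming the API from this series; the my_* names are
made up):

#include <linux/kthread.h>
#include <linux/err.h>

static void my_scan_fn(struct kthread_work *work)
{
	/* ... one scan pass ... */
}

static DEFINE_KTHREAD_WORK(my_scan_work, my_scan_fn);

static int my_start(void)
{
	struct kthread_worker *worker;

	/*
	 * The worker kthread calls set_freezable() on itself,
	 * so no separate init work is needed for that.
	 */
	worker = create_kthread_worker(KTW_FREEZABLE, "my_scanner");
	if (IS_ERR(worker))
		return PTR_ERR(worker);

	queue_kthread_work(worker, &my_scan_work);
	return 0;
}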

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/kthread.h | 11 ++++++++---
 kernel/kthread.c        | 17 ++++++++++++-----
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index f501dfeaa0e3..2dad7020047f 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -65,7 +65,12 @@ struct kthread_work;
 typedef void (*kthread_work_func_t)(struct kthread_work *work);
 void delayed_kthread_work_timer_fn(unsigned long __data);
 
+enum {
+	KTW_FREEZABLE		= 1 << 2,	/* freeze during suspend */
+};
+
 struct kthread_worker {
+	unsigned int		flags;
 	spinlock_t		lock;
 	struct list_head	work_list;
 	struct task_struct	*task;
@@ -154,12 +159,12 @@ extern void __init_kthread_worker(struct kthread_worker *worker,
 
 int kthread_worker_fn(void *worker_ptr);
 
-__printf(1, 2)
+__printf(2, 3)
 struct kthread_worker *
-create_kthread_worker(const char namefmt[], ...);
+create_kthread_worker(unsigned int flags, const char namefmt[], ...);
 
 struct kthread_worker *
-create_kthread_worker_on_cpu(int cpu, const char namefmt[]);
+create_kthread_worker_on_cpu(unsigned int flags, int cpu, const char namefmt[]);
 
 bool queue_kthread_work(struct kthread_worker *worker,
 			struct kthread_work *work);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index ebb91848685f..53c4d5a7c723 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -556,6 +556,7 @@ void __init_kthread_worker(struct kthread_worker *worker,
 				const char *name,
 				struct lock_class_key *key)
 {
+	worker->flags = 0;
 	spin_lock_init(&worker->lock);
 	lockdep_set_class_and_name(&worker->lock, key, name);
 	INIT_LIST_HEAD(&worker->work_list);
@@ -605,6 +606,10 @@ int kthread_worker_fn(void *worker_ptr)
 	 */
 	WARN_ON(worker->task && worker->task != current);
 	worker->task = current;
+
+	if (worker->flags & KTW_FREEZABLE)
+		set_freezable();
+
 repeat:
 	set_current_state(TASK_INTERRUPTIBLE);	/* mb paired w/ kthread_stop */
 
@@ -638,7 +643,8 @@ repeat:
 EXPORT_SYMBOL_GPL(kthread_worker_fn);
 
 static struct kthread_worker *
-__create_kthread_worker(int cpu, const char namefmt[], va_list args)
+__create_kthread_worker(unsigned int flags, int cpu,
+			const char namefmt[], va_list args)
 {
 	struct kthread_worker *worker;
 	struct task_struct *task;
@@ -658,6 +664,7 @@ __create_kthread_worker(int cpu, const char namefmt[], va_list args)
 	if (IS_ERR(task))
 		goto fail_task;
 
+	worker->flags = flags;
 	worker->task = task;
 	wake_up_process(task);
 	return worker;
@@ -676,13 +683,13 @@ fail_task:
  * the worker was SIGKILLed.
  */
 struct kthread_worker *
-create_kthread_worker(const char namefmt[], ...)
+create_kthread_worker(unsigned int flags, const char namefmt[], ...)
 {
 	struct kthread_worker *worker;
 	va_list args;
 
 	va_start(args, namefmt);
-	worker = __create_kthread_worker(-1, namefmt, args);
+	worker = __create_kthread_worker(flags, -1, namefmt, args);
 	va_end(args);
 
 	return worker;
@@ -706,12 +713,12 @@ EXPORT_SYMBOL(create_kthread_worker);
  * ERR_PTR(-EINVAL) on invalid @cpu.
  */
 struct kthread_worker *
-create_kthread_worker_on_cpu(int cpu, const char namefmt[])
+create_kthread_worker_on_cpu(unsigned int flags, int cpu, const char namefmt[])
 {
 	if (cpu < 0 || cpu > num_possible_cpus())
 		return ERR_PTR(-EINVAL);
 
-	return __create_kthread_worker(cpu, namefmt, NULL);
+	return __create_kthread_worker(flags, cpu, namefmt, NULL);
 }
 EXPORT_SYMBOL(create_kthread_worker_on_cpu);
 
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v4 12/22] kthread: Use try_lock_kthread_work() in flush_kthread_work()
  2016-01-25 15:44 ` Petr Mladek
@ 2016-01-25 15:45   ` Petr Mladek
  -1 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-01-25 15:45 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek

Remove code duplication and use the new try_lock_kthread_work()
function in flush_kthread_work() as well.
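
The shared locking pattern, sketched as it would appear inside
kernel/kthread.c (try_lock_kthread_work() comes from an earlier patch in
this series; it is called with IRQs disabled and, on success, returns
with worker->lock held, as its use below shows):

#include <linux/kthread.h>
#include <linux/spinlock.h>

static void lock_work_example(struct kthread_work *work)
{
	struct kthread_worker *worker;

	local_irq_disable();
	if (!try_lock_kthread_work(work, false)) {
		/* The work is not associated with any worker. */
		local_irq_enable();
		return;
	}
	worker = work->worker;

	/* ... operate on the work under worker->lock ... */

	spin_unlock_irq(&worker->lock);
}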

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/kthread.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 53c4d5a7c723..7193582fe299 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -956,16 +956,12 @@ void flush_kthread_work(struct kthread_work *work)
 	struct kthread_worker *worker;
 	bool noop = false;
 
-retry:
-	worker = work->worker;
-	if (!worker)
+	local_irq_disable();
+	if (!try_lock_kthread_work(work, false)) {
+		local_irq_enable();
 		return;
-
-	spin_lock_irq(&worker->lock);
-	if (work->worker != worker) {
-		spin_unlock_irq(&worker->lock);
-		goto retry;
 	}
+	worker = work->worker;
 
 	if (!list_empty(&work->node))
 		insert_kthread_work(worker, &fwork.work, work->node.next);
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v4 13/22] mm/huge_page: Convert khugepaged() into kthread worker API
@ 2016-01-25 15:45   ` Petr Mladek
  0 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-01-25 15:45 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek

Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing, and
awakening. In many cases it is unclear which state a kthread is
in, and sometimes the checks are done the wrong way.

The plan is to convert kthreads to the kthread_worker or workqueue
API. It allows splitting the functionality into separate operations
and gives the code a better structure. It also defines a clean state
where no locks are taken and no IRQs are blocked, so the kthread
might sleep or even be safely migrated.

The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that the thread is
available when needed. It also allows better control, e.g.
defining a scheduling priority.

This patch converts khugepaged() to the kthread worker API
because it modifies the scheduling.

It keeps the functionality, except that we do not wake up the worker
when it is already created and someone calls start() once again.

Note that kthread works get associated with a single kthread worker.
They must be initialized again if we want to use them with another worker.
This is also needed when the worker is restarted.

set_freezable() is not needed because the kthread worker is
created as freezable.

set_user_nice() is called from start_stop_khugepaged(). It need
not be done from within the kthread.

The scan work must be queued only when the worker is available.
We have to use "khugepaged_mm_lock" to avoid a race between the check
and the queuing. I admit that this was a bit easier before because
wake_up() was a nop when the kthread did not exist.

Also the scan work is queued only when the list of scanned pages is
not empty. It adds one check but it is cleaner.

The delay between scans is implemented using a delayed work.

Note that @khugepaged_wait waitqueue had two purposes. It was used
to wait between scans and when an allocation failed. It is still used
for the second purpose. Therefore it was renamed to better describe
the current use.

Also note that we can no longer check for kthread_should_stop()
in the works. The kthread used by the worker has to stay alive
until all queued works are finished. Instead, we use the existing
check khugepaged_enabled(), which returns false when we are going down.
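
The resulting start/stop pattern, reduced to a sketch (generic names;
locking against concurrent start/stop is omitted for brevity; the API
comes from the earlier patches in this series):

#include <linux/kthread.h>
#include <linux/jiffies.h>
#include <linux/err.h>

static struct kthread_worker *scan_worker;
static struct delayed_kthread_work scan_work;
static struct kthread_work cleanup_work;
static bool scanning_needed;	/* stands in for the real "has work" test */

static void scan_fn(struct kthread_work *dummy)
{
	/* ... one scan pass ... */

	/* The work re-arms itself instead of sleeping inside a loop. */
	if (scanning_needed)
		queue_delayed_kthread_work(scan_worker, &scan_work,
					   msecs_to_jiffies(10000));
}

static void cleanup_fn(struct kthread_work *dummy)
{
	/* Release per-scan state; runs on the worker before it goes away. */
}

static int start_scanner(void)
{
	struct kthread_worker *worker;

	worker = create_kthread_worker(KTW_FREEZABLE, "scanner");
	if (IS_ERR(worker))
		return PTR_ERR(worker);

	/* Works must be (re)initialized whenever the worker is (re)created. */
	init_delayed_kthread_work(&scan_work, scan_fn);
	init_kthread_work(&cleanup_work, cleanup_fn);

	scan_worker = worker;
	queue_delayed_kthread_work(worker, &scan_work, 0);
	return 0;
}

static void stop_scanner(void)
{
	struct kthread_worker *worker = scan_worker;

	scan_worker = NULL;	/* stop others from queuing new works */
	cancel_delayed_kthread_work_sync(&scan_work);
	queue_kthread_work(worker, &cleanup_work);
	destroy_kthread_worker(worker);
}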

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 mm/huge_memory.c | 138 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 76 insertions(+), 62 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fd3a07b3e6f4..828d741ed242 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -89,10 +89,16 @@ static unsigned int khugepaged_full_scans;
 static unsigned int khugepaged_scan_sleep_millisecs __read_mostly = 10000;
 /* during fragmentation poll the hugepage allocator once every minute */
 static unsigned int khugepaged_alloc_sleep_millisecs __read_mostly = 60000;
-static struct task_struct *khugepaged_thread __read_mostly;
+
+static void khugepaged_do_scan_func(struct kthread_work *dummy);
+static void khugepaged_cleanup_func(struct kthread_work *dummy);
+static struct kthread_worker *khugepaged_worker;
+static struct delayed_kthread_work khugepaged_do_scan_work;
+static struct kthread_work khugepaged_cleanup_work;
+
 static DEFINE_MUTEX(khugepaged_mutex);
 static DEFINE_SPINLOCK(khugepaged_mm_lock);
-static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
+static DECLARE_WAIT_QUEUE_HEAD(khugepaged_alloc_wait);
 /*
  * default collapse hugepages if there is at least one pte mapped like
  * it would have happened if the vma was large enough during page
@@ -100,7 +106,6 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
  */
 static unsigned int khugepaged_max_ptes_none __read_mostly = HPAGE_PMD_NR-1;
 
-static int khugepaged(void *none);
 static int khugepaged_slab_init(void);
 static void khugepaged_slab_exit(void);
 
@@ -180,29 +185,55 @@ static void set_recommended_min_free_kbytes(void)
 	setup_per_zone_wmarks();
 }
 
+static int khugepaged_has_work(void)
+{
+	return !list_empty(&khugepaged_scan.mm_head);
+}
+
 static int start_stop_khugepaged(void)
 {
+	struct kthread_worker *worker;
 	int err = 0;
+
 	if (khugepaged_enabled()) {
-		if (!khugepaged_thread)
-			khugepaged_thread = kthread_run(khugepaged, NULL,
-							"khugepaged");
-		if (IS_ERR(khugepaged_thread)) {
-			pr_err("khugepaged: kthread_run(khugepaged) failed\n");
-			err = PTR_ERR(khugepaged_thread);
-			khugepaged_thread = NULL;
-			goto fail;
+		if (khugepaged_worker)
+			goto out;
+
+		worker = create_kthread_worker(KTW_FREEZABLE, "khugepaged");
+		if (IS_ERR(worker)) {
+			pr_err("khugepaged: failed to create kthread worker\n");
+			goto out;
 		}
+		set_user_nice(worker->task, MAX_NICE);
 
-		if (!list_empty(&khugepaged_scan.mm_head))
-			wake_up_interruptible(&khugepaged_wait);
+		/* Always initialize the works when the worker is started. */
+		init_delayed_kthread_work(&khugepaged_do_scan_work,
+					  khugepaged_do_scan_func);
+		init_kthread_work(&khugepaged_cleanup_work,
+				  khugepaged_cleanup_func);
+
+		/* Make the worker public and check for work synchronously. */
+		spin_lock(&khugepaged_mm_lock);
+		khugepaged_worker = worker;
+		if (khugepaged_has_work())
+			queue_delayed_kthread_work(worker,
+						   &khugepaged_do_scan_work,
+						   0);
+		spin_unlock(&khugepaged_mm_lock);
 
 		set_recommended_min_free_kbytes();
-	} else if (khugepaged_thread) {
-		kthread_stop(khugepaged_thread);
-		khugepaged_thread = NULL;
+	} else if (khugepaged_worker) {
+		/* First, stop others from using the worker. */
+		spin_lock(&khugepaged_mm_lock);
+		worker = khugepaged_worker;
+		khugepaged_worker = NULL;
+		spin_unlock(&khugepaged_mm_lock);
+
+		cancel_delayed_kthread_work_sync(&khugepaged_do_scan_work);
+		queue_kthread_work(worker, &khugepaged_cleanup_work);
+		destroy_kthread_worker(worker);
 	}
-fail:
+out:
 	return err;
 }
 
@@ -461,7 +492,13 @@ static ssize_t scan_sleep_millisecs_store(struct kobject *kobj,
 		return -EINVAL;
 
 	khugepaged_scan_sleep_millisecs = msecs;
-	wake_up_interruptible(&khugepaged_wait);
+
+	spin_lock(&khugepaged_mm_lock);
+	if (khugepaged_worker && khugepaged_has_work())
+		mod_delayed_kthread_work(khugepaged_worker,
+					 &khugepaged_do_scan_work,
+					 0);
+	spin_unlock(&khugepaged_mm_lock);
 
 	return count;
 }
@@ -488,7 +525,7 @@ static ssize_t alloc_sleep_millisecs_store(struct kobject *kobj,
 		return -EINVAL;
 
 	khugepaged_alloc_sleep_millisecs = msecs;
-	wake_up_interruptible(&khugepaged_wait);
+	wake_up_interruptible(&khugepaged_alloc_wait);
 
 	return count;
 }
@@ -1878,7 +1915,7 @@ static inline int khugepaged_test_exit(struct mm_struct *mm)
 int __khugepaged_enter(struct mm_struct *mm)
 {
 	struct mm_slot *mm_slot;
-	int wakeup;
+	int has_work;
 
 	mm_slot = alloc_mm_slot();
 	if (!mm_slot)
@@ -1897,13 +1934,15 @@ int __khugepaged_enter(struct mm_struct *mm)
 	 * Insert just behind the scanning cursor, to let the area settle
 	 * down a little.
 	 */
-	wakeup = list_empty(&khugepaged_scan.mm_head);
+	has_work = khugepaged_has_work();
 	list_add_tail(&mm_slot->mm_node, &khugepaged_scan.mm_head);
-	spin_unlock(&khugepaged_mm_lock);
 
 	atomic_inc(&mm->mm_count);
-	if (wakeup)
-		wake_up_interruptible(&khugepaged_wait);
+	if (khugepaged_worker && has_work)
+		mod_delayed_kthread_work(khugepaged_worker,
+					 &khugepaged_do_scan_work,
+					 0);
+	spin_unlock(&khugepaged_mm_lock);
 
 	return 0;
 }
@@ -2142,10 +2181,10 @@ static void khugepaged_alloc_sleep(void)
 {
 	DEFINE_WAIT(wait);
 
-	add_wait_queue(&khugepaged_wait, &wait);
+	add_wait_queue(&khugepaged_alloc_wait, &wait);
 	freezable_schedule_timeout_interruptible(
 		msecs_to_jiffies(khugepaged_alloc_sleep_millisecs));
-	remove_wait_queue(&khugepaged_wait, &wait);
+	remove_wait_queue(&khugepaged_alloc_wait, &wait);
 }
 
 static int khugepaged_node_load[MAX_NUMNODES];
@@ -2716,19 +2755,7 @@ breakouterloop_mmap_sem:
 	return progress;
 }
 
-static int khugepaged_has_work(void)
-{
-	return !list_empty(&khugepaged_scan.mm_head) &&
-		khugepaged_enabled();
-}
-
-static int khugepaged_wait_event(void)
-{
-	return !list_empty(&khugepaged_scan.mm_head) ||
-		kthread_should_stop();
-}
-
-static void khugepaged_do_scan(void)
+static void khugepaged_do_scan_func(struct kthread_work *dummy)
 {
 	struct page *hpage = NULL;
 	unsigned int progress = 0, pass_through_head = 0;
@@ -2743,7 +2770,7 @@ static void khugepaged_do_scan(void)
 
 		cond_resched();
 
-		if (unlikely(kthread_should_stop() || try_to_freeze()))
+		if (unlikely(!khugepaged_enabled() || try_to_freeze()))
 			break;
 
 		spin_lock(&khugepaged_mm_lock);
@@ -2760,43 +2787,30 @@ static void khugepaged_do_scan(void)
 
 	if (!IS_ERR_OR_NULL(hpage))
 		put_page(hpage);
-}
 
-static void khugepaged_wait_work(void)
-{
 	if (khugepaged_has_work()) {
-		if (!khugepaged_scan_sleep_millisecs)
-			return;
 
-		wait_event_freezable_timeout(khugepaged_wait,
-					     kthread_should_stop(),
-			msecs_to_jiffies(khugepaged_scan_sleep_millisecs));
-		return;
-	}
+		unsigned long delay = 0;
 
-	if (khugepaged_enabled())
-		wait_event_freezable(khugepaged_wait, khugepaged_wait_event());
+		if (khugepaged_scan_sleep_millisecs)
+			delay = msecs_to_jiffies(khugepaged_scan_sleep_millisecs);
+
+		queue_delayed_kthread_work(khugepaged_worker,
+					   &khugepaged_do_scan_work,
+					   delay);
+	}
 }
 
-static int khugepaged(void *none)
+static void khugepaged_cleanup_func(struct kthread_work *dummy)
 {
 	struct mm_slot *mm_slot;
 
-	set_freezable();
-	set_user_nice(current, MAX_NICE);
-
-	while (!kthread_should_stop()) {
-		khugepaged_do_scan();
-		khugepaged_wait_work();
-	}
-
 	spin_lock(&khugepaged_mm_lock);
 	mm_slot = khugepaged_scan.mm_slot;
 	khugepaged_scan.mm_slot = NULL;
 	if (mm_slot)
 		collect_mm_slot(mm_slot);
 	spin_unlock(&khugepaged_mm_lock);
-	return 0;
 }
 
 static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v4 14/22] ring_buffer: Convert benchmark kthreads into kthread worker API
  2016-01-25 15:44 ` Petr Mladek
@ 2016-01-25 15:45   ` Petr Mladek
  -1 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-01-25 15:45 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek

Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing, and
awakening. In many cases it is unclear which state a kthread is
in, and sometimes the checks are done the wrong way.

The plan is to convert kthreads to the kthread_worker or workqueue
API. It allows splitting the functionality into separate operations
and gives the code a better structure. It also defines a clean state
where no locks are taken and no IRQs are blocked, so the kthread
might sleep or even be safely migrated.

The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that the thread is
available when needed. It also allows better control, e.g.
defining a scheduling priority.

This patch converts the ring buffer benchmark producer into a kthread
worker because it modifies the scheduling priority and policy.
Also, it is a benchmark: it makes the CPU very busy and will most likely
run only for a limited time. IMHO, it does not make sense to mess up
the system workqueues with it.

The thread is split into two independent works. It might look more
complicated but it helped me to find a race in the sleeping part
that was fixed separately.

kthread_should_stop() can no longer be used inside the works
because it defines the lifetime of the worker, which needs to stay
usable until all works are done. Instead, we add the @test_end
global variable. It is set during normal termination, in contrast
to @test_error.
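
The producer/consumer split, reduced to a sketch (generic names, assuming
the worker API and the static work initializers from this series; the
completion handling is simplified compared to the real benchmark):

#include <linux/kthread.h>
#include <linux/completion.h>
#include <linux/sched.h>
#include <linux/err.h>

static struct kthread_worker *producer_worker;
static struct kthread_worker *consumer_worker;
static DECLARE_COMPLETION(read_start);

static void consumer_fn(struct kthread_work *dummy)
{
	complete(&read_start);	/* tell the producer we are running */
	/* ... consume events until the producer signals the end ... */
}
static DEFINE_KTHREAD_WORK(consumer_work, consumer_fn);

static void producer_fn(struct kthread_work *dummy)
{
	/* Kick the consumer on its own worker and wait until it runs. */
	queue_kthread_work(consumer_worker, &consumer_work);
	wait_for_completion(&read_start);

	/* ... produce events ... */
}
static DEFINE_KTHREAD_WORK(producer_work, producer_fn);

static int benchmark_sketch_init(void)
{
	consumer_worker = create_kthread_worker(0, "sketch_consumer");
	if (IS_ERR(consumer_worker))
		return PTR_ERR(consumer_worker);

	producer_worker = create_kthread_worker(0, "sketch_producer");
	if (IS_ERR(producer_worker)) {
		destroy_kthread_worker(consumer_worker);
		return PTR_ERR(producer_worker);
	}

	/* The dedicated kthreads are visible, so their priority can be tuned. */
	set_user_nice(producer_worker->task, MAX_NICE);
	set_user_nice(consumer_worker->task, MAX_NICE);

	queue_kthread_work(producer_worker, &producer_work);
	return 0;
}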

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/trace/ring_buffer_benchmark.c | 133 ++++++++++++++++-------------------
 1 file changed, 59 insertions(+), 74 deletions(-)

diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index 6df9a83e20d7..7ff443f1e406 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -26,10 +26,17 @@ static int wakeup_interval = 100;
 static int reader_finish;
 static DECLARE_COMPLETION(read_start);
 static DECLARE_COMPLETION(read_done);
-
 static struct ring_buffer *buffer;
-static struct task_struct *producer;
-static struct task_struct *consumer;
+
+static void rb_producer_hammer_func(struct kthread_work *dummy);
+static struct kthread_worker *rb_producer_worker;
+static DEFINE_DELAYED_KTHREAD_WORK(rb_producer_hammer_work,
+				   rb_producer_hammer_func);
+
+static void rb_consumer_func(struct kthread_work *dummy);
+static struct kthread_worker *rb_consumer_worker;
+static DEFINE_KTHREAD_WORK(rb_consumer_work, rb_consumer_func);
+
 static unsigned long read;
 
 static unsigned int disable_reader;
@@ -61,6 +68,7 @@ MODULE_PARM_DESC(consumer_fifo, "fifo prio for consumer");
 static int read_events;
 
 static int test_error;
+static int test_end;
 
 #define TEST_ERROR()				\
 	do {					\
@@ -77,7 +85,7 @@ enum event_status {
 
 static bool break_test(void)
 {
-	return test_error || kthread_should_stop();
+	return test_error || test_end;
 }
 
 static enum event_status read_event(int cpu)
@@ -262,8 +270,8 @@ static void ring_buffer_producer(void)
 		end_time = ktime_get();
 
 		cnt++;
-		if (consumer && !(cnt % wakeup_interval))
-			wake_up_process(consumer);
+		if (rb_consumer_worker && !(cnt % wakeup_interval))
+			wake_up_process(rb_consumer_worker->task);
 
 #ifndef CONFIG_PREEMPT
 		/*
@@ -281,14 +289,14 @@ static void ring_buffer_producer(void)
 	} while (ktime_before(end_time, timeout) && !break_test());
 	trace_printk("End ring buffer hammer\n");
 
-	if (consumer) {
+	if (rb_consumer_worker) {
 		/* Init both completions here to avoid races */
 		init_completion(&read_start);
 		init_completion(&read_done);
 		/* the completions must be visible before the finish var */
 		smp_wmb();
 		reader_finish = 1;
-		wake_up_process(consumer);
+		wake_up_process(rb_consumer_worker->task);
 		wait_for_completion(&read_done);
 	}
 
@@ -366,68 +374,39 @@ static void ring_buffer_producer(void)
 	}
 }
 
-static void wait_to_die(void)
-{
-	set_current_state(TASK_INTERRUPTIBLE);
-	while (!kthread_should_stop()) {
-		schedule();
-		set_current_state(TASK_INTERRUPTIBLE);
-	}
-	__set_current_state(TASK_RUNNING);
-}
-
-static int ring_buffer_consumer_thread(void *arg)
+static void rb_consumer_func(struct kthread_work *dummy)
 {
-	while (!break_test()) {
-		complete(&read_start);
-
-		ring_buffer_consumer();
+	complete(&read_start);
 
-		set_current_state(TASK_INTERRUPTIBLE);
-		if (break_test())
-			break;
-		schedule();
-	}
-	__set_current_state(TASK_RUNNING);
-
-	if (!kthread_should_stop())
-		wait_to_die();
-
-	return 0;
+	ring_buffer_consumer();
 }
 
-static int ring_buffer_producer_thread(void *arg)
+static void rb_producer_hammer_func(struct kthread_work *dummy)
 {
-	while (!break_test()) {
-		ring_buffer_reset(buffer);
+	if (break_test())
+		return;
 
-		if (consumer) {
-			wake_up_process(consumer);
-			wait_for_completion(&read_start);
-		}
-
-		ring_buffer_producer();
-		if (break_test())
-			goto out_kill;
+	ring_buffer_reset(buffer);
 
-		trace_printk("Sleeping for 10 secs\n");
-		set_current_state(TASK_INTERRUPTIBLE);
-		if (break_test())
-			goto out_kill;
-		schedule_timeout(HZ * SLEEP_TIME);
+	if (rb_consumer_worker) {
+		queue_kthread_work(rb_consumer_worker, &rb_consumer_work);
+		wait_for_completion(&read_start);
 	}
 
-out_kill:
-	__set_current_state(TASK_RUNNING);
-	if (!kthread_should_stop())
-		wait_to_die();
+	ring_buffer_producer();
 
-	return 0;
+	if (break_test())
+		return;
+
+	trace_printk("Sleeping for 10 secs\n");
+	queue_delayed_kthread_work(rb_producer_worker,
+				   &rb_producer_hammer_work,
+				   HZ * SLEEP_TIME);
 }
 
 static int __init ring_buffer_benchmark_init(void)
 {
-	int ret;
+	int ret = 0;
 
 	/* make a one meg buffer in overwite mode */
 	buffer = ring_buffer_alloc(1000000, RB_FL_OVERWRITE);
@@ -435,19 +414,21 @@ static int __init ring_buffer_benchmark_init(void)
 		return -ENOMEM;
 
 	if (!disable_reader) {
-		consumer = kthread_create(ring_buffer_consumer_thread,
-					  NULL, "rb_consumer");
-		ret = PTR_ERR(consumer);
-		if (IS_ERR(consumer))
+		rb_consumer_worker = create_kthread_worker(0, "rb_consumer");
+		if (IS_ERR(rb_consumer_worker)) {
+			ret = PTR_ERR(rb_consumer_worker);
 			goto out_fail;
+		}
 	}
 
-	producer = kthread_run(ring_buffer_producer_thread,
-			       NULL, "rb_producer");
-	ret = PTR_ERR(producer);
-
-	if (IS_ERR(producer))
+	rb_producer_worker = create_kthread_worker(0, "rb_producer");
+	if (IS_ERR(rb_producer_worker)) {
+		ret = PTR_ERR(rb_producer_worker);
 		goto out_kill;
+	}
+
+	queue_delayed_kthread_work(rb_producer_worker,
+				   &rb_producer_hammer_work, 0);
 
 	/*
 	 * Run them as low-prio background tasks by default:
@@ -457,24 +438,26 @@ static int __init ring_buffer_benchmark_init(void)
 			struct sched_param param = {
 				.sched_priority = consumer_fifo
 			};
-			sched_setscheduler(consumer, SCHED_FIFO, &param);
+			sched_setscheduler(rb_consumer_worker->task,
+					   SCHED_FIFO, &param);
 		} else
-			set_user_nice(consumer, consumer_nice);
+			set_user_nice(rb_consumer_worker->task, consumer_nice);
 	}
 
 	if (producer_fifo >= 0) {
 		struct sched_param param = {
 			.sched_priority = producer_fifo
 		};
-		sched_setscheduler(producer, SCHED_FIFO, &param);
+		sched_setscheduler(rb_producer_worker->task,
+				   SCHED_FIFO, &param);
 	} else
-		set_user_nice(producer, producer_nice);
+		set_user_nice(rb_producer_worker->task, producer_nice);
 
 	return 0;
 
  out_kill:
-	if (consumer)
-		kthread_stop(consumer);
+	if (rb_consumer_worker)
+		destroy_kthread_worker(rb_consumer_worker);
 
  out_fail:
 	ring_buffer_free(buffer);
@@ -483,9 +466,11 @@ static int __init ring_buffer_benchmark_init(void)
 
 static void __exit ring_buffer_benchmark_exit(void)
 {
-	kthread_stop(producer);
-	if (consumer)
-		kthread_stop(consumer);
+	test_end = 1;
+	cancel_delayed_kthread_work_sync(&rb_producer_hammer_work);
+	destroy_kthread_worker(rb_producer_worker);
+	if (rb_consumer_worker)
+		destroy_kthread_worker(rb_consumer_worker);
 	ring_buffer_free(buffer);
 }
 
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v4 15/22] hung_task: Convert hungtaskd into kthread worker API
  2016-01-25 15:44 ` Petr Mladek
@ 2016-01-25 15:45   ` Petr Mladek
  -1 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-01-25 15:45 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek,
	linux-watchdog

Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing, and
awakening. In many cases it is unclear which state the kthread
is in, and sometimes the checks are done the wrong way.

The plan is to convert kthreads into the kthread_worker or workqueue
API. It allows splitting the functionality into separate operations
and gives the code a better structure. It also defines a clean state
where no locks are taken and no IRQs are blocked, and where the
kthread might sleep or even be safely migrated.

The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that the thread is
available when needed. It also allows better control, e.g. defining
a scheduling priority.

This patch converts hungtaskd() into the kthread worker API because
it modifies the priority.

The conversion is pretty straightforward. One iteration of the
main cycle is transferred into a self-queuing delayed kthread work.
We no longer need to check whether it was woken earlier. Instead,
the work timeout is modified when the timeout value is changed.
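
For reference, the general shape of such a self-queuing delayed kthread
work is sketched below. It is only an illustration: my_worker, my_work,
my_func(), do_one_iteration() and the timeout values are placeholder
names, while the helpers are the ones introduced earlier in this series:

    static struct kthread_worker *my_worker;
    static void my_func(struct kthread_work *dummy);
    static DEFINE_DELAYED_KTHREAD_WORK(my_work, my_func);

    static void my_func(struct kthread_work *dummy)
    {
            do_one_iteration();

            /* re-arm; replaces the sleep at the end of the old main loop */
            queue_delayed_kthread_work(my_worker, &my_work, my_timeout);
    }

    /* elsewhere, e.g. in a sysctl handler, apply a changed timeout */
    if (my_worker)
            mod_delayed_kthread_work(my_worker, &my_work, new_timeout);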

The user nice value is set from hung_task_init(). Otherwise, we
would need to add an extra init_work.

The patch also handles the error when the kthread worker could not
be created for some reason. It was broken before. For example,
wake_up_process() would have failed if watchdog_task had included an
error code instead of a valid pointer.

Signed-off-by: Petr Mladek <pmladek@suse.com>
CC: linux-watchdog@vger.kernel.org
---
 kernel/hung_task.c | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index e0f90c2b57aa..65026f8b750e 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -41,7 +41,9 @@ int __read_mostly sysctl_hung_task_warnings = 10;
 
 static int __read_mostly did_panic;
 
-static struct task_struct *watchdog_task;
+static struct kthread_worker *watchdog_worker;
+static void watchdog_func(struct kthread_work *dummy);
+static DEFINE_DELAYED_KTHREAD_WORK(watchdog_work, watchdog_func);
 
 /*
  * Should we panic (and reboot, if panic_timeout= is set) when a
@@ -205,7 +207,9 @@ int proc_dohung_task_timeout_secs(struct ctl_table *table, int write,
 	if (ret || !write)
 		goto out;
 
-	wake_up_process(watchdog_task);
+	if (watchdog_worker)
+		mod_delayed_kthread_work(watchdog_worker, &watchdog_work,
+			 timeout_jiffies(sysctl_hung_task_timeout_secs));
 
  out:
 	return ret;
@@ -222,30 +226,35 @@ EXPORT_SYMBOL_GPL(reset_hung_task_detector);
 /*
  * kthread which checks for tasks stuck in D state
  */
-static int watchdog(void *dummy)
+static void watchdog_func(struct kthread_work *dummy)
 {
-	set_user_nice(current, 0);
+	unsigned long timeout = sysctl_hung_task_timeout_secs;
 
-	for ( ; ; ) {
-		unsigned long timeout = sysctl_hung_task_timeout_secs;
+	if (atomic_xchg(&reset_hung_task, 0))
+		goto next;
 
-		while (schedule_timeout_interruptible(timeout_jiffies(timeout)))
-			timeout = sysctl_hung_task_timeout_secs;
+	check_hung_uninterruptible_tasks(timeout);
 
-		if (atomic_xchg(&reset_hung_task, 0))
-			continue;
-
-		check_hung_uninterruptible_tasks(timeout);
-	}
-
-	return 0;
+next:
+	queue_delayed_kthread_work(watchdog_worker, &watchdog_work,
+				   timeout_jiffies(timeout));
 }
 
 static int __init hung_task_init(void)
 {
+	struct kthread_worker *worker;
+
 	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
-	watchdog_task = kthread_run(watchdog, NULL, "khungtaskd");
+	worker = create_kthread_worker(0, "khungtaskd");
+	if (IS_ERR(worker)) {
+		pr_warn("Failed to create khungtaskd\n");
+		goto out;
+	}
+	watchdog_worker = worker;
+	set_user_nice(worker->task, 0);
+	queue_delayed_kthread_work(worker, &watchdog_work, 0);
 
+out:
 	return 0;
 }
 subsys_initcall(hung_task_init);
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v4 16/22] kmemleak: Convert kmemleak kthread into kthread worker API
  2016-01-25 15:44 ` Petr Mladek
@ 2016-01-25 15:45   ` Petr Mladek
  -1 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-01-25 15:45 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek,
	Catalin Marinas

Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing, and
awakening. In many cases it is unclear which state the kthread
is in, and sometimes the checks are done the wrong way.

The plan is to convert kthreads into the kthread_worker or workqueue
API. It allows splitting the functionality into separate operations
and gives the code a better structure. It also defines a clean state
where no locks are taken and no IRQs are blocked, and where the
kthread might sleep or even be safely migrated.

The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that the thread is
available when needed. It also allows better control, e.g. defining
a scheduling priority.

This patch converts the kmemleak kthread into the kthread worker
API because it modifies the scheduling priority.

The result is a simple self-queuing work that just calls kmemleak_scan().

The info messages and set_user_nice() are moved to the functions that
start and stop the worker. These are also renamed to mention worker
instead of thread.

We no longer need to handle a spurious wakeup and count the remaining
timeout; this is handled by the worker. The delayed work is queued
after the full timeout passes.

Finally, the initial delay is used only when the kthread is started
during boot. For this, we added a parameter to the start function.
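
For reference, the start/stop life cycle of such a self-queuing worker
is sketched below. It is a simplified illustration that reuses the
kmemleak_* names from the diff that follows; first_delay stands for
either the boot delay or zero, and error handling is omitted:

    /* start: create the worker and queue the first scan */
    init_delayed_kthread_work(&kmemleak_scan_work, kmemleak_scan_func);
    kmemleak_scan_worker = create_kthread_worker(0, "kmemleak");
    set_user_nice(kmemleak_scan_worker->task, 10);
    queue_delayed_kthread_work(kmemleak_scan_worker, &kmemleak_scan_work,
                               first_delay);

    /* stop: cancel first so that the work cannot re-queue itself */
    cancel_delayed_kthread_work_sync(&kmemleak_scan_work);
    destroy_kthread_worker(kmemleak_scan_worker);
    kmemleak_scan_worker = NULL;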

Signed-off-by: Petr Mladek <pmladek@suse.com>
CC: Catalin Marinas <catalin.marinas@arm.com>
---
 mm/kmemleak.c | 87 +++++++++++++++++++++++++++++------------------------------
 1 file changed, 43 insertions(+), 44 deletions(-)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 25c0ad36fe38..16f1a7bb1697 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -216,7 +216,8 @@ static int kmemleak_error;
 static unsigned long min_addr = ULONG_MAX;
 static unsigned long max_addr;
 
-static struct task_struct *scan_thread;
+static struct kthread_worker *kmemleak_scan_worker;
+static struct delayed_kthread_work kmemleak_scan_work;
 /* used to avoid reporting of recently allocated objects */
 static unsigned long jiffies_min_age;
 static unsigned long jiffies_last_scan;
@@ -1470,54 +1471,48 @@ static void kmemleak_scan(void)
 }
 
 /*
- * Thread function performing automatic memory scanning. Unreferenced objects
- * at the end of a memory scan are reported but only the first time.
+ * Kthread worker function performing automatic memory scanning.
+ * Unreferenced objects at the end of a memory scan are reported
+ * but only the first time.
  */
-static int kmemleak_scan_thread(void *arg)
+static void kmemleak_scan_func(struct kthread_work *dummy)
 {
-	static int first_run = 1;
-
-	pr_info("Automatic memory scanning thread started\n");
-	set_user_nice(current, 10);
-
-	/*
-	 * Wait before the first scan to allow the system to fully initialize.
-	 */
-	if (first_run) {
-		first_run = 0;
-		ssleep(SECS_FIRST_SCAN);
-	}
-
-	while (!kthread_should_stop()) {
-		signed long timeout = jiffies_scan_wait;
-
-		mutex_lock(&scan_mutex);
-		kmemleak_scan();
-		mutex_unlock(&scan_mutex);
-
-		/* wait before the next scan */
-		while (timeout && !kthread_should_stop())
-			timeout = schedule_timeout_interruptible(timeout);
-	}
-
-	pr_info("Automatic memory scanning thread ended\n");
+	mutex_lock(&scan_mutex);
+	kmemleak_scan();
+	mutex_unlock(&scan_mutex);
 
-	return 0;
+	queue_delayed_kthread_work(kmemleak_scan_worker, &kmemleak_scan_work,
+				   jiffies_scan_wait);
 }
 
 /*
  * Start the automatic memory scanning thread. This function must be called
  * with the scan_mutex held.
  */
-static void start_scan_thread(void)
+static void start_scan_thread(bool boot)
 {
-	if (scan_thread)
+	unsigned long timeout = 0;
+
+	if (kmemleak_scan_worker)
 		return;
-	scan_thread = kthread_run(kmemleak_scan_thread, NULL, "kmemleak");
-	if (IS_ERR(scan_thread)) {
-		pr_warning("Failed to create the scan thread\n");
-		scan_thread = NULL;
+
+	init_delayed_kthread_work(&kmemleak_scan_work, kmemleak_scan_func);
+	kmemleak_scan_worker = create_kthread_worker(0, "kmemleak");
+	if (IS_ERR(kmemleak_scan_worker)) {
+		pr_warn("Failed to create the memory scan worker\n");
+		kmemleak_scan_worker = NULL;
 	}
+	pr_info("Automatic memory scanning thread started\n");
+	set_user_nice(kmemleak_scan_worker->task, 10);
+
+	/*
+	 * Wait before the first scan to allow the system to fully initialize.
+	 */
+	if (boot)
+		timeout = msecs_to_jiffies(SECS_FIRST_SCAN * MSEC_PER_SEC);
+
+	queue_delayed_kthread_work(kmemleak_scan_worker, &kmemleak_scan_work,
+				   timeout);
 }
 
 /*
@@ -1526,10 +1521,14 @@ static void start_scan_thread(void)
  */
 static void stop_scan_thread(void)
 {
-	if (scan_thread) {
-		kthread_stop(scan_thread);
-		scan_thread = NULL;
-	}
+	if (!kmemleak_scan_worker)
+		return;
+
+	cancel_delayed_kthread_work_sync(&kmemleak_scan_work);
+	destroy_kthread_worker(kmemleak_scan_worker);
+	kmemleak_scan_worker = NULL;
+
+	pr_info("Automatic memory scanning thread ended\n");
 }
 
 /*
@@ -1726,7 +1725,7 @@ static ssize_t kmemleak_write(struct file *file, const char __user *user_buf,
 	else if (strncmp(buf, "stack=off", 9) == 0)
 		kmemleak_stack_scan = 0;
 	else if (strncmp(buf, "scan=on", 7) == 0)
-		start_scan_thread();
+		start_scan_thread(false);
 	else if (strncmp(buf, "scan=off", 8) == 0)
 		stop_scan_thread();
 	else if (strncmp(buf, "scan=", 5) == 0) {
@@ -1738,7 +1737,7 @@ static ssize_t kmemleak_write(struct file *file, const char __user *user_buf,
 		stop_scan_thread();
 		if (secs) {
 			jiffies_scan_wait = msecs_to_jiffies(secs * 1000);
-			start_scan_thread();
+			start_scan_thread(false);
 		}
 	} else if (strncmp(buf, "scan", 4) == 0)
 		kmemleak_scan();
@@ -1962,7 +1961,7 @@ static int __init kmemleak_late_init(void)
 	if (!dentry)
 		pr_warning("Failed to create the debugfs kmemleak file\n");
 	mutex_lock(&scan_mutex);
-	start_scan_thread();
+	start_scan_thread(true);
 	mutex_unlock(&scan_mutex);
 
 	pr_info("Kernel memory leak detector initialized\n");
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v4 17/22] ipmi: Convert kipmi kthread into kthread worker API
  2016-01-25 15:44 ` Petr Mladek
@ 2016-01-25 15:45   ` Petr Mladek
  -1 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-01-25 15:45 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek

Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing, and
awakening. In many cases it is unclear which state the kthread
is in, and sometimes the checks are done the wrong way.

The plan is to convert kthreads into the kthread_worker or workqueue
API. It allows splitting the functionality into separate operations
and gives the code a better structure. It also defines a clean state
where no locks are taken and no IRQs are blocked, and where the
kthread might sleep or even be safely migrated.

The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that the thread is
available when needed. It also allows better control, e.g. defining
a scheduling priority.

This patch converts kipmi kthread into the kthread worker API because
it modifies the scheduling priority. The change is quite straightforward.

First, we move the per-thread variable "busy_until" into the per-thread
structure struct smi_info. As a side effect, we can omit one parameter
in ipmi_thread_busy_wait(). On the other hand, the structure can no
longer be passed with the const qualifier.

The value of "busy_until" is initialized when the kthread is created.
Also the scheduling priority is set there. This helps to avoid an extra
init work.

One iteration of the kthread cycle is moved to a delayed work function.
The different delays between the cycles are solved the following way
(a condensed sketch follows the list):

  + immediate cycle (nope) is converted into goto within the same work

  + immediate cycle with a possible reschedule is converted into
    re-queuing with a zero delay

  + schedule_timeout() is converted into re-queuing with the given
    delay

  + interruptible sleep is converted into nothing; the work
    will get queued again from check_start_timer_thread().
    In other words, the external wake_up_process() will get
    replaced by queuing with a zero delay.
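
A condensed sketch of this mapping, reusing the names from the diff
below (smi_result, busy_wait, need_watch, worker, work), might look
like this:

    if (smi_result == SI_SM_CALL_WITHOUT_DELAY)
            goto next;                                      /* immediate cycle */
    else if (smi_result == SI_SM_CALL_WITH_DELAY && busy_wait)
            queue_delayed_kthread_work(worker, &work, 0);   /* reschedule point */
    else if (smi_result == SI_SM_IDLE && atomic_read(&need_watch))
            queue_delayed_kthread_work(worker, &work, 100); /* schedule_timeout() */
    else if (smi_result != SI_SM_IDLE)
            queue_delayed_kthread_work(worker, &work, 1);
    /* else: idle and not watched; wait until someone queues the work again */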

Probably the most tricky change is when the worker is being stopped.
We need to explicitly cancel the work to prevent it from re-queuing.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Corey Minyard <cminyard@mvista.com>
---
 drivers/char/ipmi/ipmi_si_intf.c | 121 ++++++++++++++++++++++-----------------
 1 file changed, 69 insertions(+), 52 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index 9fda22e3387e..84d4e8158e92 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -303,7 +303,9 @@ struct smi_info {
 	/* Counters and things for the proc filesystem. */
 	atomic_t stats[SI_NUM_STATS];
 
-	struct task_struct *thread;
+	struct kthread_worker *worker;
+	struct delayed_kthread_work work;
+	struct timespec64 busy_until;
 
 	struct list_head link;
 	union ipmi_smi_info_union addr_info;
@@ -428,8 +430,8 @@ static void start_new_msg(struct smi_info *smi_info, unsigned char *msg,
 {
 	smi_mod_timer(smi_info, jiffies + SI_TIMEOUT_JIFFIES);
 
-	if (smi_info->thread)
-		wake_up_process(smi_info->thread);
+	if (smi_info->worker)
+		mod_delayed_kthread_work(smi_info->worker, &smi_info->work, 0);
 
 	smi_info->handlers->start_transaction(smi_info->si_sm, msg, size);
 }
@@ -952,8 +954,9 @@ static void check_start_timer_thread(struct smi_info *smi_info)
 	if (smi_info->si_state == SI_NORMAL && smi_info->curr_msg == NULL) {
 		smi_mod_timer(smi_info, jiffies + SI_TIMEOUT_JIFFIES);
 
-		if (smi_info->thread)
-			wake_up_process(smi_info->thread);
+		if (smi_info->worker)
+			mod_delayed_kthread_work(smi_info->worker,
+						 &smi_info->work, 0);
 
 		start_next_msg(smi_info);
 		smi_event_handler(smi_info, 0);
@@ -1031,10 +1034,10 @@ static inline int ipmi_si_is_busy(struct timespec64 *ts)
 }
 
 static inline int ipmi_thread_busy_wait(enum si_sm_result smi_result,
-					const struct smi_info *smi_info,
-					struct timespec64 *busy_until)
+					struct smi_info *smi_info)
 {
 	unsigned int max_busy_us = 0;
+	struct timespec64 *busy_until = &smi_info->busy_until;
 
 	if (smi_info->intf_num < num_max_busy_us)
 		max_busy_us = kipmid_max_busy_us[smi_info->intf_num];
@@ -1065,53 +1068,49 @@ static inline int ipmi_thread_busy_wait(enum si_sm_result smi_result,
  * (if that is enabled).  See the paragraph on kimid_max_busy_us in
  * Documentation/IPMI.txt for details.
  */
-static int ipmi_thread(void *data)
+static void ipmi_kthread_worker_func(struct kthread_work *work)
 {
-	struct smi_info *smi_info = data;
+	struct smi_info *smi_info = container_of(work, struct smi_info,
+						 work.work);
 	unsigned long flags;
 	enum si_sm_result smi_result;
-	struct timespec64 busy_until;
+	int busy_wait;
 
-	ipmi_si_set_not_busy(&busy_until);
-	set_user_nice(current, MAX_NICE);
-	while (!kthread_should_stop()) {
-		int busy_wait;
+next:
+	spin_lock_irqsave(&(smi_info->si_lock), flags);
+	smi_result = smi_event_handler(smi_info, 0);
 
-		spin_lock_irqsave(&(smi_info->si_lock), flags);
-		smi_result = smi_event_handler(smi_info, 0);
+	/*
+	 * If the driver is doing something, there is a possible
+	 * race with the timer.  If the timer handler see idle,
+	 * and the thread here sees something else, the timer
+	 * handler won't restart the timer even though it is
+	 * required.  So start it here if necessary.
+	 */
+	if (smi_result != SI_SM_IDLE && !smi_info->timer_running)
+		smi_mod_timer(smi_info, jiffies + SI_TIMEOUT_JIFFIES);
 
-		/*
-		 * If the driver is doing something, there is a possible
-		 * race with the timer.  If the timer handler see idle,
-		 * and the thread here sees something else, the timer
-		 * handler won't restart the timer even though it is
-		 * required.  So start it here if necessary.
-		 */
-		if (smi_result != SI_SM_IDLE && !smi_info->timer_running)
-			smi_mod_timer(smi_info, jiffies + SI_TIMEOUT_JIFFIES);
-
-		spin_unlock_irqrestore(&(smi_info->si_lock), flags);
-		busy_wait = ipmi_thread_busy_wait(smi_result, smi_info,
-						  &busy_until);
-		if (smi_result == SI_SM_CALL_WITHOUT_DELAY)
-			; /* do nothing */
-		else if (smi_result == SI_SM_CALL_WITH_DELAY && busy_wait)
-			schedule();
-		else if (smi_result == SI_SM_IDLE) {
-			if (atomic_read(&smi_info->need_watch)) {
-				schedule_timeout_interruptible(100);
-			} else {
-				/* Wait to be woken up when we are needed. */
-				__set_current_state(TASK_INTERRUPTIBLE);
-				schedule();
-			}
-		} else
-			schedule_timeout_interruptible(1);
+	spin_unlock_irqrestore(&(smi_info->si_lock), flags);
+	busy_wait = ipmi_thread_busy_wait(smi_result, smi_info);
+
+	if (smi_result == SI_SM_CALL_WITHOUT_DELAY)
+		goto next;
+	if (smi_result == SI_SM_CALL_WITH_DELAY && busy_wait) {
+		queue_delayed_kthread_work(smi_info->worker,
+					   &smi_info->work, 0);
+	} else if (smi_result == SI_SM_IDLE) {
+		if (atomic_read(&smi_info->need_watch)) {
+			queue_delayed_kthread_work(smi_info->worker,
+						   &smi_info->work, 100);
+		} else {
+			/* Nope. Wait to be queued when we are needed. */
+		}
+	} else {
+		queue_delayed_kthread_work(smi_info->worker,
+					   &smi_info->work, 1);
 	}
-	return 0;
 }
 
-
 static void poll(void *send_info)
 {
 	struct smi_info *smi_info = send_info;
@@ -1252,17 +1251,30 @@ static int smi_start_processing(void       *send_info,
 		enable = 1;
 
 	if (enable) {
-		new_smi->thread = kthread_run(ipmi_thread, new_smi,
-					      "kipmi%d", new_smi->intf_num);
-		if (IS_ERR(new_smi->thread)) {
+		struct kthread_worker *worker;
+
+		worker = create_kthread_worker(0, "kipmi%d",
+					       new_smi->intf_num);
+
+		if (IS_ERR(worker)) {
 			dev_notice(new_smi->dev, "Could not start"
 				   " kernel thread due to error %ld, only using"
 				   " timers to drive the interface\n",
-				   PTR_ERR(new_smi->thread));
-			new_smi->thread = NULL;
+				   PTR_ERR(worker));
+			goto out;
 		}
+
+		ipmi_si_set_not_busy(&new_smi->busy_until);
+		set_user_nice(worker->task, MAX_NICE);
+
+		init_delayed_kthread_work(&new_smi->work,
+					  ipmi_kthread_worker_func);
+		queue_delayed_kthread_work(worker, &new_smi->work, 0);
+
+		new_smi->worker = worker;
 	}
 
+out:
 	return 0;
 }
 
@@ -3440,8 +3452,13 @@ static void check_for_broken_irqs(struct smi_info *smi_info)
 
 static inline void wait_for_timer_and_thread(struct smi_info *smi_info)
 {
-	if (smi_info->thread != NULL)
-		kthread_stop(smi_info->thread);
+	if (smi_info->worker != NULL) {
+		struct kthread_worker *worker = smi_info->worker;
+
+		smi_info->worker = NULL;
+		cancel_delayed_kthread_work_sync(&smi_info->work);
+		destroy_kthread_worker(worker);
+	}
 	if (smi_info->timer_running)
 		del_timer_sync(&smi_info->si_timer);
 }
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v4 18/22] IB/fmr_pool: Convert the cleanup thread into kthread worker API
  2016-01-25 15:44 ` Petr Mladek
@ 2016-01-25 15:45   ` Petr Mladek
  -1 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-01-25 15:45 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek,
	Doug Ledford, Sean Hefty, Hal Rosenstock, linux-rdma

Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing, and
awakening. In many cases it is unclear which state the kthread
is in, and sometimes the checks are done the wrong way.

The plan is to convert kthreads into the kthread_worker or workqueue
API. It allows splitting the functionality into separate operations
and gives the code a better structure. It also defines a clean state
where no locks are taken and no IRQs are blocked, and where the
kthread might sleep or even be safely migrated.

The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that the thread is
available when needed. It also allows better control, e.g. defining
a scheduling priority.

This patch converts the fmr_pool kthread into the kthread worker
API because I am not sure how busy the thread is. It may well be
that it does not need a dedicated kthread and that workqueues
would be perfectly fine. Anyway, the conversion between the kthread
worker API and workqueues is pretty trivial.

The patch moves one iteration from the kthread into the work function.
It preserves the check for a spurious queuing (wake up). Then it
processes one request. Finally, it re-queues itself if more requests
are pending.

Otherwise, wake_up_process() is replaced by queuing the work.
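
For illustration, the core of the conversion on the callers' side is
roughly the following before/after sketch; it reuses the pool->thread,
pool->worker and pool->work names from the diff below:

    /* before: kick the dedicated kthread */
    atomic_inc(&pool->req_ser);
    wake_up_process(pool->thread);

    /* after: create a worker once ... */
    pool->worker = create_kthread_worker(0, "ib_fmr(%s)", device->name);
    init_kthread_work(&pool->work, ib_fmr_cleanup_func);

    /* ... and queue the work instead of the wake-up */
    atomic_inc(&pool->req_ser);
    queue_kthread_work(pool->worker, &pool->work);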

Important: The change is only compile-tested. I did not find an easy
way to test it in real life.

Signed-off-by: Petr Mladek <pmladek@suse.com>
CC: Doug Ledford <dledford@redhat.com>
CC: Sean Hefty <sean.hefty@intel.com>
CC: Hal Rosenstock <hal.rosenstock@gmail.com>
CC: linux-rdma@vger.kernel.org
---
 drivers/infiniband/core/fmr_pool.c | 54 ++++++++++++++++++--------------------
 1 file changed, 25 insertions(+), 29 deletions(-)

diff --git a/drivers/infiniband/core/fmr_pool.c b/drivers/infiniband/core/fmr_pool.c
index 6ac3683c144b..8f0307b74fcd 100644
--- a/drivers/infiniband/core/fmr_pool.c
+++ b/drivers/infiniband/core/fmr_pool.c
@@ -96,7 +96,8 @@ struct ib_fmr_pool {
 						   void *              arg);
 	void                     *flush_arg;
 
-	struct task_struct       *thread;
+	struct kthread_worker	  *worker;
+	struct kthread_work	  work;
 
 	atomic_t                  req_ser;
 	atomic_t                  flush_ser;
@@ -174,29 +175,26 @@ static void ib_fmr_batch_release(struct ib_fmr_pool *pool)
 	spin_unlock_irq(&pool->pool_lock);
 }
 
-static int ib_fmr_cleanup_thread(void *pool_ptr)
+static void ib_fmr_cleanup_func(struct kthread_work *work)
 {
-	struct ib_fmr_pool *pool = pool_ptr;
+	struct ib_fmr_pool *pool = container_of(work, struct ib_fmr_pool, work);
 
-	do {
-		if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) < 0) {
-			ib_fmr_batch_release(pool);
-
-			atomic_inc(&pool->flush_ser);
-			wake_up_interruptible(&pool->force_wait);
+	/*
+	 * The same request might be queued twice when it appears and
+	 * by re-queuing from this work.
+	 */
+	if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) >= 0)
+		return;
 
-			if (pool->flush_function)
-				pool->flush_function(pool, pool->flush_arg);
-		}
+	ib_fmr_batch_release(pool);
+	atomic_inc(&pool->flush_ser);
+	wake_up_interruptible(&pool->force_wait);
 
-		set_current_state(TASK_INTERRUPTIBLE);
-		if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) >= 0 &&
-		    !kthread_should_stop())
-			schedule();
-		__set_current_state(TASK_RUNNING);
-	} while (!kthread_should_stop());
+	if (pool->flush_function)
+		pool->flush_function(pool, pool->flush_arg);
 
-	return 0;
+	if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) < 0)
+		queue_kthread_work(pool->worker, &pool->work);
 }
 
 /**
@@ -270,15 +268,13 @@ struct ib_fmr_pool *ib_create_fmr_pool(struct ib_pd             *pd,
 	atomic_set(&pool->flush_ser, 0);
 	init_waitqueue_head(&pool->force_wait);
 
-	pool->thread = kthread_run(ib_fmr_cleanup_thread,
-				   pool,
-				   "ib_fmr(%s)",
-				   device->name);
-	if (IS_ERR(pool->thread)) {
-		printk(KERN_WARNING PFX "couldn't start cleanup thread\n");
-		ret = PTR_ERR(pool->thread);
+	pool->worker = create_kthread_worker(0, "ib_fmr(%s)", device->name);
+	if (IS_ERR(pool->worker)) {
+		pr_warn(PFX "couldn't start cleanup kthread worker\n");
+		ret = PTR_ERR(pool->worker);
 		goto out_free_pool;
 	}
+	init_kthread_work(&pool->work, ib_fmr_cleanup_func);
 
 	{
 		struct ib_pool_fmr *fmr;
@@ -346,7 +342,7 @@ void ib_destroy_fmr_pool(struct ib_fmr_pool *pool)
 	LIST_HEAD(fmr_list);
 	int                 i;
 
-	kthread_stop(pool->thread);
+	destroy_kthread_worker(pool->worker);
 	ib_fmr_batch_release(pool);
 
 	i = 0;
@@ -396,7 +392,7 @@ int ib_flush_fmr_pool(struct ib_fmr_pool *pool)
 	spin_unlock_irq(&pool->pool_lock);
 
 	serial = atomic_inc_return(&pool->req_ser);
-	wake_up_process(pool->thread);
+	queue_kthread_work(pool->worker, &pool->work);
 
 	if (wait_event_interruptible(pool->force_wait,
 				     atomic_read(&pool->flush_ser) - serial >= 0))
@@ -510,7 +506,7 @@ int ib_fmr_pool_unmap(struct ib_pool_fmr *fmr)
 			list_add_tail(&fmr->list, &pool->dirty_list);
 			if (++pool->dirty_len >= pool->dirty_watermark) {
 				atomic_inc(&pool->req_ser);
-				wake_up_process(pool->thread);
+				queue_kthread_work(pool->worker, &pool->work);
 			}
 		}
 	}
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v4 19/22] memstick/r592: Better synchronize debug messages in r592_io kthread
@ 2016-01-25 15:45   ` Petr Mladek
  0 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-01-25 15:45 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek,
	Maxim Levitsky

The code tries to print debug messages when the kthread is woken
and when it goes to sleep. It does not work well because the spin
lock does not guard all manipulations of the thread state.

I did not find a way to print a message when the kthread really
goes to sleep. Instead, I added a state variable that clearly marks
when a series of IO requests is started and finished. It makes sure
that we always get a matching pair of started/done messages.
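
Schematically, the main loop then does something like the following;
the handling of unknown errors is trimmed here, see the diff below
for the real change.

	if (!dev->io_started) {
		dbg_verbose("IO: started");
		dev->io_started = true;
	}

	error = memstick_next_req(dev->host, &dev->req);
	if (error) {
		if (error == -ENXIO || error == -EAGAIN)
			dbg_verbose("IO: done");
		/* the series of requests is finished either way */
		dev->io_started = false;
	}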

The only problem is that it will also print these messages when
the kthread is created and there is no real work yet. We might want
to use kthread_create() instead of kthread_run(). Then the kthread
would stay stopped until the first request.

Important: This change is only compile tested. I did not find an easy
way to test it. This is why I was conservative and did not modify
the kthread creation.

Signed-off-by: Petr Mladek <pmladek@suse.com>
CC: Maxim Levitsky <maximlevitsky@gmail.com>
---
 drivers/memstick/host/r592.c | 19 +++++++++----------
 drivers/memstick/host/r592.h |  2 +-
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/memstick/host/r592.c b/drivers/memstick/host/r592.c
index ef09ba0289d7..dd73c5506fdf 100644
--- a/drivers/memstick/host/r592.c
+++ b/drivers/memstick/host/r592.c
@@ -568,21 +568,24 @@ static int r592_process_thread(void *data)
 {
 	int error;
 	struct r592_device *dev = (struct r592_device *)data;
-	unsigned long flags;
 
 	while (!kthread_should_stop()) {
-		spin_lock_irqsave(&dev->io_thread_lock, flags);
+		if (!dev->io_started) {
+			dbg_verbose("IO: started");
+			dev->io_started = true;
+		}
+
 		set_current_state(TASK_INTERRUPTIBLE);
 		error = memstick_next_req(dev->host, &dev->req);
-		spin_unlock_irqrestore(&dev->io_thread_lock, flags);
 
 		if (error) {
 			if (error == -ENXIO || error == -EAGAIN) {
-				dbg_verbose("IO: done IO, sleeping");
+				dbg_verbose("IO: done");
 			} else {
 				dbg("IO: unknown error from "
 					"memstick_next_req %d", error);
 			}
+			dev->io_started = false;
 
 			if (kthread_should_stop())
 				set_current_state(TASK_RUNNING);
@@ -714,15 +717,11 @@ static int r592_set_param(struct memstick_host *host,
 static void r592_submit_req(struct memstick_host *host)
 {
 	struct r592_device *dev = memstick_priv(host);
-	unsigned long flags;
 
 	if (dev->req)
 		return;
 
-	spin_lock_irqsave(&dev->io_thread_lock, flags);
-	if (wake_up_process(dev->io_thread))
-		dbg_verbose("IO thread woken to process requests");
-	spin_unlock_irqrestore(&dev->io_thread_lock, flags);
+	wake_up_process(dev->io_thread);
 }
 
 static const struct pci_device_id r592_pci_id_tbl[] = {
@@ -768,7 +767,6 @@ static int r592_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	dev->irq = pdev->irq;
 	spin_lock_init(&dev->irq_lock);
-	spin_lock_init(&dev->io_thread_lock);
 	init_completion(&dev->dma_done);
 	INIT_KFIFO(dev->pio_fifo);
 	setup_timer(&dev->detect_timer,
@@ -780,6 +778,7 @@ static int r592_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	host->set_param = r592_set_param;
 	r592_check_dma(dev);
 
+	dev->io_started = false;
 	dev->io_thread = kthread_run(r592_process_thread, dev, "r592_io");
 	if (IS_ERR(dev->io_thread)) {
 		error = PTR_ERR(dev->io_thread);
diff --git a/drivers/memstick/host/r592.h b/drivers/memstick/host/r592.h
index c5726c1e8832..aa8f0f22f4ce 100644
--- a/drivers/memstick/host/r592.h
+++ b/drivers/memstick/host/r592.h
@@ -137,10 +137,10 @@ struct r592_device {
 	void __iomem *mmio;
 	int irq;
 	spinlock_t irq_lock;
-	spinlock_t io_thread_lock;
 	struct timer_list detect_timer;
 
 	struct task_struct *io_thread;
+	bool io_started;
 	bool parallel_mode;
 
 	DECLARE_KFIFO(pio_fifo, u8, sizeof(u32));
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v4 20/22] memstick/r592: convert r592_io kthread into kthread worker API
  2016-01-25 15:44 ` Petr Mladek
@ 2016-01-25 15:45   ` Petr Mladek
  -1 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-01-25 15:45 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek,
	Maxim Levitsky

Kthreads are currently implemented as an infinite loop. Each one
has its own variant of the checks for terminating, freezing, and
awakening. In many cases it is unclear which state the kthread is
in, and sometimes these checks are done the wrong way.

The plan is to convert kthreads to the kthread worker or workqueue
API. It allows the functionality to be split into separate operations
and gives the code a better structure. It also defines a clean state
where no locks are taken and no IRQs are blocked, so the kthread
might sleep or even be safely migrated.

The kthread worker API is useful when we want a dedicated single
thread for the work. It helps to make sure that the thread is
available when needed. It also allows better control, e.g. defining
a scheduling priority.

This patch converts the r592_io kthread to the kthread worker
API. I am not sure how busy the kthread is and whether anyone
would like to control its resources. It may well be that a
workqueue would be perfectly fine. In any case, the conversion
between the kthread worker API and workqueues is pretty trivial.

The patch moves one iteration from the kthread into the work
function. This removes all the hackery with the process state
and kthread_should_stop().

The work is queued instead of waking the thread.

The work is explicitly canceled before the worker is destroyed.
It is self-queuing, so draining the queue might otherwise take
a long time.
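
The setup and teardown then boil down to the sketch below; error
handling and the other probe/remove steps are omitted, the complete
code is in the diff.

	/* probe: create the worker instead of the kthread */
	init_kthread_work(&dev->io_work, r592_process_func);
	dev->io_worker = create_kthread_worker(0, "r592_io");
	if (IS_ERR(dev->io_worker))
		error = PTR_ERR(dev->io_worker);	/* and bail out */

	/* request submission: queue the work instead of wake_up_process() */
	queue_kthread_work(dev->io_worker, &dev->io_work);

	/* remove: stop the self-queuing work first, then destroy the worker */
	cancel_kthread_work_sync(&dev->io_work);
	destroy_kthread_worker(dev->io_worker);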

Important: The change is only compile tested. I did not find an easy
way to check it in use.

Signed-off-by: Petr Mladek <pmladek@suse.com>
CC: Maxim Levitsky <maximlevitsky@gmail.com>
---
 drivers/memstick/host/r592.c | 58 ++++++++++++++++++++------------------------
 drivers/memstick/host/r592.h |  3 ++-
 2 files changed, 28 insertions(+), 33 deletions(-)

diff --git a/drivers/memstick/host/r592.c b/drivers/memstick/host/r592.c
index dd73c5506fdf..9d7cc27d8edc 100644
--- a/drivers/memstick/host/r592.c
+++ b/drivers/memstick/host/r592.c
@@ -563,40 +563,32 @@ out:
 	return;
 }
 
-/* Main request processing thread */
-static int r592_process_thread(void *data)
+/* Main request processing work */
+static void r592_process_func(struct kthread_work *work)
 {
 	int error;
-	struct r592_device *dev = (struct r592_device *)data;
-
-	while (!kthread_should_stop()) {
-		if (!dev->io_started) {
-			dbg_verbose("IO: started");
-			dev->io_started = true;
-		}
-
-		set_current_state(TASK_INTERRUPTIBLE);
-		error = memstick_next_req(dev->host, &dev->req);
+	struct r592_device *dev =
+		container_of(work, struct r592_device, io_work);
 
-		if (error) {
-			if (error == -ENXIO || error == -EAGAIN) {
-				dbg_verbose("IO: done");
-			} else {
-				dbg("IO: unknown error from "
-					"memstick_next_req %d", error);
-			}
-			dev->io_started = false;
+	if (!dev->io_started) {
+		dbg_verbose("IO: started");
+		dev->io_started = true;
+	}
 
-			if (kthread_should_stop())
-				set_current_state(TASK_RUNNING);
+	error = memstick_next_req(dev->host, &dev->req);
 
-			schedule();
+	if (error) {
+		if (error == -ENXIO || error == -EAGAIN) {
+			dbg_verbose("IO: done");
 		} else {
-			set_current_state(TASK_RUNNING);
-			r592_execute_tpc(dev);
+			dbg("IO: unknown error from memstick_next_req %d",
+			    error);
 		}
+		dev->io_started = false;
+	} else {
+		r592_execute_tpc(dev);
+		queue_kthread_work(dev->io_worker, &dev->io_work);
 	}
-	return 0;
 }
 
 /* Reprogram chip to detect change in card state */
@@ -721,7 +713,7 @@ static void r592_submit_req(struct memstick_host *host)
 	if (dev->req)
 		return;
 
-	wake_up_process(dev->io_thread);
+	queue_kthread_work(dev->io_worker, &dev->io_work);
 }
 
 static const struct pci_device_id r592_pci_id_tbl[] = {
@@ -779,9 +771,10 @@ static int r592_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	r592_check_dma(dev);
 
 	dev->io_started = false;
-	dev->io_thread = kthread_run(r592_process_thread, dev, "r592_io");
-	if (IS_ERR(dev->io_thread)) {
-		error = PTR_ERR(dev->io_thread);
+	init_kthread_work(&dev->io_work, r592_process_func);
+	dev->io_worker = create_kthread_worker(0, "r592_io");
+	if (IS_ERR(dev->io_worker)) {
+		error = PTR_ERR(dev->io_worker);
 		goto error5;
 	}
 
@@ -807,7 +800,7 @@ error6:
 		dma_free_coherent(&pdev->dev, PAGE_SIZE, dev->dummy_dma_page,
 			dev->dummy_dma_page_physical_address);
 
-	kthread_stop(dev->io_thread);
+	destroy_kthread_worker(dev->io_worker);
 error5:
 	iounmap(dev->mmio);
 error4:
@@ -827,7 +820,8 @@ static void r592_remove(struct pci_dev *pdev)
 
 	/* Stop the processing thread.
 	That ensures that we won't take any more requests */
-	kthread_stop(dev->io_thread);
+	cancel_kthread_work_sync(&dev->io_work);
+	destroy_kthread_worker(dev->io_worker);
 
 	r592_enable_device(dev, false);
 
diff --git a/drivers/memstick/host/r592.h b/drivers/memstick/host/r592.h
index aa8f0f22f4ce..1ac71380ac04 100644
--- a/drivers/memstick/host/r592.h
+++ b/drivers/memstick/host/r592.h
@@ -139,7 +139,8 @@ struct r592_device {
 	spinlock_t irq_lock;
 	struct timer_list detect_timer;
 
-	struct task_struct *io_thread;
+	struct kthread_worker *io_worker;
+	struct kthread_work io_work;
 	bool io_started;
 	bool parallel_mode;
 
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v4 21/22] thermal/intel_powerclamp: Remove duplicated code that starts the kthread
  2016-01-25 15:44 ` Petr Mladek
@ 2016-01-25 15:45   ` Petr Mladek
  -1 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-01-25 15:45 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek,
	Zhang Rui, Eduardo Valentin, Jacob Pan, linux-pm

This patch removes some code duplication. It does not modify
the functionality.

Signed-off-by: Petr Mladek <pmladek@suse.com>
CC: Zhang Rui <rui.zhang@intel.com>
CC: Eduardo Valentin <edubezval@gmail.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
CC: linux-pm@vger.kernel.org
---
 drivers/thermal/intel_powerclamp.c | 45 +++++++++++++++++---------------------
 1 file changed, 20 insertions(+), 25 deletions(-)

diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index 6c79588251d5..cb32c38f9828 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -505,10 +505,27 @@ static void poll_pkg_cstate(struct work_struct *dummy)
 		schedule_delayed_work(&poll_pkg_cstate_work, HZ);
 }
 
+static void start_power_clamp_thread(unsigned long cpu)
+{
+	struct task_struct **p = per_cpu_ptr(powerclamp_thread, cpu);
+	struct task_struct *thread;
+
+	thread = kthread_create_on_node(clamp_thread,
+					(void *) cpu,
+					cpu_to_node(cpu),
+					"kidle_inject/%ld", cpu);
+	if (IS_ERR(thread))
+		return;
+
+	/* bind to cpu here */
+	kthread_bind(thread, cpu);
+	wake_up_process(thread);
+	*p = thread;
+}
+
 static int start_power_clamp(void)
 {
 	unsigned long cpu;
-	struct task_struct *thread;
 
 	/* check if pkg cstate counter is completely 0, abort in this case */
 	if (!has_pkg_state_counter()) {
@@ -530,20 +547,7 @@ static int start_power_clamp(void)
 
 	/* start one thread per online cpu */
 	for_each_online_cpu(cpu) {
-		struct task_struct **p =
-			per_cpu_ptr(powerclamp_thread, cpu);
-
-		thread = kthread_create_on_node(clamp_thread,
-						(void *) cpu,
-						cpu_to_node(cpu),
-						"kidle_inject/%ld", cpu);
-		/* bind to cpu here */
-		if (likely(!IS_ERR(thread))) {
-			kthread_bind(thread, cpu);
-			wake_up_process(thread);
-			*p = thread;
-		}
-
+		start_power_clamp_thread(cpu);
 	}
 	put_online_cpus();
 
@@ -575,7 +579,6 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
 				unsigned long action, void *hcpu)
 {
 	unsigned long cpu = (unsigned long)hcpu;
-	struct task_struct *thread;
 	struct task_struct **percpu_thread =
 		per_cpu_ptr(powerclamp_thread, cpu);
 
@@ -584,15 +587,7 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
 
 	switch (action) {
 	case CPU_ONLINE:
-		thread = kthread_create_on_node(clamp_thread,
-						(void *) cpu,
-						cpu_to_node(cpu),
-						"kidle_inject/%lu", cpu);
-		if (likely(!IS_ERR(thread))) {
-			kthread_bind(thread, cpu);
-			wake_up_process(thread);
-			*percpu_thread = thread;
-		}
+		start_power_clamp_thread(cpu);
 		/* prefer BSP as controlling CPU */
 		if (cpu == 0) {
 			control_cpu = 0;
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v4 22/22] thermal/intel_powerclamp: Convert the kthread to kthread worker API
  2016-01-25 15:44 ` Petr Mladek
@ 2016-01-25 15:45   ` Petr Mladek
  -1 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-01-25 15:45 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek,
	Zhang Rui, Eduardo Valentin, Jacob Pan, linux-pm

Kthreads are currently implemented as an infinite loop. Each one
has its own variant of the checks for terminating, freezing, and
awakening. In many cases it is unclear which state the kthread is
in, and sometimes these checks are done the wrong way.

The plan is to convert kthreads to the kthread worker or workqueue
API. It allows the functionality to be split into separate operations
and gives the code a better structure. It also defines a clean state
where no locks are taken and no IRQs are blocked, so the kthread
might sleep or even be safely migrated.

The kthread worker API is useful when we want a dedicated single
thread for the work. It helps to make sure that the thread is
available when needed. It also allows better control, e.g. defining
a scheduling priority.

This patch converts the Intel powerclamp kthreads to the kthread
worker API because they need good control over the assigned CPUs.

IMHO, the most natural way is to split one cycle into two works.
The first one does some balancing and lets the CPU work normally
for some time. The second work checks what the CPU has done in the
meantime and puts it into a C-state to reach the required idle
time ratio. The delay between the two works is achieved by the
delayed kthread work.
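
In outline, the two works queue each other through the per-CPU worker
as sketched below. compute_sleep_time() stands in for the ratio/guard/
window computation, the mwait loop is reduced to a comment, and the
clamping checks are left out; see the diff for the real logic.

static void clamp_balancing_func(struct kthread_work *work)
{
	struct powerclamp_worker_data *w_data =
		container_of(work, struct powerclamp_worker_data,
			     balancing_work);
	int sleeptime;

	/* recompute target ratio, guard, window size, duration, ... */
	sleeptime = compute_sleep_time(w_data);

	queue_delayed_kthread_work(w_data->worker,
				   &w_data->idle_injection_work,
				   sleeptime);
}

static void clamp_idle_injection_func(struct kthread_work *work)
{
	struct powerclamp_worker_data *w_data =
		container_of(work, struct powerclamp_worker_data,
			     idle_injection_work.work);

	/* inject idle time with mwait until the target jiffies */

	queue_kthread_work(w_data->worker, &w_data->balancing_work);
}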

The two works have to share some data that used to be local
variables of the single kthread function. This is achieved by the
new per-CPU struct powerclamp_worker_data. It might look like a
complication. On the other hand, the long original kthread function
was not nice either.

The patch tries to avoid extra init and cleanup works. All these
actions can be done outside the thread, so they are moved to the
functions that create or destroy the worker. In particular, I have
checked that the timers are assigned to the right CPU.

The two works queue each other. It makes it a bit tricky to break
the cycle when we want to stop the worker. We use the global and
per-worker "clamping" variables to make sure that the re-queuing
eventually stops. We also cancel the works to make the stop faster.
Note that the canceling is not reliable because the handling of the
two variables and the queuing are not synchronized via a lock. But
it is not a big deal; it is just an optimization. The job is stopped
faster than before in most cases.
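
The per-CPU stop path from the diff below then boils down to the
following sequence (comments shortened):

	w_data->clamping = false;
	/* make sure newly queued works see the clamping disabled */
	smp_wmb();

	cancel_kthread_work_sync(&w_data->balancing_work);
	cancel_delayed_kthread_work_sync(&w_data->idle_injection_work);

	del_timer_sync(&w_data->wakeup_timer);
	clear_bit(w_data->cpu, cpu_clamping_mask);
	destroy_kthread_worker(w_data->worker);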

Signed-off-by: Petr Mladek <pmladek@suse.com>
CC: Zhang Rui <rui.zhang@intel.com>
CC: Eduardo Valentin <edubezval@gmail.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
CC: linux-pm@vger.kernel.org
---
 drivers/thermal/intel_powerclamp.c | 287 ++++++++++++++++++++++---------------
 1 file changed, 168 insertions(+), 119 deletions(-)

diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index cb32c38f9828..58ea1862d412 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -86,11 +86,27 @@ static unsigned int control_cpu; /* The cpu assigned to collect stat and update
 				  */
 static bool clamping;
 
+static const struct sched_param sparam = {
+	.sched_priority = MAX_USER_RT_PRIO / 2,
+};
+struct powerclamp_worker_data {
+	struct kthread_worker *worker;
+	struct kthread_work balancing_work;
+	struct delayed_kthread_work idle_injection_work;
+	struct timer_list wakeup_timer;
+	unsigned int cpu;
+	unsigned int count;
+	unsigned int guard;
+	unsigned int window_size_now;
+	unsigned int target_ratio;
+	unsigned int duration_jiffies;
+	bool clamping;
+};
 
-static struct task_struct * __percpu *powerclamp_thread;
+static struct powerclamp_worker_data * __percpu worker_data;
 static struct thermal_cooling_device *cooling_dev;
 static unsigned long *cpu_clamping_mask;  /* bit map for tracking per cpu
-					   * clamping thread
+					   * clamping kthread worker
 					   */
 
 static unsigned int duration;
@@ -368,100 +384,102 @@ static bool powerclamp_adjust_controls(unsigned int target_ratio,
 	return set_target_ratio + guard <= current_ratio;
 }
 
-static int clamp_thread(void *arg)
+static void clamp_balancing_func(struct kthread_work *work)
 {
-	int cpunr = (unsigned long)arg;
-	DEFINE_TIMER(wakeup_timer, noop_timer, 0, 0);
-	static const struct sched_param param = {
-		.sched_priority = MAX_USER_RT_PRIO/2,
-	};
-	unsigned int count = 0;
-	unsigned int target_ratio;
+	struct powerclamp_worker_data *w_data;
+	int sleeptime;
+	unsigned long target_jiffies;
+	unsigned int compensation;
+	int interval; /* jiffies to sleep for each attempt */
 
-	set_bit(cpunr, cpu_clamping_mask);
-	set_freezable();
-	init_timer_on_stack(&wakeup_timer);
-	sched_setscheduler(current, SCHED_FIFO, &param);
-
-	while (true == clamping && !kthread_should_stop() &&
-		cpu_online(cpunr)) {
-		int sleeptime;
-		unsigned long target_jiffies;
-		unsigned int guard;
-		unsigned int compensation = 0;
-		int interval; /* jiffies to sleep for each attempt */
-		unsigned int duration_jiffies = msecs_to_jiffies(duration);
-		unsigned int window_size_now;
-
-		try_to_freeze();
-		/*
-		 * make sure user selected ratio does not take effect until
-		 * the next round. adjust target_ratio if user has changed
-		 * target such that we can converge quickly.
-		 */
-		target_ratio = set_target_ratio;
-		guard = 1 + target_ratio/20;
-		window_size_now = window_size;
-		count++;
+	w_data = container_of(work, struct powerclamp_worker_data,
+			      balancing_work);
 
-		/*
-		 * systems may have different ability to enter package level
-		 * c-states, thus we need to compensate the injected idle ratio
-		 * to achieve the actual target reported by the HW.
-		 */
-		compensation = get_compensation(target_ratio);
-		interval = duration_jiffies*100/(target_ratio+compensation);
-
-		/* align idle time */
-		target_jiffies = roundup(jiffies, interval);
-		sleeptime = target_jiffies - jiffies;
-		if (sleeptime <= 0)
-			sleeptime = 1;
-		schedule_timeout_interruptible(sleeptime);
-		/*
-		 * only elected controlling cpu can collect stats and update
-		 * control parameters.
-		 */
-		if (cpunr == control_cpu && !(count%window_size_now)) {
-			should_skip =
-				powerclamp_adjust_controls(target_ratio,
-							guard, window_size_now);
-			smp_mb();
-		}
+	/*
+	 * make sure user selected ratio does not take effect until
+	 * the next round. adjust target_ratio if user has changed
+	 * target such that we can converge quickly.
+	 */
+	w_data->target_ratio = READ_ONCE(set_target_ratio);
+	w_data->guard = 1 + w_data->target_ratio / 20;
+	w_data->window_size_now = window_size;
+	w_data->duration_jiffies = msecs_to_jiffies(duration);
+	w_data->count++;
+
+	/*
+	 * systems may have different ability to enter package level
+	 * c-states, thus we need to compensate the injected idle ratio
+	 * to achieve the actual target reported by the HW.
+	 */
+	compensation = get_compensation(w_data->target_ratio);
+	interval = w_data->duration_jiffies * 100 /
+		(w_data->target_ratio + compensation);
+
+	/* align idle time */
+	target_jiffies = roundup(jiffies, interval);
+	sleeptime = target_jiffies - jiffies;
+	if (sleeptime <= 0)
+		sleeptime = 1;
+
+	if (clamping && w_data->clamping && cpu_online(w_data->cpu))
+		queue_delayed_kthread_work(w_data->worker,
+					   &w_data->idle_injection_work,
+					   sleeptime);
+}
+
+static void clamp_idle_injection_func(struct kthread_work *work)
+{
+	struct powerclamp_worker_data *w_data;
+	unsigned long target_jiffies;
+
+	w_data = container_of(work, struct powerclamp_worker_data,
+			      idle_injection_work.work);
+
+	/*
+	 * only elected controlling cpu can collect stats and update
+	 * control parameters.
+	 */
+	if (w_data->cpu == control_cpu &&
+	    !(w_data->count % w_data->window_size_now)) {
+		should_skip =
+			powerclamp_adjust_controls(w_data->target_ratio,
+						   w_data->guard,
+						   w_data->window_size_now);
+		smp_mb();
+	}
 
-		if (should_skip)
-			continue;
+	if (should_skip)
+		goto balance;
+
+	target_jiffies = jiffies + w_data->duration_jiffies;
+	mod_timer(&w_data->wakeup_timer, target_jiffies);
+	if (unlikely(local_softirq_pending()))
+		goto balance;
+	/*
+	 * stop tick sched during idle time, interrupts are still
+	 * allowed. thus jiffies are updated properly.
+	 */
+	preempt_disable();
+	/* mwait until target jiffies is reached */
+	while (time_before(jiffies, target_jiffies)) {
+		unsigned long ecx = 1;
+		unsigned long eax = target_mwait;
 
-		target_jiffies = jiffies + duration_jiffies;
-		mod_timer(&wakeup_timer, target_jiffies);
-		if (unlikely(local_softirq_pending()))
-			continue;
 		/*
-		 * stop tick sched during idle time, interrupts are still
-		 * allowed. thus jiffies are updated properly.
+		 * REVISIT: may call enter_idle() to notify drivers who
+		 * can save power during cpu idle. same for exit_idle()
 		 */
-		preempt_disable();
-		/* mwait until target jiffies is reached */
-		while (time_before(jiffies, target_jiffies)) {
-			unsigned long ecx = 1;
-			unsigned long eax = target_mwait;
-
-			/*
-			 * REVISIT: may call enter_idle() to notify drivers who
-			 * can save power during cpu idle. same for exit_idle()
-			 */
-			local_touch_nmi();
-			stop_critical_timings();
-			mwait_idle_with_hints(eax, ecx);
-			start_critical_timings();
-			atomic_inc(&idle_wakeup_counter);
-		}
-		preempt_enable();
+		local_touch_nmi();
+		stop_critical_timings();
+		mwait_idle_with_hints(eax, ecx);
+		start_critical_timings();
+		atomic_inc(&idle_wakeup_counter);
 	}
-	del_timer_sync(&wakeup_timer);
-	clear_bit(cpunr, cpu_clamping_mask);
+	preempt_enable();
 
-	return 0;
+balance:
+	if (clamping && w_data->clamping && cpu_online(w_data->cpu))
+		queue_kthread_work(w_data->worker, &w_data->balancing_work);
 }
 
 /*
@@ -505,22 +523,58 @@ static void poll_pkg_cstate(struct work_struct *dummy)
 		schedule_delayed_work(&poll_pkg_cstate_work, HZ);
 }
 
-static void start_power_clamp_thread(unsigned long cpu)
+static void start_power_clamp_worker(unsigned long cpu)
 {
-	struct task_struct **p = per_cpu_ptr(powerclamp_thread, cpu);
-	struct task_struct *thread;
-
-	thread = kthread_create_on_node(clamp_thread,
-					(void *) cpu,
-					cpu_to_node(cpu),
-					"kidle_inject/%ld", cpu);
-	if (IS_ERR(thread))
+	struct powerclamp_worker_data *w_data = per_cpu_ptr(worker_data, cpu);
+	struct kthread_worker *worker;
+
+	worker = create_kthread_worker_on_cpu(KTW_FREEZABLE, cpu,
+					      "kidle_inject/%ld");
+	if (IS_ERR(worker))
 		return;
 
-	/* bind to cpu here */
-	kthread_bind(thread, cpu);
-	wake_up_process(thread);
-	*p = thread;
+	w_data->worker = worker;
+	w_data->count = 0;
+	w_data->cpu = cpu;
+	w_data->clamping = true;
+	set_bit(cpu, cpu_clamping_mask);
+	setup_timer(&w_data->wakeup_timer, noop_timer, 0);
+	sched_setscheduler(worker->task, SCHED_FIFO, &sparam);
+	init_kthread_work(&w_data->balancing_work, clamp_balancing_func);
+	init_delayed_kthread_work(&w_data->idle_injection_work,
+				  clamp_idle_injection_func);
+	queue_kthread_work(w_data->worker, &w_data->balancing_work);
+}
+
+static void stop_power_clamp_worker(unsigned long cpu)
+{
+	struct powerclamp_worker_data *w_data = per_cpu_ptr(worker_data, cpu);
+
+	if (!w_data->worker)
+		return;
+
+	w_data->clamping = false;
+	/*
+	 * Make sure that all works queued after this point see the
+	 * clamping disabled. The counterpart is not needed because
+	 * there is an implicit memory barrier when the queued work
+	 * is processed.
+	 */
+	smp_wmb();
+	cancel_kthread_work_sync(&w_data->balancing_work);
+	cancel_delayed_kthread_work_sync(&w_data->idle_injection_work);
+	/*
+	 * The balancing work might still be queued here because the
+	 * handling of the "clamping" variable, cancel, and queue
+	 * operations are not synchronized via a lock. But it is not
+	 * a big deal. The balancing work is fast and destroying the
+	 * kthread worker will wait for it.
+	 */
+	del_timer_sync(&w_data->wakeup_timer);
+	clear_bit(w_data->cpu, cpu_clamping_mask);
+	destroy_kthread_worker(w_data->worker);
+
+	w_data->worker = NULL;
 }
 
 static int start_power_clamp(void)
@@ -545,9 +599,9 @@ static int start_power_clamp(void)
 	clamping = true;
 	schedule_delayed_work(&poll_pkg_cstate_work, 0);
 
-	/* start one thread per online cpu */
+	/* start one kthread worker per online cpu */
 	for_each_online_cpu(cpu) {
-		start_power_clamp_thread(cpu);
+		start_power_clamp_worker(cpu);
 	}
 	put_online_cpus();
 
@@ -557,20 +611,17 @@ static int start_power_clamp(void)
 static void end_power_clamp(void)
 {
 	int i;
-	struct task_struct *thread;
 
-	clamping = false;
 	/*
-	 * make clamping visible to other cpus and give per cpu clamping threads
-	 * sometime to exit, or gets killed later.
+	 * Block requeuing in all the kthread workers. They will drain and
+	 * stop faster.
 	 */
-	smp_mb();
-	msleep(20);
+	clamping = false;
 	if (bitmap_weight(cpu_clamping_mask, num_possible_cpus())) {
 		for_each_set_bit(i, cpu_clamping_mask, num_possible_cpus()) {
-			pr_debug("clamping thread for cpu %d alive, kill\n", i);
-			thread = *per_cpu_ptr(powerclamp_thread, i);
-			kthread_stop(thread);
+			pr_debug("clamping worker for cpu %d alive, destroy\n",
+				 i);
+			stop_power_clamp_worker(i);
 		}
 	}
 }
@@ -579,15 +630,13 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
 				unsigned long action, void *hcpu)
 {
 	unsigned long cpu = (unsigned long)hcpu;
-	struct task_struct **percpu_thread =
-		per_cpu_ptr(powerclamp_thread, cpu);
 
 	if (false == clamping)
 		goto exit_ok;
 
 	switch (action) {
 	case CPU_ONLINE:
-		start_power_clamp_thread(cpu);
+		start_power_clamp_worker(cpu);
 		/* prefer BSP as controlling CPU */
 		if (cpu == 0) {
 			control_cpu = 0;
@@ -598,7 +647,7 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
 		if (test_bit(cpu, cpu_clamping_mask)) {
 			pr_err("cpu %lu dead but powerclamping thread is not\n",
 				cpu);
-			kthread_stop(*percpu_thread);
+			stop_power_clamp_worker(cpu);
 		}
 		if (cpu == control_cpu) {
 			control_cpu = smp_processor_id();
@@ -785,8 +834,8 @@ static int __init powerclamp_init(void)
 	window_size = 2;
 	register_hotcpu_notifier(&powerclamp_cpu_notifier);
 
-	powerclamp_thread = alloc_percpu(struct task_struct *);
-	if (!powerclamp_thread) {
+	worker_data = alloc_percpu(struct powerclamp_worker_data);
+	if (!worker_data) {
 		retval = -ENOMEM;
 		goto exit_unregister;
 	}
@@ -806,7 +855,7 @@ static int __init powerclamp_init(void)
 	return 0;
 
 exit_free_thread:
-	free_percpu(powerclamp_thread);
+	free_percpu(worker_data);
 exit_unregister:
 	unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
 exit_free:
@@ -819,7 +868,7 @@ static void __exit powerclamp_exit(void)
 {
 	unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
 	end_power_clamp();
-	free_percpu(powerclamp_thread);
+	free_percpu(worker_data);
 	thermal_cooling_device_unregister(cooling_dev);
 	kfree(cpu_clamping_mask);
 
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v4 22/22] thermal/intel_powerclamp: Convert the kthread to kthread worker API
@ 2016-01-25 15:45   ` Petr Mladek
  0 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-01-25 15:45 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar, Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek,
	Zhang Rui, Eduardo Valentin, Jacob Pan, linux-pm

Kthreads are currently implemented as an infinite loop. Each one
has its own variant of the checks for terminating, freezing, and
awakening. In many cases it is unclear which state the kthread is
in, and sometimes these checks are done the wrong way.

The plan is to convert kthreads to the kthread worker or workqueue
API. It allows the functionality to be split into separate operations
and gives the code a better structure. It also defines a clean state
where no locks are taken and no IRQs are blocked, so the kthread
might sleep or even be safely migrated.

The kthread worker API is useful when we want a dedicated single
thread for the work. It helps to make sure that the thread is
available when needed. It also allows better control, e.g. defining
a scheduling priority.

This patch converts the Intel powerclamp kthreads to the kthread
worker API because they need good control over the assigned CPUs.

IMHO, the most natural way is to split one cycle into two works.
First one does some balancing and let the CPU work normal
way for some time. The second work checks what the CPU has done
in the meantime and put it into C-state to reach the required
idle time ratio. The delay between the two works is achieved
by the delayed kthread work.
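
For illustration only (this snippet is not part of the patch), the
two-work ping-pong described above boils down to roughly the
following, using the kthread worker API added earlier in the series.
The function and variable names are made up for the sketch:

	#include <linux/kthread.h>

	static struct kthread_worker *worker;	/* assumed to be created elsewhere */
	static struct kthread_work balancing_work;
	static struct delayed_kthread_work idle_injection_work;

	static void balancing_fn(struct kthread_work *work)
	{
		/* let the CPU run normally, then schedule the idle injection */
		queue_delayed_kthread_work(worker, &idle_injection_work, HZ / 100);
	}

	static void idle_injection_fn(struct kthread_work *work)
	{
		/* inject the idle time, then hand control back to the balancing work */
		queue_kthread_work(worker, &balancing_work);
	}

	static void start_sketch(void)
	{
		init_kthread_work(&balancing_work, balancing_fn);
		init_delayed_kthread_work(&idle_injection_work, idle_injection_fn);
		queue_kthread_work(worker, &balancing_work);
	}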

The two works have to share some data that used to be local
variables of the single kthread function. This is achieved
by the new per-CPU struct powerclamp_worker_data. It might look
like a complication. On the other hand, the long original kthread
function was not nice either.

The patch tries to avoid extra init and cleanup works. All these
actions can be done outside the thread, so they are moved to the
functions that create or destroy the worker. In particular,
I checked that the timers are assigned to the right CPU.

The two works queue each other. This makes it a bit tricky to
break the cycle when we want to stop the worker. We use the global
and per-worker "clamping" variables to make sure that the re-queuing
eventually stops. We also cancel the works to make the stop faster.
Note that the canceling is not reliable because the handling
of the two variables and the queuing is not synchronized via a lock.
But it is not a big deal because it is just an optimization;
the job is still stopped faster than before in most cases.

Signed-off-by: Petr Mladek <pmladek@suse.com>
CC: Zhang Rui <rui.zhang@intel.com>
CC: Eduardo Valentin <edubezval@gmail.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
CC: linux-pm@vger.kernel.org
---
 drivers/thermal/intel_powerclamp.c | 287 ++++++++++++++++++++++---------------
 1 file changed, 168 insertions(+), 119 deletions(-)

diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index cb32c38f9828..58ea1862d412 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -86,11 +86,27 @@ static unsigned int control_cpu; /* The cpu assigned to collect stat and update
 				  */
 static bool clamping;
 
+static const struct sched_param sparam = {
+	.sched_priority = MAX_USER_RT_PRIO / 2,
+};
+struct powerclamp_worker_data {
+	struct kthread_worker *worker;
+	struct kthread_work balancing_work;
+	struct delayed_kthread_work idle_injection_work;
+	struct timer_list wakeup_timer;
+	unsigned int cpu;
+	unsigned int count;
+	unsigned int guard;
+	unsigned int window_size_now;
+	unsigned int target_ratio;
+	unsigned int duration_jiffies;
+	bool clamping;
+};
 
-static struct task_struct * __percpu *powerclamp_thread;
+static struct powerclamp_worker_data * __percpu worker_data;
 static struct thermal_cooling_device *cooling_dev;
 static unsigned long *cpu_clamping_mask;  /* bit map for tracking per cpu
-					   * clamping thread
+					   * clamping kthread worker
 					   */
 
 static unsigned int duration;
@@ -368,100 +384,102 @@ static bool powerclamp_adjust_controls(unsigned int target_ratio,
 	return set_target_ratio + guard <= current_ratio;
 }
 
-static int clamp_thread(void *arg)
+static void clamp_balancing_func(struct kthread_work *work)
 {
-	int cpunr = (unsigned long)arg;
-	DEFINE_TIMER(wakeup_timer, noop_timer, 0, 0);
-	static const struct sched_param param = {
-		.sched_priority = MAX_USER_RT_PRIO/2,
-	};
-	unsigned int count = 0;
-	unsigned int target_ratio;
+	struct powerclamp_worker_data *w_data;
+	int sleeptime;
+	unsigned long target_jiffies;
+	unsigned int compensation;
+	int interval; /* jiffies to sleep for each attempt */
 
-	set_bit(cpunr, cpu_clamping_mask);
-	set_freezable();
-	init_timer_on_stack(&wakeup_timer);
-	sched_setscheduler(current, SCHED_FIFO, &param);
-
-	while (true == clamping && !kthread_should_stop() &&
-		cpu_online(cpunr)) {
-		int sleeptime;
-		unsigned long target_jiffies;
-		unsigned int guard;
-		unsigned int compensation = 0;
-		int interval; /* jiffies to sleep for each attempt */
-		unsigned int duration_jiffies = msecs_to_jiffies(duration);
-		unsigned int window_size_now;
-
-		try_to_freeze();
-		/*
-		 * make sure user selected ratio does not take effect until
-		 * the next round. adjust target_ratio if user has changed
-		 * target such that we can converge quickly.
-		 */
-		target_ratio = set_target_ratio;
-		guard = 1 + target_ratio/20;
-		window_size_now = window_size;
-		count++;
+	w_data = container_of(work, struct powerclamp_worker_data,
+			      balancing_work);
 
-		/*
-		 * systems may have different ability to enter package level
-		 * c-states, thus we need to compensate the injected idle ratio
-		 * to achieve the actual target reported by the HW.
-		 */
-		compensation = get_compensation(target_ratio);
-		interval = duration_jiffies*100/(target_ratio+compensation);
-
-		/* align idle time */
-		target_jiffies = roundup(jiffies, interval);
-		sleeptime = target_jiffies - jiffies;
-		if (sleeptime <= 0)
-			sleeptime = 1;
-		schedule_timeout_interruptible(sleeptime);
-		/*
-		 * only elected controlling cpu can collect stats and update
-		 * control parameters.
-		 */
-		if (cpunr == control_cpu && !(count%window_size_now)) {
-			should_skip =
-				powerclamp_adjust_controls(target_ratio,
-							guard, window_size_now);
-			smp_mb();
-		}
+	/*
+	 * make sure user selected ratio does not take effect until
+	 * the next round. adjust target_ratio if user has changed
+	 * target such that we can converge quickly.
+	 */
+	w_data->target_ratio = READ_ONCE(set_target_ratio);
+	w_data->guard = 1 + w_data->target_ratio / 20;
+	w_data->window_size_now = window_size;
+	w_data->duration_jiffies = msecs_to_jiffies(duration);
+	w_data->count++;
+
+	/*
+	 * systems may have different ability to enter package level
+	 * c-states, thus we need to compensate the injected idle ratio
+	 * to achieve the actual target reported by the HW.
+	 */
+	compensation = get_compensation(w_data->target_ratio);
+	interval = w_data->duration_jiffies * 100 /
+		(w_data->target_ratio + compensation);
+
+	/* align idle time */
+	target_jiffies = roundup(jiffies, interval);
+	sleeptime = target_jiffies - jiffies;
+	if (sleeptime <= 0)
+		sleeptime = 1;
+
+	if (clamping && w_data->clamping && cpu_online(w_data->cpu))
+		queue_delayed_kthread_work(w_data->worker,
+					   &w_data->idle_injection_work,
+					   sleeptime);
+}
+
+static void clamp_idle_injection_func(struct kthread_work *work)
+{
+	struct powerclamp_worker_data *w_data;
+	unsigned long target_jiffies;
+
+	w_data = container_of(work, struct powerclamp_worker_data,
+			      idle_injection_work.work);
+
+	/*
+	 * only elected controlling cpu can collect stats and update
+	 * control parameters.
+	 */
+	if (w_data->cpu == control_cpu &&
+	    !(w_data->count % w_data->window_size_now)) {
+		should_skip =
+			powerclamp_adjust_controls(w_data->target_ratio,
+						   w_data->guard,
+						   w_data->window_size_now);
+		smp_mb();
+	}
 
-		if (should_skip)
-			continue;
+	if (should_skip)
+		goto balance;
+
+	target_jiffies = jiffies + w_data->duration_jiffies;
+	mod_timer(&w_data->wakeup_timer, target_jiffies);
+	if (unlikely(local_softirq_pending()))
+		goto balance;
+	/*
+	 * stop tick sched during idle time, interrupts are still
+	 * allowed. thus jiffies are updated properly.
+	 */
+	preempt_disable();
+	/* mwait until target jiffies is reached */
+	while (time_before(jiffies, target_jiffies)) {
+		unsigned long ecx = 1;
+		unsigned long eax = target_mwait;
 
-		target_jiffies = jiffies + duration_jiffies;
-		mod_timer(&wakeup_timer, target_jiffies);
-		if (unlikely(local_softirq_pending()))
-			continue;
 		/*
-		 * stop tick sched during idle time, interrupts are still
-		 * allowed. thus jiffies are updated properly.
+		 * REVISIT: may call enter_idle() to notify drivers who
+		 * can save power during cpu idle. same for exit_idle()
 		 */
-		preempt_disable();
-		/* mwait until target jiffies is reached */
-		while (time_before(jiffies, target_jiffies)) {
-			unsigned long ecx = 1;
-			unsigned long eax = target_mwait;
-
-			/*
-			 * REVISIT: may call enter_idle() to notify drivers who
-			 * can save power during cpu idle. same for exit_idle()
-			 */
-			local_touch_nmi();
-			stop_critical_timings();
-			mwait_idle_with_hints(eax, ecx);
-			start_critical_timings();
-			atomic_inc(&idle_wakeup_counter);
-		}
-		preempt_enable();
+		local_touch_nmi();
+		stop_critical_timings();
+		mwait_idle_with_hints(eax, ecx);
+		start_critical_timings();
+		atomic_inc(&idle_wakeup_counter);
 	}
-	del_timer_sync(&wakeup_timer);
-	clear_bit(cpunr, cpu_clamping_mask);
+	preempt_enable();
 
-	return 0;
+balance:
+	if (clamping && w_data->clamping && cpu_online(w_data->cpu))
+		queue_kthread_work(w_data->worker, &w_data->balancing_work);
 }
 
 /*
@@ -505,22 +523,58 @@ static void poll_pkg_cstate(struct work_struct *dummy)
 		schedule_delayed_work(&poll_pkg_cstate_work, HZ);
 }
 
-static void start_power_clamp_thread(unsigned long cpu)
+static void start_power_clamp_worker(unsigned long cpu)
 {
-	struct task_struct **p = per_cpu_ptr(powerclamp_thread, cpu);
-	struct task_struct *thread;
-
-	thread = kthread_create_on_node(clamp_thread,
-					(void *) cpu,
-					cpu_to_node(cpu),
-					"kidle_inject/%ld", cpu);
-	if (IS_ERR(thread))
+	struct powerclamp_worker_data *w_data = per_cpu_ptr(worker_data, cpu);
+	struct kthread_worker *worker;
+
+	worker = create_kthread_worker_on_cpu(KTW_FREEZABLE, cpu,
+					      "kidle_inject/%ld");
+	if (IS_ERR(worker))
 		return;
 
-	/* bind to cpu here */
-	kthread_bind(thread, cpu);
-	wake_up_process(thread);
-	*p = thread;
+	w_data->worker = worker;
+	w_data->count = 0;
+	w_data->cpu = cpu;
+	w_data->clamping = true;
+	set_bit(cpu, cpu_clamping_mask);
+	setup_timer(&w_data->wakeup_timer, noop_timer, 0);
+	sched_setscheduler(worker->task, SCHED_FIFO, &sparam);
+	init_kthread_work(&w_data->balancing_work, clamp_balancing_func);
+	init_delayed_kthread_work(&w_data->idle_injection_work,
+				  clamp_idle_injection_func);
+	queue_kthread_work(w_data->worker, &w_data->balancing_work);
+}
+
+static void stop_power_clamp_worker(unsigned long cpu)
+{
+	struct powerclamp_worker_data *w_data = per_cpu_ptr(worker_data, cpu);
+
+	if (!w_data->worker)
+		return;
+
+	w_data->clamping = false;
+	/*
+	 * Make sure that all works that get queued after this point see
+	 * the clamping disabled. The counterpart is not needed because
+	 * there is an implicit memory barrier when the queued work
+	 * is processed.
+	 */
+	smp_wmb();
+	cancel_kthread_work_sync(&w_data->balancing_work);
+	cancel_delayed_kthread_work_sync(&w_data->idle_injection_work);
+	/*
+	 * The balancing work still might be queued here because
+	 * the handling of the "clapming" variable, cancel, and queue
+	 * operations are not synchronized via a lock. But it is not
+	 * a big deal. The balancing work is fast and destroy kthread
+	 * will wait for it.
+	 */
+	del_timer_sync(&w_data->wakeup_timer);
+	clear_bit(w_data->cpu, cpu_clamping_mask);
+	destroy_kthread_worker(w_data->worker);
+
+	w_data->worker = NULL;
 }
 
 static int start_power_clamp(void)
@@ -545,9 +599,9 @@ static int start_power_clamp(void)
 	clamping = true;
 	schedule_delayed_work(&poll_pkg_cstate_work, 0);
 
-	/* start one thread per online cpu */
+	/* start one kthread worker per online cpu */
 	for_each_online_cpu(cpu) {
-		start_power_clamp_thread(cpu);
+		start_power_clamp_worker(cpu);
 	}
 	put_online_cpus();
 
@@ -557,20 +611,17 @@ static int start_power_clamp(void)
 static void end_power_clamp(void)
 {
 	int i;
-	struct task_struct *thread;
 
-	clamping = false;
 	/*
-	 * make clamping visible to other cpus and give per cpu clamping threads
-	 * sometime to exit, or gets killed later.
+	 * Block requeuing in all the kthread workers. They will drain and
+	 * stop faster.
 	 */
-	smp_mb();
-	msleep(20);
+	clamping = false;
 	if (bitmap_weight(cpu_clamping_mask, num_possible_cpus())) {
 		for_each_set_bit(i, cpu_clamping_mask, num_possible_cpus()) {
-			pr_debug("clamping thread for cpu %d alive, kill\n", i);
-			thread = *per_cpu_ptr(powerclamp_thread, i);
-			kthread_stop(thread);
+			pr_debug("clamping worker for cpu %d alive, destroy\n",
+				 i);
+			stop_power_clamp_worker(i);
 		}
 	}
 }
@@ -579,15 +630,13 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
 				unsigned long action, void *hcpu)
 {
 	unsigned long cpu = (unsigned long)hcpu;
-	struct task_struct **percpu_thread =
-		per_cpu_ptr(powerclamp_thread, cpu);
 
 	if (false == clamping)
 		goto exit_ok;
 
 	switch (action) {
 	case CPU_ONLINE:
-		start_power_clamp_thread(cpu);
+		start_power_clamp_worker(cpu);
 		/* prefer BSP as controlling CPU */
 		if (cpu == 0) {
 			control_cpu = 0;
@@ -598,7 +647,7 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
 		if (test_bit(cpu, cpu_clamping_mask)) {
 			pr_err("cpu %lu dead but powerclamping thread is not\n",
 				cpu);
-			kthread_stop(*percpu_thread);
+			stop_power_clamp_worker(cpu);
 		}
 		if (cpu == control_cpu) {
 			control_cpu = smp_processor_id();
@@ -785,8 +834,8 @@ static int __init powerclamp_init(void)
 	window_size = 2;
 	register_hotcpu_notifier(&powerclamp_cpu_notifier);
 
-	powerclamp_thread = alloc_percpu(struct task_struct *);
-	if (!powerclamp_thread) {
+	worker_data = alloc_percpu(struct powerclamp_worker_data);
+	if (!worker_data) {
 		retval = -ENOMEM;
 		goto exit_unregister;
 	}
@@ -806,7 +855,7 @@ static int __init powerclamp_init(void)
 	return 0;
 
 exit_free_thread:
-	free_percpu(powerclamp_thread);
+	free_percpu(worker_data);
 exit_unregister:
 	unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
 exit_free:
@@ -819,7 +868,7 @@ static void __exit powerclamp_exit(void)
 {
 	unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
 	end_power_clamp();
-	free_percpu(powerclamp_thread);
+	free_percpu(worker_data);
 	thermal_cooling_device_unregister(cooling_dev);
 	kfree(cpu_clamping_mask);
 
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* Re: [PATCH v4 21/22] thermal/intel_powerclamp: Remove duplicated code that starts the kthread
  2016-01-25 15:45   ` Petr Mladek
@ 2016-01-25 16:23     ` Jacob Pan
  -1 siblings, 0 replies; 82+ messages in thread
From: Jacob Pan @ 2016-01-25 16:23 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Paul E. McKenney, Josh Triplett,
	Thomas Gleixner, Linus Torvalds, Jiri Kosina, Borislav Petkov,
	Michal Hocko, linux-mm, Vlastimil Babka, linux-api, linux-kernel,
	Zhang Rui, Eduardo Valentin, linux-pm, jacob.jun.pan

On Mon, 25 Jan 2016 16:45:10 +0100
Petr Mladek <pmladek@suse.com> wrote:

> This patch removes a code duplication. It does not modify
> the functionality.
> 
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>

> Signed-off-by: Petr Mladek <pmladek@suse.com>
> CC: Zhang Rui <rui.zhang@intel.com>
> CC: Eduardo Valentin <edubezval@gmail.com>
> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> CC: linux-pm@vger.kernel.org
> ---
>  drivers/thermal/intel_powerclamp.c | 45
> +++++++++++++++++--------------------- 1 file changed, 20
> insertions(+), 25 deletions(-)

[Jacob Pan]

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v4 22/22] thermal/intel_powerclamp: Convert the kthread to kthread worker API
  2016-01-25 15:45   ` Petr Mladek
@ 2016-01-25 16:28     ` Jacob Pan
  -1 siblings, 0 replies; 82+ messages in thread
From: Jacob Pan @ 2016-01-25 16:28 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Paul E. McKenney, Josh Triplett,
	Thomas Gleixner, Linus Torvalds, Jiri Kosina, Borislav Petkov,
	Michal Hocko, linux-mm, Vlastimil Babka, linux-api, linux-kernel,
	Zhang Rui, Eduardo Valentin, linux-pm, jacob.jun.pan

On Mon, 25 Jan 2016 16:45:11 +0100
Petr Mladek <pmladek@suse.com> wrote:

> Kthreads are currently implemented as an infinite loop. Each
> has its own variant of checks for terminating, freezing,
> awakening. In many cases it is unclear to say in which state
> it is and sometimes it is done a wrong way.
> 
> The plan is to convert kthreads into kthread_worker or workqueues
> API. It allows to split the functionality into separate operations.
> It helps to make a better structure. Also it defines a clean state
> where no locks are taken, IRQs blocked, the kthread might sleep
> or even be safely migrated.
> 
> The kthread worker API is useful when we want to have a dedicated
> single thread for the work. It helps to make sure that it is
> available when needed. Also it allows a better control, e.g.
> define a scheduling priority.
> 
> This patch converts the intel powerclamp kthreads into the kthread
> worker because they need to have a good control over the assigned
> CPUs.
> 
> IMHO, the most natural way is to split one cycle into two works.
> First one does some balancing and let the CPU work normal
> way for some time. The second work checks what the CPU has done
> in the meantime and put it into C-state to reach the required
> idle time ratio. The delay between the two works is achieved
> by the delayed kthread work.
> 
> The two works have to share some data that used to be local
> variables of the single kthread function. This is achieved
> by the new per-CPU struct kthread_worker_data. It might look
> as a complication. On the other hand, the long original kthread
> function was not nice either.
> 
> The patch tries to avoid extra init and cleanup works. All the
> actions might be done outside the thread. They are moved
> to the functions that create or destroy the worker. Especially,
> I checked that the timers are assigned to the right CPU.
> 
> The two works are queuing each other. It makes it a bit tricky to
> break it when we want to stop the worker. We use the global and
> per-worker "clamping" variables to make sure that the re-queuing
> eventually stops. We also cancel the works to make it faster.
> Note that the canceling is not reliable because the handling
> of the two variables and queuing is not synchronized via a lock.
> But it is not a big deal because it is just an optimization.
> The job is stopped faster than before in most cases.
> 
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> CC: Zhang Rui <rui.zhang@intel.com>
> CC: Eduardo Valentin <edubezval@gmail.com>
> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> CC: linux-pm@vger.kernel.org
Tested v3 for functionality and performance; v4 seems unchanged for
this patch according to the changelog.

Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v4 01/22] timer: Allow to check when the timer callback has not finished yet
  2016-01-25 15:44   ` Petr Mladek
@ 2016-01-25 18:44     ` Tejun Heo
  -1 siblings, 0 replies; 82+ messages in thread
From: Tejun Heo @ 2016-01-25 18:44 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel

Hello,

On Mon, Jan 25, 2016 at 04:44:50PM +0100, Petr Mladek wrote:
> timer_pending() checks whether the list of callbacks is empty.
> Each callback is removed from the list before it is called,
> see call_timer_fn() in __run_timers().
> 
> Sometimes we need to make sure that the callback has finished.
> For example, if we want to free some resources that are accessed
> by the callback.
> 
> For this purpose, this patch adds timer_active(). It checks both
> the list of callbacks and the running_timer. It takes the base_lock
> to see a consistent state.
> 
> I plan to use it to implement delayed works in kthread worker.
> But I guess that it will have wider use. In fact, I wonder if
> timer_pending() is misused in some situations.
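
For reference, the helper described in the quoted changelog amounts to
roughly the following sketch (not the actual patch; lock_timer_base()
and base->running_timer are assumed from the timer core of that time):

	static bool timer_active(struct timer_list *timer)
	{
		struct tvec_base *base;
		unsigned long flags;
		bool ret;

		base = lock_timer_base(timer, &flags);
		ret = timer_pending(timer) || base->running_timer == timer;
		spin_unlock_irqrestore(&base->lock, flags);

		return ret;
	}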

I don't think this is still necessary.  More on this later.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v4 04/22] kthread: Add create_kthread_worker*()
  2016-01-25 15:44   ` Petr Mladek
@ 2016-01-25 18:53     ` Tejun Heo
  -1 siblings, 0 replies; 82+ messages in thread
From: Tejun Heo @ 2016-01-25 18:53 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel

On Mon, Jan 25, 2016 at 04:44:53PM +0100, Petr Mladek wrote:
> +struct kthread_worker *
> +create_kthread_worker_on_cpu(int cpu, const char namefmt[])
> +{
> +	if (cpu < 0 || cpu > num_possible_cpus())
> +		return ERR_PTR(-EINVAL);

Comparing cpu ID to num_possible_cpus() doesn't make any sense.  It
should either be testing against cpu_possible_mask or testing against
nr_cpu_ids.  Does this test need to be in this function at all?
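
Something along those lines could be (sketch only, not from the series):

	if (cpu < 0 || cpu >= nr_cpu_ids || !cpu_possible(cpu))
		return ERR_PTR(-EINVAL);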

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v4 07/22] kthread: Detect when a kthread work is used by more workers
@ 2016-01-25 18:57     ` Tejun Heo
  0 siblings, 0 replies; 82+ messages in thread
From: Tejun Heo @ 2016-01-25 18:57 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel

On Mon, Jan 25, 2016 at 04:44:56PM +0100, Petr Mladek wrote:
> +static void insert_kthread_work_sanity_check(struct kthread_worker *worker,
> +					       struct kthread_work *work)
> +{
> +	lockdep_assert_held(&worker->lock);
> +	WARN_ON_ONCE(!irqs_disabled());

Isn't worker->lock gonna be an irq-safe lock?  If so, why would this
need to be tested separately?

> +	WARN_ON_ONCE(!list_empty(&work->node));
> +	/* Do not use a work with more workers, see queue_kthread_work() */
> +	WARN_ON_ONCE(work->worker && work->worker != worker);
> +}

Is this sanity check function gonna be used from multiple places?

>  /* insert @work before @pos in @worker */
>  static void insert_kthread_work(struct kthread_worker *worker,
> -			       struct kthread_work *work,
> -			       struct list_head *pos)
> +				struct kthread_work *work,
> +				struct list_head *pos)
>  {
> -	lockdep_assert_held(&worker->lock);
> +	insert_kthread_work_sanity_check(worker, work);
>  
>  	list_add_tail(&work->node, pos);
>  	work->worker = worker;
> @@ -717,6 +730,15 @@ static void insert_kthread_work(struct kthread_worker *worker,
>   * Queue @work to work processor @task for async execution.  @task
>   * must have been created with kthread_worker_create().  Returns %true
>   * if @work was successfully queued, %false if it was already pending.
> + *
> + * Never queue a work into a worker when it is being processed by another
> + * one. Otherwise, some operations, e.g. cancel or flush, will not work
> + * correctly or the work might run in parallel. This is not enforced
> + * because it would make the code too complex. There are only warnings
> + * printed when such a situation is detected.

I'm not sure the above paragraph adds much.  It isn't that accurate to
begin with, as what's being disallowed has a larger scope than the above.
Isn't the paragraph below enough?

> + * Reinitialize the work if it needs to be used by another worker.
> + * For example, when the worker was stopped and started again.
>   */
>  bool queue_kthread_work(struct kthread_worker *worker,
>  			struct kthread_work *work)
> -- 
> 1.8.5.6
> 

-- 
tejun

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v4 08/22] kthread: Initial support for delayed kthread work
@ 2016-01-25 19:04     ` Tejun Heo
  0 siblings, 0 replies; 82+ messages in thread
From: Tejun Heo @ 2016-01-25 19:04 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel

Hello,

On Mon, Jan 25, 2016 at 04:44:57PM +0100, Petr Mladek wrote:
> +/*
> + * Returns true when there is a pending operation for this work.
> + * In particular, it checks if the work is:
> + *	- queued
> + *	- a timer is running to queue this delayed work
> + *
> + * This function must be called with locked work.
> + */
> +static inline bool kthread_work_pending(const struct kthread_work *work)
> +{
> +	return !list_empty(&work->node) ||
> +	       (work->timer && timer_active(work->timer));
> +}

Why not just put the work item on a separate list so that
list_empty(&work->node) is always enough?  IOW, put delayed work items
that are waiting on their timers on worker->delayed or sth.

> +/*
> + * Queue @work right into the worker queue.
> + */
> +static void __queue_kthread_work(struct kthread_worker *worker,
> +			  struct kthread_work *work)
> +{
> +	insert_kthread_work(worker, work, &worker->work_list);
> +}

Does this really need to be an inline function?  This sort of
one-liner helper tends to obfuscate more than anything else.

> @@ -756,6 +779,121 @@ bool queue_kthread_work(struct kthread_worker *worker,
>  }
>  EXPORT_SYMBOL_GPL(queue_kthread_work);
>  
> +static bool try_lock_kthread_work(struct kthread_work *work)
> +{
> +	struct kthread_worker *worker;
> +	int ret = false;
> +
> +try_again:
> +	worker = work->worker;
> +
> +	if (!worker)
> +		goto out;

		return false;

> +
> +	spin_lock(&worker->lock);
> +	if (worker != work->worker) {
> +		spin_unlock(&worker->lock);
> +		goto try_again;
> +	}

	return true;

> +	ret = true;
> +
> +out:
> +	return ret;
> +}

Stop building unnecessary structures.  Keep it simple.
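
Folding both suggestions in, the helper could collapse to something
like this (a sketch based on the quoted code, not the actual patch):

	static bool try_lock_kthread_work(struct kthread_work *work)
	{
		struct kthread_worker *worker;

	try_again:
		worker = work->worker;
		if (!worker)
			return false;

		spin_lock(&worker->lock);
		if (worker != work->worker) {
			spin_unlock(&worker->lock);
			goto try_again;
		}

		return true;
	}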

> +static inline void unlock_kthread_work(struct kthread_work *work)
> +{
> +	spin_unlock(&work->worker->lock);
> +}

Ditto.  Just open code it.  It doesn't add anything.

> +/**
> + * delayed_kthread_work_timer_fn - callback that queues the associated delayed
> + *	kthread work when the timer expires.
> + * @__data: pointer to the data associated with the timer
> + *
> + * The format of the function is defined by struct timer_list.
> + * It should have been called from irqsafe timer with irq already off.
> + */
> +void delayed_kthread_work_timer_fn(unsigned long __data)
> +{
> +	struct delayed_kthread_work *dwork =
> +		(struct delayed_kthread_work *)__data;
> +	struct kthread_work *work = &dwork->work;
> +
> +	if (!try_lock_kthread_work(work))

Can you please explain why try_lock is necessary here?  That's the
most important and non-obvious thing going on here and there's no
explanation of that at all.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v4 09/22] kthread: Allow to cancel kthread work
  2016-01-25 15:44   ` Petr Mladek
@ 2016-01-25 19:17     ` Tejun Heo
  -1 siblings, 0 replies; 82+ messages in thread
From: Tejun Heo @ 2016-01-25 19:17 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel

Hello,

On Mon, Jan 25, 2016 at 04:44:58PM +0100, Petr Mladek wrote:
> @@ -574,6 +575,7 @@ EXPORT_SYMBOL_GPL(__init_kthread_worker);
>  static inline bool kthread_work_pending(const struct kthread_work *work)
>  {
>  	return !list_empty(&work->node) ||
> +	       work->canceling ||
>  	       (work->timer && timer_active(work->timer));
>  }

So, the reason the ->canceling test is necessary is to ensure that
self-requeueing work items can be canceled reliably.  It's not to
block "further queueing" in general.  It's probably worthwhile to
clear that up in the description and comment.

> +/*
> + * Get the worker lock if any worker is associated with the work.
> + * Depending on @check_canceling, it might need to give up the busy
> + * wait when work->canceling gets set.
> + */

While mentioning @check_canceling, the above doesn't actually explain
what it does.

> +static bool try_lock_kthread_work(struct kthread_work *work,
> +				  bool check_canceling)
>  {
>  	struct kthread_worker *worker;
>  	int ret = false;
> @@ -790,7 +798,24 @@ try_again:
>  	if (!worker)
>  		goto out;
>  
> -	spin_lock(&worker->lock);
> +	if (check_canceling) {
> +		if (!spin_trylock(&worker->lock)) {
> +			/*
> +			 * Busy wait with spin_is_locked() to avoid
> +			 * cache bouncing. Break when canceling
> +			 * is set to avoid a deadlock.
> +			 */
> +			do {
> +				if (READ_ONCE(work->canceling))
> +					goto out;

Why READ_ONCE?

> +				cpu_relax();
> +			} while (spin_is_locked(&worker->lock));
> +			goto try_again;
> +		}
> +	} else {
> +		spin_lock(&worker->lock);
> +	}
> +
>  	if (worker != work->worker) {
>  		spin_unlock(&worker->lock);
>  		goto try_again;
> @@ -820,10 +845,13 @@ void delayed_kthread_work_timer_fn(unsigned long __data)
>  		(struct delayed_kthread_work *)__data;
>  	struct kthread_work *work = &dwork->work;
>  
> -	if (!try_lock_kthread_work(work))
> +	/* Give up when the work is being canceled. */
> +	if (!try_lock_kthread_work(work, true))

Again, this is the trickiest part of the whole thing.  Please add a
comment explaining why this is necessary.

>  		return;
>  
> -	__queue_kthread_work(work->worker, work);
> +	if (!work->canceling)
> +		__queue_kthread_work(work->worker, work);
> +
...
> +static int
> +try_to_cancel_kthread_work(struct kthread_work *work,
> +				   spinlock_t *lock,
> +				   unsigned long *flags)

bool?

> +{
> +	int ret = 0;
> +
> +	/* Try to cancel the timer if pending. */
> +	if (work->timer && del_timer_sync(work->timer)) {
> +		ret = 1;
> +		goto out;
> +	}
> +
> +	/* Try to remove queued work before it is being executed. */
> +	if (!list_empty(&work->node)) {
> +		list_del_init(&work->node);
> +		ret = 1;
> +	}
> +
> +out:
> +	return ret;

Again, what's up with unnecessary goto exits?
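
For the quoted part, the goto-less shape being hinted at might read
(sketch only; whatever follows in the full function is left out here):

	/* Try to cancel the timer if pending. */
	if (work->timer && del_timer_sync(work->timer))
		return true;

	/* Try to remove queued work before it is being executed. */
	if (!list_empty(&work->node)) {
		list_del_init(&work->node);
		return true;
	}

	return false;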

> +static bool __cancel_kthread_work_sync(struct kthread_work *work)
> +{
> +	struct kthread_worker *worker;
> +	unsigned long flags;
> +	int ret;
> +
> +	local_irq_save(flags);
> +	if (!try_lock_kthread_work(work, false)) {
> +		local_irq_restore(flags);

Can't try_lock_kthread_work() take &flags?

> +		ret = 0;
> +		goto out;
> +	}
> +	worker = work->worker;
> +
> +	/*
> +	 * Block further queueing. It must be set before trying to cancel
> +	 * the kthread work. It avoids a possible deadlock between
> +	 * del_timer_sync() and the timer callback.
> +	 */

So, "blocking further queueing" and "a possible deadlock between
del_timer_sync() and the timer callback" don't have anything to do
with each other, do they?  Those are two separate things.  You need
the former to guarantee cancelation of self-requeueing work items and
the latter for deadlock avoidance, no?

> +	work->canceling++;
> +	ret = try_to_cancel_kthread_work(work, &worker->lock, &flags);
> +
> +	if (worker->current_work != work)
> +		goto out_fast;

If there are two racing cancellers, wouldn't this allow the losing one
to return while the work item is still running?

> +	spin_unlock_irqrestore(&worker->lock, flags);
> +	flush_kthread_work(work);
> +	/*
> +	 * Nobody is allowed to switch the worker or queue the work
> +	 * when .canceling is set.
> +	 */
> +	spin_lock_irqsave(&worker->lock, flags);
> +
> +out_fast:
> +	work->canceling--;
> +	spin_unlock_irqrestore(&worker->lock, flags);
> +out:
> +	return ret;
> +}

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v4 10/22] kthread: Allow to modify delayed kthread work
  2016-01-25 15:44   ` Petr Mladek
@ 2016-01-25 19:19     ` Tejun Heo
  -1 siblings, 0 replies; 82+ messages in thread
From: Tejun Heo @ 2016-01-25 19:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel

On Mon, Jan 25, 2016 at 04:44:59PM +0100, Petr Mladek wrote:
> +bool mod_delayed_kthread_work(struct kthread_worker *worker,
> +			      struct delayed_kthread_work *dwork,
> +			      unsigned long delay)
> +{
> +	struct kthread_work *work = &dwork->work;
> +	unsigned long flags;
> +	int ret = 0;
> +
> +try_again:
> +	spin_lock_irqsave(&worker->lock, flags);
> +	WARN_ON_ONCE(work->worker && work->worker != worker);
> +
> +	if (work->canceling)
> +		goto out;
> +
> +	ret = try_to_cancel_kthread_work(work, &worker->lock, &flags);
> +	if (ret == -EAGAIN)
> +		goto try_again;
> +
> +	if (work->canceling)

Does this test need to be repeated?  How would ->canceling change
while worker->lock is held?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v4 11/22] kthread: Better support freezable kthread workers
  2016-01-25 15:45   ` Petr Mladek
@ 2016-01-25 19:21     ` Tejun Heo
  -1 siblings, 0 replies; 82+ messages in thread
From: Tejun Heo @ 2016-01-25 19:21 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel

On Mon, Jan 25, 2016 at 04:45:00PM +0100, Petr Mladek wrote:
> @@ -556,6 +556,7 @@ void __init_kthread_worker(struct kthread_worker *worker,
>  				const char *name,
>  				struct lock_class_key *key)
>  {
> +	worker->flags = 0;
>  	spin_lock_init(&worker->lock);
>  	lockdep_set_class_and_name(&worker->lock, key, name);
>  	INIT_LIST_HEAD(&worker->work_list);

Maybe memset the thing and drop 0, NULL inits?
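A sketch of that suggestion, based only on the hunk quoted above; any
members not shown there are simply covered by the memset.

void __init_kthread_worker(struct kthread_worker *worker,
			   const char *name,
			   struct lock_class_key *key)
{
	/* Zero everything once instead of listing 0/NULL members by hand. */
	memset(worker, 0, sizeof(*worker));
	spin_lock_init(&worker->lock);
	lockdep_set_class_and_name(&worker->lock, key, name);
	INIT_LIST_HEAD(&worker->work_list);
}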

> @@ -638,7 +643,8 @@ repeat:
>  EXPORT_SYMBOL_GPL(kthread_worker_fn);
>  
>  static struct kthread_worker *
> -__create_kthread_worker(int cpu, const char namefmt[], va_list args)
> +__create_kthread_worker(unsigned int flags, int cpu,
> +			const char namefmt[], va_list args)

Wouldn't @cpu, @flags be less confusing?  You would end up with (A,
B, C) and (B, C) instead of (A, B, C) and (A, C).

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v4 04/22] kthread: Add create_kthread_worker*()
  2016-01-25 18:53     ` Tejun Heo
@ 2016-02-16 15:44       ` Petr Mladek
  -1 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-02-16 15:44 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel

On Mon 2016-01-25 13:53:39, Tejun Heo wrote:
> On Mon, Jan 25, 2016 at 04:44:53PM +0100, Petr Mladek wrote:
> > +struct kthread_worker *
> > +create_kthread_worker_on_cpu(int cpu, const char namefmt[])
> > +{
> > +	if (cpu < 0 || cpu > num_possible_cpus())
> > +		return ERR_PTR(-EINVAL);
> 
> Comparing cpu ID to num_possible_cpus() doesn't make any sense.  It
> should either be testing against cpu_possible_mask or testing against
> nr_cpu_ids.  Does this test need to be in this function at all?

I wanted to be sure. The cpu number is later passed to
cpu_to_node(cpu) in kthread_create_on_cpu().

I am going to replace this with a check against nr_cpu_ids in
kthread_create_on_cpu(), which makes more sense.

I might be too paranoid. But this is a slow path. People
make mistakes...

Thanks,
Petr

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v4 04/22] kthread: Add create_kthread_worker*()
@ 2016-02-16 16:08         ` Tejun Heo
  0 siblings, 0 replies; 82+ messages in thread
From: Tejun Heo @ 2016-02-16 16:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel

Hello,

On Tue, Feb 16, 2016 at 04:44:43PM +0100, Petr Mladek wrote:
> I wanted to be sure. The cpu number is later passed to
> cpu_to_node(cpu) in kthread_create_on_cpu().
> 
> I am going to replace this with a check against nr_cpu_ids in
> kthread_create_on_cpu() which makes more sense.
> 
> I might be too paranoid. But this is slow path. People
> do mistakes...

idk, that just ended up adding subtly broken code which checks for
an unlikely condition that would cause a crash anyway.  I don't see
the point.  If you want to insist on it, please at least make it a
WARN_ON().  It's a clear kernel bug.
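The WARN_ON() form being suggested would look roughly like this
(hypothetical sketch, in place of the num_possible_cpus() test quoted
above; the check ends up being dropped entirely further down the thread):

	if (WARN_ON(cpu < 0 || cpu >= nr_cpu_ids))
		return ERR_PTR(-EINVAL);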

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v4 04/22] kthread: Add create_kthread_worker*()
@ 2016-02-16 16:10         ` Petr Mladek
  0 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-02-16 16:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel

On Tue 2016-02-16 16:44:43, Petr Mladek wrote:
> On Mon 2016-01-25 13:53:39, Tejun Heo wrote:
> > On Mon, Jan 25, 2016 at 04:44:53PM +0100, Petr Mladek wrote:
> > > +struct kthread_worker *
> > > +create_kthread_worker_on_cpu(int cpu, const char namefmt[])
> > > +{
> > > +	if (cpu < 0 || cpu > num_possible_cpus())
> > > +		return ERR_PTR(-EINVAL);
> > 
> > Comparing cpu ID to num_possible_cpus() doesn't make any sense.  It
> > should either be testing against cpu_possible_mask or testing against
> > nr_cpu_ids.  Does this test need to be in this function at all?
> 
> I wanted to be sure. The cpu number is later passed to
> cpu_to_node(cpu) in kthread_create_on_cpu().
> 
> I am going to replace this with a check against nr_cpu_ids in
> kthread_create_on_cpu() which makes more sense.
> 
> I might be too paranoid. But this is slow path. People
> do mistakes...

I take it back. I will remove the check altogether.
Michal Hocko persuaded me offline that it does not make
much sense. This function is only called from kernel code.
I have to trust that the usage is sane at this level.
Also, too many checks make the code harder to read.

Thanks,
Petr

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v4 07/22] kthread: Detect when a kthread work is used by more workers
@ 2016-02-16 16:38       ` Petr Mladek
  0 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-02-16 16:38 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel

On Mon 2016-01-25 13:57:47, Tejun Heo wrote:
> On Mon, Jan 25, 2016 at 04:44:56PM +0100, Petr Mladek wrote:
> > +static void insert_kthread_work_sanity_check(struct kthread_worker *worker,
> > +					       struct kthread_work *work)
> > +{
> > +	lockdep_assert_held(&worker->lock);
> > +	WARN_ON_ONCE(!irqs_disabled());
> 
> Isn't worker->lock gonna be a irq-safe lock?  If so, why would this
> need to be tested separately?

I see. I'll remove it.

> > +	WARN_ON_ONCE(!list_empty(&work->node));
> > +	/* Do not use a work with more workers, see queue_kthread_work() */
> > +	WARN_ON_ONCE(work->worker && work->worker != worker);
> > +}
> 
> Is this sanity check function gonna be used from multiple places?

It will be reused when implementing __queue_delayed_kthread_work().
We will want to do these checks before setting the timer as well.
See the 8th patch, e.g. at
http://thread.gmane.org/gmane.linux.kernel.mm/144964/focus=144973
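A rough, hypothetical sketch of that reuse; this is not the actual
patch 08/22 code, just an illustration of running the same checks
before arming the timer:

static void __queue_delayed_kthread_work(struct kthread_worker *worker,
					 struct delayed_kthread_work *dwork,
					 unsigned long delay)
{
	struct timer_list *timer = &dwork->timer;
	struct kthread_work *work = &dwork->work;

	/* Same sanity checks as the immediate queueing path. */
	insert_kthread_work_sanity_check(worker, work);

	if (!delay) {
		__queue_kthread_work(worker, work);
		return;
	}

	/* Remember the worker so that cancel/flush can find its lock. */
	work->worker = worker;
	timer->expires = jiffies + delay;
	add_timer(timer);
}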


> >  /* insert @work before @pos in @worker */
> >  static void insert_kthread_work(struct kthread_worker *worker,
> > -			       struct kthread_work *work,
> > -			       struct list_head *pos)
> > +				struct kthread_work *work,
> > +				struct list_head *pos)
> >  {
> > -	lockdep_assert_held(&worker->lock);
> > +	insert_kthread_work_sanity_check(worker, work);
> >  
> >  	list_add_tail(&work->node, pos);
> >  	work->worker = worker;
> > @@ -717,6 +730,15 @@ static void insert_kthread_work(struct kthread_worker *worker,
> >   * Queue @work to work processor @task for async execution.  @task
> >   * must have been created with kthread_worker_create().  Returns %true
> >   * if @work was successfully queued, %false if it was already pending.
> > + *
> > + * Never queue a work into a worker when it is being processed by another
> > + * one. Otherwise, some operations, e.g. cancel or flush, will not work
> > + * correctly or the work might run in parallel. This is not enforced
> > + * because it would make the code too complex. There are only warnings
> > + * printed when such a situation is detected.
> 
> I'm not sure the above paragraph adds much.  It isn't that accurate to
> begin with as what's being disallowed is larger scope than the above.
> Isn't the paragraph below enough?

Makes sense. I'll remove it.

> > + * Reinitialize the work if it needs to be used by another worker.
> > + * For example, when the worker was stopped and started again.
> >   */
> >  bool queue_kthread_work(struct kthread_worker *worker,
> >  			struct kthread_work *work)

Thanks,
Petr

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v4 09/22] kthread: Allow to cancel kthread work
  2016-01-25 19:17     ` Tejun Heo
@ 2016-02-19 16:22       ` Petr Mladek
  -1 siblings, 0 replies; 82+ messages in thread
From: Petr Mladek @ 2016-02-19 16:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
	linux-mm, Vlastimil Babka, linux-api, linux-kernel

On Mon 2016-01-25 14:17:09, Tejun Heo wrote:
> On Mon, Jan 25, 2016 at 04:44:58PM +0100, Petr Mladek wrote:
> > +static bool __cancel_kthread_work_sync(struct kthread_work *work)
[...]
 > > +	work->canceling++;
> > +	ret = try_to_cancel_kthread_work(work, &worker->lock, &flags);
> > +
> > +	if (worker->current_work != work)
> > +		goto out_fast;
> 
> If there are two racing cancellers, wouldn't this allow the losing one
> to return while the work item is still running?

If the work is running, worker->current_work must point to it.
All cancelers will see it and queue their own kthread_flush_work.
It is a bit sub-optimal, but it is trivial. I doubt that there
will be many parallel cancelers in practice.
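For context, the flush path that both cancellers end up in works by
queueing a private flush work with its own completion right behind the
running item; simplified from the kernel/kthread.c code of that era:

struct kthread_flush_work {
	struct kthread_work	work;
	struct completion	done;
};

static void kthread_flush_work_fn(struct kthread_work *work)
{
	struct kthread_flush_work *fwork =
		container_of(work, struct kthread_flush_work, work);

	/* Runs on the worker thread, strictly after the flushed work. */
	complete(&fwork->done);
}

/*
 * Each caller of flush_kthread_work() declares its own fwork on the
 * stack and waits on its own completion, so a "losing" canceller still
 * blocks until the running work item has finished.
 */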

> > +	spin_unlock_irqrestore(&worker->lock, flags);
> > +	flush_kthread_work(work);
> > +	/*
> > +	 * Nobody is allowed to switch the worker or queue the work
> > +	 * when .canceling is set.
> > +	 */
> > +	spin_lock_irqsave(&worker->lock, flags);
> > +
> > +out_fast:
> > +	work->canceling--;
> > +	spin_unlock_irqrestore(&worker->lock, flags);
> > +out:
> > +	return ret;

Best Regards,
Petr

PS: I have updated the patchset according to your other comments.
In addition, I got rid of many try_lock games. We do not allow
queuing the same work to different workers. Therefore it should
be enough to warn if the worker changes unexpectedly. It makes
the code even simpler. I still need to do some testing.
I will send it next week, hopefully.

^ permalink raw reply	[flat|nested] 82+ messages in thread

end of thread, other threads:[~2016-02-19 16:22 UTC | newest]

Thread overview: 82+ messages
2016-01-25 15:44 [PATCH v4 00/22] kthread: Use kthread worker API more widely Petr Mladek
2016-01-25 15:44 ` Petr Mladek
2016-01-25 15:44 ` Petr Mladek
2016-01-25 15:44 ` [PATCH v4 01/22] timer: Allow to check when the timer callback has not finished yet Petr Mladek
2016-01-25 15:44   ` Petr Mladek
2016-01-25 18:44   ` Tejun Heo
2016-01-25 18:44     ` Tejun Heo
2016-01-25 15:44 ` [PATCH v4 02/22] kthread/smpboot: Do not park in kthread_create_on_cpu() Petr Mladek
2016-01-25 15:44   ` Petr Mladek
2016-01-25 15:44 ` [PATCH v4 03/22] kthread: Allow to call __kthread_create_on_node() with va_list args Petr Mladek
2016-01-25 15:44   ` Petr Mladek
2016-01-25 15:44 ` [PATCH v4 04/22] kthread: Add create_kthread_worker*() Petr Mladek
2016-01-25 15:44   ` Petr Mladek
2016-01-25 18:53   ` Tejun Heo
2016-01-25 18:53     ` Tejun Heo
2016-02-16 15:44     ` Petr Mladek
2016-02-16 15:44       ` Petr Mladek
2016-02-16 16:08       ` Tejun Heo
2016-02-16 16:08         ` Tejun Heo
2016-02-16 16:08         ` Tejun Heo
2016-02-16 16:10       ` Petr Mladek
2016-02-16 16:10         ` Petr Mladek
2016-02-16 16:10         ` Petr Mladek
2016-01-25 15:44 ` [PATCH v4 05/22] kthread: Add drain_kthread_worker() Petr Mladek
2016-01-25 15:44   ` Petr Mladek
2016-01-25 15:44 ` [PATCH v4 06/22] kthread: Add destroy_kthread_worker() Petr Mladek
2016-01-25 15:44   ` Petr Mladek
2016-01-25 15:44 ` [PATCH v4 07/22] kthread: Detect when a kthread work is used by more workers Petr Mladek
2016-01-25 15:44   ` Petr Mladek
2016-01-25 18:57   ` Tejun Heo
2016-01-25 18:57     ` Tejun Heo
2016-01-25 18:57     ` Tejun Heo
2016-02-16 16:38     ` Petr Mladek
2016-02-16 16:38       ` Petr Mladek
2016-02-16 16:38       ` Petr Mladek
2016-01-25 15:44 ` [PATCH v4 08/22] kthread: Initial support for delayed kthread work Petr Mladek
2016-01-25 15:44   ` Petr Mladek
2016-01-25 19:04   ` Tejun Heo
2016-01-25 19:04     ` Tejun Heo
2016-01-25 19:04     ` Tejun Heo
2016-01-25 15:44 ` [PATCH v4 09/22] kthread: Allow to cancel " Petr Mladek
2016-01-25 15:44   ` Petr Mladek
2016-01-25 19:17   ` Tejun Heo
2016-01-25 19:17     ` Tejun Heo
2016-02-19 16:22     ` Petr Mladek
2016-02-19 16:22       ` Petr Mladek
2016-01-25 15:44 ` [PATCH v4 10/22] kthread: Allow to modify delayed " Petr Mladek
2016-01-25 15:44   ` Petr Mladek
2016-01-25 19:19   ` Tejun Heo
2016-01-25 19:19     ` Tejun Heo
2016-01-25 15:45 ` [PATCH v4 11/22] kthread: Better support freezable kthread workers Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 19:21   ` Tejun Heo
2016-01-25 19:21     ` Tejun Heo
2016-01-25 15:45 ` [PATCH v4 12/22] kthread: Use try_lock_kthread_work() in flush_kthread_work() Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45 ` [PATCH v4 13/22] mm/huge_page: Convert khugepaged() into kthread worker API Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45 ` [PATCH v4 14/22] ring_buffer: Convert benchmark kthreads " Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45 ` [PATCH v4 15/22] hung_task: Convert hungtaskd " Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45 ` [PATCH v4 16/22] kmemleak: Convert kmemleak kthread " Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45 ` [PATCH v4 17/22] ipmi: Convert kipmi " Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45 ` [PATCH v4 18/22] IB/fmr_pool: Convert the cleanup thread " Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45 ` [PATCH v4 19/22] memstick/r592: Better synchronize debug messages in r592_io kthread Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45 ` [PATCH v4 20/22] memstick/r592: convert r592_io kthread into kthread worker API Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 15:45 ` [PATCH v4 21/22] thermal/intel_powerclamp: Remove duplicated code that starts the kthread Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 16:23   ` Jacob Pan
2016-01-25 16:23     ` Jacob Pan
2016-01-25 15:45 ` [PATCH v4 22/22] thermal/intel_powerclamp: Convert the kthread to kthread worker API Petr Mladek
2016-01-25 15:45   ` Petr Mladek
2016-01-25 16:28   ` Jacob Pan
2016-01-25 16:28     ` Jacob Pan
