* [PATCHSET wq/for-3.7] workqueue: implement mod_delayed_work[_on]()
@ 2012-07-27 23:54 Tejun Heo
  2012-07-27 23:54 ` [PATCH 01/15] workqueue: reorder queueing functions so that _on() variants are on top Tejun Heo
                   ` (14 more replies)
  0 siblings, 15 replies; 25+ messages in thread
From: Tejun Heo @ 2012-07-27 23:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang

delayed_work has been annoyingly missing a mechanism to modify the
timer of a pending delayed_work - i.e. a mod_timer() counterpart.
delayed_work users have been working around this using several
methods - using an explicit timer + work item, messing directly with
delayed_work->timer, and canceling before re-queueing, all of which
are error-prone and/or ugly.
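
For illustration, the workarounds typically look something like the
following (hypothetical driver code, names invented for this sketch):

	/* poke the internal timer directly - only safe if the work is
	 * known to be pending with its timer already armed */
	mod_timer(&dev->poll_work.timer, jiffies + delay);

	/* or cancel and re-queue - not atomic, racy against other
	 * queuers between the two calls */
	cancel_delayed_work(&dev->poll_work);
	queue_delayed_work(dev->wq, &dev->poll_work, delay);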

Gustavo Padovan posted an RFC implementation[1] of mod_delayed_work()
a while back but it wasn't complete.  To properly implement
mod_delayed_work[_on](), it should be able to steal pending work items
which may be on the timer, on the worklist or anywhere in between.
This is similar to what __cancel_work_timer() does but it turns out
that there are a lot of holes around this area and
try_to_grab_pending() needs a considerable amount of work to be usable
for other purposes too.

This patchset improves canceling and try_to_grab_pending(), uses it to
implement mod_delayed_work[_on](), converts the easy users, and drops
__cancel_delayed_work() which doesn't have any relevant users
afterwards.

 0001-workqueue-reorder-queueing-functions-so-that-_on-var.patch
 0002-workqueue-make-queueing-functions-return-bool.patch
 0003-workqueue-add-missing-smp_wmb-in-process_one_work.patch
 0004-workqueue-disable-preemption-while-manipulating-PEND.patch
 0005-workqueue-set-delayed_work-timer-function-on-initial.patch
 0006-workqueue-unify-local-CPU-queueing-handling.patch
 0007-workqueue-fix-zero-delay-handling-of-queue_delayed_w.patch
 0008-workqueue-move-try_to_grab_pending-upwards.patch
 0009-workqueue-introduce-WORK_OFFQ_FLAG_.patch
 0010-workqueue-factor-out-__queue_delayed_work-from-queue.patch
 0011-workqueue-reorganize-try_to_grab_pending-and-__cance.patch
 0012-workqueue-mark-a-work-item-being-canceled-as-such.patch
 0013-workqueue-implement-mod_delayed_work-_on.patch
 0014-workqueue-use-mod_delayed_work-instead-of-cancel-que.patch
 0015-workqueue-deprecate-__cancel_delayed_work.patch

0001-0003 are preparatory.

0004 removes the possibility of cancel_sync spinning for an extended
period of time while another task holding PENDING is preempted.

0005-0007 clean up local queueing handling.

0008-0011 prepare for try_to_grab_pending() improvements.

0012 makes try_to_grab_pending() distinguish transient failure, which
can safely be busy-retried, from failure because the work item is
being canceled, which may take an arbitrary amount of time.

0013 uses the improved try_to_grab_pending() to implement
mod_delayed_work[_on]().

0014 converts cancel + queue sequences to mod_delayed_work().
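
Roughly, the converted call sites go from something like (hypothetical
example, not an actual hunk from the patch):

	cancel_delayed_work(&dev->poll_work);
	queue_delayed_work(dev->wq, &dev->poll_work, delay);

to a single call which modifies the timer if the work is already
pending:

	mod_delayed_work(dev->wq, &dev->poll_work, delay);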

0015 drops __cancel_delayed_work() which doesn't have any relevant
users left.  Not all conversions are trivial.  If you're involved with
edac, thinkpad_acpi, charger-manager, thermal_sys or nic link_watch,
please take a look at the conversions made in 0015.

This patchset is also available at the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git review-wq-mod_delayed

If nobody objects, I'd like to route this series through wq/for-3.7.
Changes to other subsystems are fairly localized and conflicts, if
they occur, shouldn't be too painful to handle.

Although this ends up adding ~90 LOC, much of that is documentation,
only the obvious users are converted, and IMHO it is worthwhile to
have regardless as it removes an annoyance which is pretty easy to
encounter while using delayed_work.

Thanks.

 block/blk-core.c                       |    8 
 block/blk-throttle.c                   |    7 
 block/genhd.c                          |    6 
 drivers/block/floppy.c                 |    5 
 drivers/edac/edac_mc.c                 |   17 
 drivers/infiniband/core/addr.c         |    4 
 drivers/infiniband/core/mad.c          |   16 
 drivers/infiniband/hw/nes/nes_hw.c     |    6 
 drivers/infiniband/hw/nes/nes_nic.c    |    5 
 drivers/input/keyboard/qt2160.c        |    3 
 drivers/input/mouse/synaptics_i2c.c    |    7 
 drivers/net/wireless/ipw2x00/ipw2100.c |    8 
 drivers/net/wireless/zd1211rw/zd_usb.c |    3 
 drivers/platform/x86/thinkpad_acpi.c   |   20 
 drivers/power/charger-manager.c        |    9 
 drivers/power/ds2760_battery.c         |    9 
 drivers/power/jz4740-battery.c         |    6 
 drivers/thermal/thermal_sys.c          |   15 
 drivers/video/omap2/dss/dsi.c          |    2 
 fs/afs/callback.c                      |    4 
 fs/afs/server.c                        |   10 
 fs/afs/vlocation.c                     |   14 
 fs/nfs/nfs4renewd.c                    |    3 
 include/linux/workqueue.h              |   49 +-
 kernel/workqueue.c                     |  690 ++++++++++++++++++++-------------
 net/core/dst.c                         |    4 
 net/core/link_watch.c                  |   21 -
 net/rfkill/input.c                     |    3 
 28 files changed, 521 insertions(+), 433 deletions(-)

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel/1159922

* [PATCH 01/15] workqueue: reorder queueing functions so that _on() variants are on top
  2012-07-27 23:54 [PATCHSET wq/for-3.7] workqueue: implement mod_delayed_work[_on]() Tejun Heo
@ 2012-07-27 23:54 ` Tejun Heo
  2012-07-27 23:54 ` [PATCH 02/15] workqueue: make queueing functions return bool Tejun Heo
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-07-27 23:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang, Tejun Heo

Currently, queue/schedule[_delayed]_work_on() are located below their
counterparts without the _on postfix even though the latter are
usually implemented using the former.  Swap them.

This is cleanup and doesn't cause any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/workqueue.h |   13 +++--
 kernel/workqueue.c        |  124 ++++++++++++++++++++++----------------------
 2 files changed, 69 insertions(+), 68 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index af15545..5970342 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -365,23 +365,24 @@ __alloc_workqueue_key(const char *fmt, unsigned int flags, int max_active,
 
 extern void destroy_workqueue(struct workqueue_struct *wq);
 
-extern int queue_work(struct workqueue_struct *wq, struct work_struct *work);
 extern int queue_work_on(int cpu, struct workqueue_struct *wq,
 			struct work_struct *work);
-extern int queue_delayed_work(struct workqueue_struct *wq,
-			struct delayed_work *work, unsigned long delay);
+extern int queue_work(struct workqueue_struct *wq, struct work_struct *work);
 extern int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			struct delayed_work *work, unsigned long delay);
+extern int queue_delayed_work(struct workqueue_struct *wq,
+			struct delayed_work *work, unsigned long delay);
 
 extern void flush_workqueue(struct workqueue_struct *wq);
 extern void drain_workqueue(struct workqueue_struct *wq);
 extern void flush_scheduled_work(void);
 
-extern int schedule_work(struct work_struct *work);
 extern int schedule_work_on(int cpu, struct work_struct *work);
-extern int schedule_delayed_work(struct delayed_work *work, unsigned long delay);
+extern int schedule_work(struct work_struct *work);
 extern int schedule_delayed_work_on(int cpu, struct delayed_work *work,
-					unsigned long delay);
+				    unsigned long delay);
+extern int schedule_delayed_work(struct delayed_work *work,
+				 unsigned long delay);
 extern int schedule_on_each_cpu(work_func_t func);
 extern int keventd_up(void);
 
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 692d976..07d309e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1053,27 +1053,6 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 }
 
 /**
- * queue_work - queue work on a workqueue
- * @wq: workqueue to use
- * @work: work to queue
- *
- * Returns 0 if @work was already on a queue, non-zero otherwise.
- *
- * We queue the work to the CPU on which it was submitted, but if the CPU dies
- * it can be processed by another CPU.
- */
-int queue_work(struct workqueue_struct *wq, struct work_struct *work)
-{
-	int ret;
-
-	ret = queue_work_on(get_cpu(), wq, work);
-	put_cpu();
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(queue_work);
-
-/**
  * queue_work_on - queue work on specific cpu
  * @cpu: CPU number to execute work on
  * @wq: workqueue to use
@@ -1097,31 +1076,34 @@ queue_work_on(int cpu, struct workqueue_struct *wq, struct work_struct *work)
 }
 EXPORT_SYMBOL_GPL(queue_work_on);
 
-static void delayed_work_timer_fn(unsigned long __data)
-{
-	struct delayed_work *dwork = (struct delayed_work *)__data;
-	struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);
-
-	__queue_work(smp_processor_id(), cwq->wq, &dwork->work);
-}
-
 /**
- * queue_delayed_work - queue work on a workqueue after delay
+ * queue_work - queue work on a workqueue
  * @wq: workqueue to use
- * @dwork: delayable work to queue
- * @delay: number of jiffies to wait before queueing
+ * @work: work to queue
  *
  * Returns 0 if @work was already on a queue, non-zero otherwise.
+ *
+ * We queue the work to the CPU on which it was submitted, but if the CPU dies
+ * it can be processed by another CPU.
  */
-int queue_delayed_work(struct workqueue_struct *wq,
-			struct delayed_work *dwork, unsigned long delay)
+int queue_work(struct workqueue_struct *wq, struct work_struct *work)
 {
-	if (delay == 0)
-		return queue_work(wq, &dwork->work);
+	int ret;
 
-	return queue_delayed_work_on(-1, wq, dwork, delay);
+	ret = queue_work_on(get_cpu(), wq, work);
+	put_cpu();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(queue_work);
+
+static void delayed_work_timer_fn(unsigned long __data)
+{
+	struct delayed_work *dwork = (struct delayed_work *)__data;
+	struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);
+
+	__queue_work(smp_processor_id(), cwq->wq, &dwork->work);
 }
-EXPORT_SYMBOL_GPL(queue_delayed_work);
 
 /**
  * queue_delayed_work_on - queue work on specific CPU after delay
@@ -1179,6 +1161,24 @@ int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 EXPORT_SYMBOL_GPL(queue_delayed_work_on);
 
 /**
+ * queue_delayed_work - queue work on a workqueue after delay
+ * @wq: workqueue to use
+ * @dwork: delayable work to queue
+ * @delay: number of jiffies to wait before queueing
+ *
+ * Returns 0 if @work was already on a queue, non-zero otherwise.
+ */
+int queue_delayed_work(struct workqueue_struct *wq,
+			struct delayed_work *dwork, unsigned long delay)
+{
+	if (delay == 0)
+		return queue_work(wq, &dwork->work);
+
+	return queue_delayed_work_on(-1, wq, dwork, delay);
+}
+EXPORT_SYMBOL_GPL(queue_delayed_work);
+
+/**
  * worker_enter_idle - enter idle state
  * @worker: worker which is entering idle state
  *
@@ -2877,6 +2877,19 @@ bool cancel_delayed_work_sync(struct delayed_work *dwork)
 }
 EXPORT_SYMBOL(cancel_delayed_work_sync);
 
+/*
+ * schedule_work_on - put work task on a specific cpu
+ * @cpu: cpu to put the work task on
+ * @work: job to be done
+ *
+ * This puts a job on a specific cpu
+ */
+int schedule_work_on(int cpu, struct work_struct *work)
+{
+	return queue_work_on(cpu, system_wq, work);
+}
+EXPORT_SYMBOL(schedule_work_on);
+
 /**
  * schedule_work - put work task in global workqueue
  * @work: job to be done
@@ -2894,18 +2907,21 @@ int schedule_work(struct work_struct *work)
 }
 EXPORT_SYMBOL(schedule_work);
 
-/*
- * schedule_work_on - put work task on a specific cpu
- * @cpu: cpu to put the work task on
- * @work: job to be done
+/**
+ * schedule_delayed_work_on - queue work in global workqueue on CPU after delay
+ * @cpu: cpu to use
+ * @dwork: job to be done
+ * @delay: number of jiffies to wait
  *
- * This puts a job on a specific cpu
+ * After waiting for a given time this puts a job in the kernel-global
+ * workqueue on the specified CPU.
  */
-int schedule_work_on(int cpu, struct work_struct *work)
+int schedule_delayed_work_on(int cpu,
+			struct delayed_work *dwork, unsigned long delay)
 {
-	return queue_work_on(cpu, system_wq, work);
+	return queue_delayed_work_on(cpu, system_wq, dwork, delay);
 }
-EXPORT_SYMBOL(schedule_work_on);
+EXPORT_SYMBOL(schedule_delayed_work_on);
 
 /**
  * schedule_delayed_work - put work task in global workqueue after delay
@@ -2923,22 +2939,6 @@ int schedule_delayed_work(struct delayed_work *dwork,
 EXPORT_SYMBOL(schedule_delayed_work);
 
 /**
- * schedule_delayed_work_on - queue work in global workqueue on CPU after delay
- * @cpu: cpu to use
- * @dwork: job to be done
- * @delay: number of jiffies to wait
- *
- * After waiting for a given time this puts a job in the kernel-global
- * workqueue on the specified CPU.
- */
-int schedule_delayed_work_on(int cpu,
-			struct delayed_work *dwork, unsigned long delay)
-{
-	return queue_delayed_work_on(cpu, system_wq, dwork, delay);
-}
-EXPORT_SYMBOL(schedule_delayed_work_on);
-
-/**
  * schedule_on_each_cpu - execute a function synchronously on each online CPU
  * @func: the function to call
  *
-- 
1.7.7.3


* [PATCH 02/15] workqueue: make queueing functions return bool
  2012-07-27 23:54 [PATCHSET wq/for-3.7] workqueue: implement mod_delayed_work[_on]() Tejun Heo
  2012-07-27 23:54 ` [PATCH 01/15] workqueue: reorder queueing functions so that _on() variants are on top Tejun Heo
@ 2012-07-27 23:54 ` Tejun Heo
  2012-07-27 23:54 ` [PATCH 03/15] workqueue: add missing smp_wmb() in process_one_work() Tejun Heo
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-07-27 23:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang, Tejun Heo

All queueing functions return 1 on success, 0 if the work item was
already pending.  Update them to return bool instead.  This makes it
clearer that they don't return 0 / -errno.

This is cleanup and doesn't cause any functional difference.

While at it, fix comment opening for schedule_work_on().
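
For example, a caller can now read the result naturally (hypothetical
caller, names invented):

	if (!schedule_delayed_work(&dev->poll_work, HZ))
		pr_debug("poll work already pending, existing timer kept\n");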

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/workqueue.h |   20 +++++++++---------
 kernel/workqueue.c        |   47 ++++++++++++++++++++++-----------------------
 2 files changed, 33 insertions(+), 34 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 5970342..278dc5d 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -365,24 +365,24 @@ __alloc_workqueue_key(const char *fmt, unsigned int flags, int max_active,
 
 extern void destroy_workqueue(struct workqueue_struct *wq);
 
-extern int queue_work_on(int cpu, struct workqueue_struct *wq,
+extern bool queue_work_on(int cpu, struct workqueue_struct *wq,
 			struct work_struct *work);
-extern int queue_work(struct workqueue_struct *wq, struct work_struct *work);
-extern int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
+extern bool queue_work(struct workqueue_struct *wq, struct work_struct *work);
+extern bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			struct delayed_work *work, unsigned long delay);
-extern int queue_delayed_work(struct workqueue_struct *wq,
+extern bool queue_delayed_work(struct workqueue_struct *wq,
 			struct delayed_work *work, unsigned long delay);
 
 extern void flush_workqueue(struct workqueue_struct *wq);
 extern void drain_workqueue(struct workqueue_struct *wq);
 extern void flush_scheduled_work(void);
 
-extern int schedule_work_on(int cpu, struct work_struct *work);
-extern int schedule_work(struct work_struct *work);
-extern int schedule_delayed_work_on(int cpu, struct delayed_work *work,
-				    unsigned long delay);
-extern int schedule_delayed_work(struct delayed_work *work,
-				 unsigned long delay);
+extern bool schedule_work_on(int cpu, struct work_struct *work);
+extern bool schedule_work(struct work_struct *work);
+extern bool schedule_delayed_work_on(int cpu, struct delayed_work *work,
+				     unsigned long delay);
+extern bool schedule_delayed_work(struct delayed_work *work,
+				  unsigned long delay);
 extern int schedule_on_each_cpu(work_func_t func);
 extern int keventd_up(void);
 
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 07d309e..70f95ab 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1058,19 +1058,19 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
  * @wq: workqueue to use
  * @work: work to queue
  *
- * Returns 0 if @work was already on a queue, non-zero otherwise.
+ * Returns %false if @work was already on a queue, %true otherwise.
  *
  * We queue the work to a specific CPU, the caller must ensure it
  * can't go away.
  */
-int
-queue_work_on(int cpu, struct workqueue_struct *wq, struct work_struct *work)
+bool queue_work_on(int cpu, struct workqueue_struct *wq,
+		   struct work_struct *work)
 {
-	int ret = 0;
+	bool ret = false;
 
 	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
 		__queue_work(cpu, wq, work);
-		ret = 1;
+		ret = true;
 	}
 	return ret;
 }
@@ -1081,14 +1081,14 @@ EXPORT_SYMBOL_GPL(queue_work_on);
  * @wq: workqueue to use
  * @work: work to queue
  *
- * Returns 0 if @work was already on a queue, non-zero otherwise.
+ * Returns %false if @work was already on a queue, %true otherwise.
  *
  * We queue the work to the CPU on which it was submitted, but if the CPU dies
  * it can be processed by another CPU.
  */
-int queue_work(struct workqueue_struct *wq, struct work_struct *work)
+bool queue_work(struct workqueue_struct *wq, struct work_struct *work)
 {
-	int ret;
+	bool ret;
 
 	ret = queue_work_on(get_cpu(), wq, work);
 	put_cpu();
@@ -1112,14 +1112,14 @@ static void delayed_work_timer_fn(unsigned long __data)
  * @dwork: work to queue
  * @delay: number of jiffies to wait before queueing
  *
- * Returns 0 if @work was already on a queue, non-zero otherwise.
+ * Returns %false if @work was already on a queue, %true otherwise.
  */
-int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
-			struct delayed_work *dwork, unsigned long delay)
+bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
+			   struct delayed_work *dwork, unsigned long delay)
 {
-	int ret = 0;
 	struct timer_list *timer = &dwork->timer;
 	struct work_struct *work = &dwork->work;
+	bool ret = false;
 
 	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
 		unsigned int lcpu;
@@ -1154,7 +1154,7 @@ int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			add_timer_on(timer, cpu);
 		else
 			add_timer(timer);
-		ret = 1;
+		ret = true;
 	}
 	return ret;
 }
@@ -1166,9 +1166,9 @@ EXPORT_SYMBOL_GPL(queue_delayed_work_on);
  * @dwork: delayable work to queue
  * @delay: number of jiffies to wait before queueing
  *
- * Returns 0 if @work was already on a queue, non-zero otherwise.
+ * Returns %false if @work was already on a queue, %true otherwise.
  */
-int queue_delayed_work(struct workqueue_struct *wq,
+bool queue_delayed_work(struct workqueue_struct *wq,
 			struct delayed_work *dwork, unsigned long delay)
 {
 	if (delay == 0)
@@ -2877,14 +2877,14 @@ bool cancel_delayed_work_sync(struct delayed_work *dwork)
 }
 EXPORT_SYMBOL(cancel_delayed_work_sync);
 
-/*
+/**
  * schedule_work_on - put work task on a specific cpu
  * @cpu: cpu to put the work task on
  * @work: job to be done
  *
  * This puts a job on a specific cpu
  */
-int schedule_work_on(int cpu, struct work_struct *work)
+bool schedule_work_on(int cpu, struct work_struct *work)
 {
 	return queue_work_on(cpu, system_wq, work);
 }
@@ -2894,14 +2894,14 @@ EXPORT_SYMBOL(schedule_work_on);
  * schedule_work - put work task in global workqueue
  * @work: job to be done
  *
- * Returns zero if @work was already on the kernel-global workqueue and
- * non-zero otherwise.
+ * Returns %false if @work was already on the kernel-global workqueue and
+ * %true otherwise.
  *
  * This puts a job in the kernel-global workqueue if it was not already
  * queued and leaves it in the same position on the kernel-global
  * workqueue otherwise.
  */
-int schedule_work(struct work_struct *work)
+bool schedule_work(struct work_struct *work)
 {
 	return queue_work(system_wq, work);
 }
@@ -2916,8 +2916,8 @@ EXPORT_SYMBOL(schedule_work);
  * After waiting for a given time this puts a job in the kernel-global
  * workqueue on the specified CPU.
  */
-int schedule_delayed_work_on(int cpu,
-			struct delayed_work *dwork, unsigned long delay)
+bool schedule_delayed_work_on(int cpu, struct delayed_work *dwork,
+			      unsigned long delay)
 {
 	return queue_delayed_work_on(cpu, system_wq, dwork, delay);
 }
@@ -2931,8 +2931,7 @@ EXPORT_SYMBOL(schedule_delayed_work_on);
  * After waiting for a given time this puts a job in the kernel-global
  * workqueue.
  */
-int schedule_delayed_work(struct delayed_work *dwork,
-					unsigned long delay)
+bool schedule_delayed_work(struct delayed_work *dwork, unsigned long delay)
 {
 	return queue_delayed_work(system_wq, dwork, delay);
 }
-- 
1.7.7.3


* [PATCH 03/15] workqueue: add missing smp_wmb() in process_one_work()
  2012-07-27 23:54 [PATCHSET wq/for-3.7] workqueue: implement mod_delayed_work[_on]() Tejun Heo
  2012-07-27 23:54 ` [PATCH 01/15] workqueue: reorder queueing functions so that _on() variants are on top Tejun Heo
  2012-07-27 23:54 ` [PATCH 02/15] workqueue: make queueing functions return bool Tejun Heo
@ 2012-07-27 23:54 ` Tejun Heo
  2012-07-27 23:54 ` [PATCH 04/15] workqueue: disable preemption while manipulating PENDING Tejun Heo
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-07-27 23:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang, Tejun Heo,
	Oleg Nesterov, stable

WORK_STRUCT_PENDING is used to claim ownership of a work item and
process_one_work() releases it before starting execution.  When
someone else grabs PENDING, all pre-release updates to the work item
should be visible and all updates made by the new owner should happen
afterwards.

Grabbing PENDING uses test_and_set_bit() and thus has a full barrier;
however, clearing doesn't have a matching wmb.  Given the preceding
spin_unlock and use of clear_bit, I don't believe this can be a
problem on an actual machine and there hasn't been any related report
but it still is theoretically possible for clear_pending to permeate
upwards and happen before the work->entry update.

Add an explicit smp_wmb() before work_clear_pending().
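
A rough sketch of the ordering the barrier is meant to enforce (purely
illustrative, not actual code):

	/*
	 * CPU0: process_one_work()        CPU1: next PENDING owner
	 *
	 * update work->entry etc.
	 * smp_wmb()
	 * work_clear_pending(work)  --->  test_and_set_bit(PENDING) succeeds
	 *                                 new owner updates the work item
	 *
	 * Without the wmb, clearing PENDING could theoretically be
	 * reordered before the earlier updates, letting the new owner's
	 * updates race with them.
	 */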

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
---
 kernel/workqueue.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 70f95ab..5c26d36 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1997,7 +1997,9 @@ __acquires(&gcwq->lock)
 
 	spin_unlock_irq(&gcwq->lock);
 
+	smp_wmb();	/* paired with test_and_set_bit(PENDING) */
 	work_clear_pending(work);
+
 	lock_map_acquire_read(&cwq->wq->lockdep_map);
 	lock_map_acquire(&lockdep_map);
 	trace_workqueue_execute_start(work);
-- 
1.7.7.3


* [PATCH 04/15] workqueue: disable preemption while manipulating PENDING
  2012-07-27 23:54 [PATCHSET wq/for-3.7] workqueue: implement mod_delayed_work[_on]() Tejun Heo
                   ` (2 preceding siblings ...)
  2012-07-27 23:54 ` [PATCH 03/15] workqueue: add missing smp_wmb() in process_one_work() Tejun Heo
@ 2012-07-27 23:54 ` Tejun Heo
  2012-07-30 20:10   ` [PATCH v2 " Tejun Heo
  2012-07-27 23:54 ` [PATCH 05/15] workqueue: set delayed_work->timer function on initialization Tejun Heo
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 25+ messages in thread
From: Tejun Heo @ 2012-07-27 23:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang, Tejun Heo,
	Oleg Nesterov

Queueing operations use WORK_STRUCT_PENDING_BIT to synchronize access
to the target work item.  They first try to claim the bit and proceed
with queueing only after that succeeds, so there's a window between
PENDING being set and the actual queueing where the task can be
preempted.

There's also a similar window in process_one_work() when clearing
PENDING.  A work item is dequeued, gcwq->lock is released and then
PENDING is cleared; the worker might get preempted between releasing
gcwq->lock and clearing PENDING.

cancel[_delayed]_work_sync() tries to claim or steal PENDING.  The
function assumes that a work item with PENDING is either queued, or in
the process of being queued.  In the latter case, it busy-loops until
either the work item loses PENDING or is queued.  If canceling
coincides with the above described preemptions, the canceling task
will busy-loop while the queueing or executing task is preempted.

This patch keeps preemption disabled across claiming PENDING and the
actual queueing, and moves the clearing of PENDING in
process_one_work() inside gcwq->lock so that busy-looping on PENDING
&& !queued doesn't wait for preempted tasks.  Note that, in
process_one_work(), setting the last CPU and clearing PENDING were
merged into a single operation.
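
The busy-loop in question is roughly of the following shape
(simplified sketch of the cancel path, not the exact code):

	int ret;

	for (;;) {
		ret = try_to_grab_pending(work);
		if (ret >= 0)
			break;	/* got PENDING: 0 = wasn't pending, 1 = stolen */
		/*
		 * ret < 0: PENDING is set but the work isn't queued yet,
		 * i.e. another task is between claiming PENDING and
		 * queueing.  If that task is preempted here, this loop
		 * keeps spinning until it runs again.
		 */
		cpu_relax();
	}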

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 kernel/workqueue.c |   55 +++++++++++++++++++++++++++++++++++++++++----------
 1 files changed, 44 insertions(+), 11 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 5c26d36..147869a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -537,9 +537,10 @@ static int work_next_color(int color)
  * work is on queue.  Once execution starts, WORK_STRUCT_CWQ is
  * cleared and the work data contains the cpu number it was last on.
  *
- * set_work_{cwq|cpu}() and clear_work_data() can be used to set the
- * cwq, cpu or clear work->data.  These functions should only be
- * called while the work is owned - ie. while the PENDING bit is set.
+ * set_work_cwq(), set_work_cpu_and_clear_pending() and clear_work_data()
+ * can be used to set the cwq, cpu or clear work->data.  These functions
+ * should only be called while the work is owned - ie. while the PENDING
+ * bit is set.
  *
  * get_work_[g]cwq() can be used to obtain the gcwq or cwq
  * corresponding to a work.  gcwq is available once the work has been
@@ -561,9 +562,10 @@ static void set_work_cwq(struct work_struct *work,
 		      WORK_STRUCT_PENDING | WORK_STRUCT_CWQ | extra_flags);
 }
 
-static void set_work_cpu(struct work_struct *work, unsigned int cpu)
+static void set_work_cpu_and_clear_pending(struct work_struct *work,
+					   unsigned int cpu)
 {
-	set_work_data(work, cpu << WORK_STRUCT_FLAG_BITS, WORK_STRUCT_PENDING);
+	set_work_data(work, cpu << WORK_STRUCT_FLAG_BITS, 0);
 }
 
 static void clear_work_data(struct work_struct *work)
@@ -983,6 +985,15 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 	unsigned int work_flags;
 	unsigned long flags;
 
+	/*
+	 * While a work item is PENDING && off queue, a task trying to
+	 * steal the PENDING will busy-loop waiting for it to either get
+	 * queued or lose PENDING, so a task shouldn't be preempted between
+	 * grabbing PENDING and queueing @work.  Make sure the caller has
+	 * preemption disabled.
+	 */
+	WARN_ON_ONCE(!preempt_count());
+
 	debug_work_activate(work);
 
 	/* if dying, only works from the same workqueue are allowed */
@@ -1068,10 +1079,14 @@ bool queue_work_on(int cpu, struct workqueue_struct *wq,
 {
 	bool ret = false;
 
+	preempt_disable();
+
 	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
 		__queue_work(cpu, wq, work);
 		ret = true;
 	}
+
+	preempt_enable();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(queue_work_on);
@@ -1121,6 +1136,12 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 	struct work_struct *work = &dwork->work;
 	bool ret = false;
 
+	/*
+	 * We shouldn't get preempted between claiming PENDING and adding
+	 * timer.  Read the comment in __queue_work() for details.
+	 */
+	preempt_disable();
+
 	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
 		unsigned int lcpu;
 
@@ -1156,6 +1177,8 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			add_timer(timer);
 		ret = true;
 	}
+
+	preempt_enable();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(queue_delayed_work_on);
@@ -1970,15 +1993,13 @@ __acquires(&gcwq->lock)
 		return;
 	}
 
-	/* claim and process */
+	/* claim and dequeue */
 	debug_work_deactivate(work);
 	hlist_add_head(&worker->hentry, bwh);
 	worker->current_work = work;
 	worker->current_cwq = cwq;
 	work_color = get_work_color(work);
 
-	/* record the current cpu number in the work data and dequeue */
-	set_work_cpu(work, gcwq->cpu);
 	list_del_init(&work->entry);
 
 	/*
@@ -1995,10 +2016,18 @@ __acquires(&gcwq->lock)
 	if ((worker->flags & WORKER_UNBOUND) && need_more_worker(pool))
 		wake_up_worker(pool);
 
-	spin_unlock_irq(&gcwq->lock);
+	/*
+	 * Record the last CPU and clear PENDING.  The following wmb is
+	 * paired with the implied mb in test_and_set_bit(PENDING) and
+	 * ensures all updates to @work made here are visible to and
+	 * precede any updates by the next PENDING owner.  Also, clearing
+	 * PENDING is inside @gcwq->lock so that we don't get preempted
+	 * with PENDING set and @work off queue.
+	 */
+	smp_wmb();
+	set_work_cpu_and_clear_pending(work, gcwq->cpu);
 
-	smp_wmb();	/* paired with test_and_set_bit(PENDING) */
-	work_clear_pending(work);
+	spin_unlock_irq(&gcwq->lock);
 
 	lock_map_acquire_read(&cwq->wq->lockdep_map);
 	lock_map_acquire(&lockdep_map);
@@ -2836,9 +2865,11 @@ EXPORT_SYMBOL_GPL(cancel_work_sync);
  */
 bool flush_delayed_work(struct delayed_work *dwork)
 {
+	preempt_disable();
 	if (del_timer_sync(&dwork->timer))
 		__queue_work(raw_smp_processor_id(),
 			     get_work_cwq(&dwork->work)->wq, &dwork->work);
+	preempt_enable();
 	return flush_work(&dwork->work);
 }
 EXPORT_SYMBOL(flush_delayed_work);
@@ -2857,9 +2888,11 @@ EXPORT_SYMBOL(flush_delayed_work);
  */
 bool flush_delayed_work_sync(struct delayed_work *dwork)
 {
+	preempt_disable();
 	if (del_timer_sync(&dwork->timer))
 		__queue_work(raw_smp_processor_id(),
 			     get_work_cwq(&dwork->work)->wq, &dwork->work);
+	preempt_enable();
 	return flush_work_sync(&dwork->work);
 }
 EXPORT_SYMBOL(flush_delayed_work_sync);
-- 
1.7.7.3


* [PATCH 05/15] workqueue: set delayed_work->timer function on initialization
  2012-07-27 23:54 [PATCHSET wq/for-3.7] workqueue: implement mod_delayed_work[_on]() Tejun Heo
                   ` (3 preceding siblings ...)
  2012-07-27 23:54 ` [PATCH 04/15] workqueue: disable preemption while manipulating PENDING Tejun Heo
@ 2012-07-27 23:54 ` Tejun Heo
  2012-07-27 23:54 ` [PATCH 06/15] workqueue: unify local CPU queueing handling Tejun Heo
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-07-27 23:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang, Tejun Heo

delayed_work->timer.function is currently initialized during
queue_delayed_work_on().  Export delayed_work_timer_fn() and set the
delayed_work timer function during delayed_work initialization
together with the other fields.

This ensures the timer function is always valid on an initialized
delayed_work.  This is to help the upcoming mod_delayed_work()
implementation.

To detect delayed_work users which diddle with the internal timer,
trigger a WARN if the timer function doesn't match on queueing.
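
After this change a plain initialization is enough to make the timer
fully valid; e.g. (hypothetical user, names invented):

	INIT_DELAYED_WORK(&dev->poll_work, mydev_poll_fn);
	/*
	 * dev->poll_work.timer.function == delayed_work_timer_fn and
	 * .data == (unsigned long)&dev->poll_work from this point on,
	 * even before the work is ever queued.
	 */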

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/workqueue.h |   13 +++++++++++--
 kernel/workqueue.c        |    7 ++++---
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 278dc5d..ab95fef 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -16,6 +16,7 @@ struct workqueue_struct;
 
 struct work_struct;
 typedef void (*work_func_t)(struct work_struct *work);
+void delayed_work_timer_fn(unsigned long __data);
 
 /*
  * The first word is the work queue pointer and the flags rolled into
@@ -124,12 +125,14 @@ struct execute_work {
 
 #define __DELAYED_WORK_INITIALIZER(n, f) {			\
 	.work = __WORK_INITIALIZER((n).work, (f)),		\
-	.timer = TIMER_INITIALIZER(NULL, 0, 0),			\
+	.timer = TIMER_INITIALIZER(delayed_work_timer_fn,	\
+				0, (unsigned long)&(n)),	\
 	}
 
 #define __DEFERRED_WORK_INITIALIZER(n, f) {			\
 	.work = __WORK_INITIALIZER((n).work, (f)),		\
-	.timer = TIMER_DEFERRED_INITIALIZER(NULL, 0, 0),	\
+	.timer = TIMER_DEFERRED_INITIALIZER(delayed_work_timer_fn, \
+				0, (unsigned long)&(n)),	\
 	}
 
 #define DECLARE_WORK(n, f)					\
@@ -207,18 +210,24 @@ static inline unsigned int work_static(struct work_struct *work) { return 0; }
 	do {							\
 		INIT_WORK(&(_work)->work, (_func));		\
 		init_timer(&(_work)->timer);			\
+		(_work)->timer.function = delayed_work_timer_fn;\
+		(_work)->timer.data = (unsigned long)(_work);	\
 	} while (0)
 
 #define INIT_DELAYED_WORK_ONSTACK(_work, _func)			\
 	do {							\
 		INIT_WORK_ONSTACK(&(_work)->work, (_func));	\
 		init_timer_on_stack(&(_work)->timer);		\
+		(_work)->timer.function = delayed_work_timer_fn;\
+		(_work)->timer.data = (unsigned long)(_work);	\
 	} while (0)
 
 #define INIT_DELAYED_WORK_DEFERRABLE(_work, _func)		\
 	do {							\
 		INIT_WORK(&(_work)->work, (_func));		\
 		init_timer_deferrable(&(_work)->timer);		\
+		(_work)->timer.function = delayed_work_timer_fn;\
+		(_work)->timer.data = (unsigned long)(_work);	\
 	} while (0)
 
 /**
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 147869a..db8098b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1112,13 +1112,14 @@ bool queue_work(struct workqueue_struct *wq, struct work_struct *work)
 }
 EXPORT_SYMBOL_GPL(queue_work);
 
-static void delayed_work_timer_fn(unsigned long __data)
+void delayed_work_timer_fn(unsigned long __data)
 {
 	struct delayed_work *dwork = (struct delayed_work *)__data;
 	struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);
 
 	__queue_work(smp_processor_id(), cwq->wq, &dwork->work);
 }
+EXPORT_SYMBOL_GPL(delayed_work_timer_fn);
 
 /**
  * queue_delayed_work_on - queue work on specific CPU after delay
@@ -1145,6 +1146,8 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
 		unsigned int lcpu;
 
+		WARN_ON_ONCE(timer->function != delayed_work_timer_fn ||
+			     timer->data != (unsigned long)dwork);
 		BUG_ON(timer_pending(timer));
 		BUG_ON(!list_empty(&work->entry));
 
@@ -1168,8 +1171,6 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 		set_work_cwq(work, get_cwq(lcpu, wq), 0);
 
 		timer->expires = jiffies + delay;
-		timer->data = (unsigned long)dwork;
-		timer->function = delayed_work_timer_fn;
 
 		if (unlikely(cpu >= 0))
 			add_timer_on(timer, cpu);
-- 
1.7.7.3


* [PATCH 06/15] workqueue: unify local CPU queueing handling
  2012-07-27 23:54 [PATCHSET wq/for-3.7] workqueue: implement mod_delayed_work[_on]() Tejun Heo
                   ` (4 preceding siblings ...)
  2012-07-27 23:54 ` [PATCH 05/15] workqueue: set delayed_work->timer function on initialization Tejun Heo
@ 2012-07-27 23:54 ` Tejun Heo
  2012-07-27 23:55 ` [PATCH 07/15] workqueue: fix zero @delay handling of queue_delayed_work_on() Tejun Heo
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-07-27 23:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang, Tejun Heo

Queueing functions have been using different methods to determine the
local CPU.

* queue_work() superfluously uses get/put_cpu() to acquire and hold the
  local CPU across queue_work_on().

* delayed_work_timer_fn() uses smp_processor_id().

* queue_delayed_work() calls queue_delayed_work_on() with -1 @cpu
  which is interpreted as the local CPU.

* flush_delayed_work[_sync]() were using raw_smp_processor_id().

* __queue_work() interprets %WORK_CPU_UNBOUND as the local CPU if the
  target workqueue is a bound one, but nobody uses this.

This patch converts all functions to uniformly use %WORK_CPU_UNBOUND
to indicate the local CPU and use the local binding feature of
__queue_work().  unlikely() is dropped from %WORK_CPU_UNBOUND handling
in __queue_work().

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/workqueue.c |   19 +++++++------------
 1 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index db8098b..4beb9bf 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1005,7 +1005,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 	if (!(wq->flags & WQ_UNBOUND)) {
 		struct global_cwq *last_gcwq;
 
-		if (unlikely(cpu == WORK_CPU_UNBOUND))
+		if (cpu == WORK_CPU_UNBOUND)
 			cpu = raw_smp_processor_id();
 
 		/*
@@ -1103,12 +1103,7 @@ EXPORT_SYMBOL_GPL(queue_work_on);
  */
 bool queue_work(struct workqueue_struct *wq, struct work_struct *work)
 {
-	bool ret;
-
-	ret = queue_work_on(get_cpu(), wq, work);
-	put_cpu();
-
-	return ret;
+	return queue_work_on(WORK_CPU_UNBOUND, wq, work);
 }
 EXPORT_SYMBOL_GPL(queue_work);
 
@@ -1117,7 +1112,7 @@ void delayed_work_timer_fn(unsigned long __data)
 	struct delayed_work *dwork = (struct delayed_work *)__data;
 	struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);
 
-	__queue_work(smp_processor_id(), cwq->wq, &dwork->work);
+	__queue_work(WORK_CPU_UNBOUND, cwq->wq, &dwork->work);
 }
 EXPORT_SYMBOL_GPL(delayed_work_timer_fn);
 
@@ -1172,7 +1167,7 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 
 		timer->expires = jiffies + delay;
 
-		if (unlikely(cpu >= 0))
+		if (unlikely(cpu != WORK_CPU_UNBOUND))
 			add_timer_on(timer, cpu);
 		else
 			add_timer(timer);
@@ -1198,7 +1193,7 @@ bool queue_delayed_work(struct workqueue_struct *wq,
 	if (delay == 0)
 		return queue_work(wq, &dwork->work);
 
-	return queue_delayed_work_on(-1, wq, dwork, delay);
+	return queue_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay);
 }
 EXPORT_SYMBOL_GPL(queue_delayed_work);
 
@@ -2868,7 +2863,7 @@ bool flush_delayed_work(struct delayed_work *dwork)
 {
 	preempt_disable();
 	if (del_timer_sync(&dwork->timer))
-		__queue_work(raw_smp_processor_id(),
+		__queue_work(WORK_CPU_UNBOUND,
 			     get_work_cwq(&dwork->work)->wq, &dwork->work);
 	preempt_enable();
 	return flush_work(&dwork->work);
@@ -2891,7 +2886,7 @@ bool flush_delayed_work_sync(struct delayed_work *dwork)
 {
 	preempt_disable();
 	if (del_timer_sync(&dwork->timer))
-		__queue_work(raw_smp_processor_id(),
+		__queue_work(WORK_CPU_UNBOUND,
 			     get_work_cwq(&dwork->work)->wq, &dwork->work);
 	preempt_enable();
 	return flush_work_sync(&dwork->work);
-- 
1.7.7.3


* [PATCH 07/15] workqueue: fix zero @delay handling of queue_delayed_work_on()
  2012-07-27 23:54 [PATCHSET wq/for-3.7] workqueue: implement mod_delayed_work[_on]() Tejun Heo
                   ` (5 preceding siblings ...)
  2012-07-27 23:54 ` [PATCH 06/15] workqueue: unify local CPU queueing handling Tejun Heo
@ 2012-07-27 23:55 ` Tejun Heo
  2012-07-27 23:55 ` [PATCH 08/15] workqueue: move try_to_grab_pending() upwards Tejun Heo
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-07-27 23:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang, Tejun Heo

If @delay is zero and the delayed_work is idle, queue_delayed_work()
queues it for immediate execution; however, queue_delayed_work_on()
lacks this logic and always goes through the timer regardless of @delay.

This patch moves the zero @delay handling logic from
queue_delayed_work() to queue_delayed_work_on() so that both functions
behave the same.
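
In other words, with this patch the following two calls become
equivalent (assuming @dwork is idle):

	queue_delayed_work_on(cpu, wq, dwork, 0);
	queue_work_on(cpu, wq, &dwork->work);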

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/workqueue.c |   12 +++++++-----
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4beb9bf..e3383e8 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1123,7 +1123,9 @@ EXPORT_SYMBOL_GPL(delayed_work_timer_fn);
  * @dwork: work to queue
  * @delay: number of jiffies to wait before queueing
  *
- * Returns %false if @work was already on a queue, %true otherwise.
+ * Returns %false if @work was already on a queue, %true otherwise.  If
+ * @delay is zero and @dwork is idle, it will be scheduled for immediate
+ * execution.
  */
 bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			   struct delayed_work *dwork, unsigned long delay)
@@ -1132,6 +1134,9 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 	struct work_struct *work = &dwork->work;
 	bool ret = false;
 
+	if (!delay)
+		return queue_work_on(cpu, wq, &dwork->work);
+
 	/*
 	 * We shouldn't get preempted between claiming PENDING and adding
 	 * timer.  Read the comment in __queue_work() for details.
@@ -1185,14 +1190,11 @@ EXPORT_SYMBOL_GPL(queue_delayed_work_on);
  * @dwork: delayable work to queue
  * @delay: number of jiffies to wait before queueing
  *
- * Returns %false if @work was already on a queue, %true otherwise.
+ * Equivalent to queue_delayed_work_on() but tries to use the local CPU.
  */
 bool queue_delayed_work(struct workqueue_struct *wq,
 			struct delayed_work *dwork, unsigned long delay)
 {
-	if (delay == 0)
-		return queue_work(wq, &dwork->work);
-
 	return queue_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay);
 }
 EXPORT_SYMBOL_GPL(queue_delayed_work);
-- 
1.7.7.3


* [PATCH 08/15] workqueue: move try_to_grab_pending() upwards
  2012-07-27 23:54 [PATCHSET wq/for-3.7] workqueue: implement mod_delayed_work[_on]() Tejun Heo
                   ` (6 preceding siblings ...)
  2012-07-27 23:55 ` [PATCH 07/15] workqueue: fix zero @delay handling of queue_delayed_work_on() Tejun Heo
@ 2012-07-27 23:55 ` Tejun Heo
  2012-07-27 23:55 ` [PATCH 09/15] workqueue: introduce WORK_OFFQ_FLAG_* Tejun Heo
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-07-27 23:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang, Tejun Heo

try_to_grab_pending() will be used by the to-be-implemented
mod_delayed_work[_on]().  Move try_to_grab_pending() and related
functions above the queueing functions.

This patch only moves functions around.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/workqueue.c |  286 ++++++++++++++++++++++++++--------------------------
 1 files changed, 143 insertions(+), 143 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e3383e8..97d2f71 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -904,6 +904,149 @@ static struct worker *find_worker_executing_work(struct global_cwq *gcwq,
 }
 
 /**
+ * move_linked_works - move linked works to a list
+ * @work: start of series of works to be scheduled
+ * @head: target list to append @work to
+ * @nextp: out paramter for nested worklist walking
+ *
+ * Schedule linked works starting from @work to @head.  Work series to
+ * be scheduled starts at @work and includes any consecutive work with
+ * WORK_STRUCT_LINKED set in its predecessor.
+ *
+ * If @nextp is not NULL, it's updated to point to the next work of
+ * the last scheduled work.  This allows move_linked_works() to be
+ * nested inside outer list_for_each_entry_safe().
+ *
+ * CONTEXT:
+ * spin_lock_irq(gcwq->lock).
+ */
+static void move_linked_works(struct work_struct *work, struct list_head *head,
+			      struct work_struct **nextp)
+{
+	struct work_struct *n;
+
+	/*
+	 * Linked worklist will always end before the end of the list,
+	 * use NULL for list head.
+	 */
+	list_for_each_entry_safe_from(work, n, NULL, entry) {
+		list_move_tail(&work->entry, head);
+		if (!(*work_data_bits(work) & WORK_STRUCT_LINKED))
+			break;
+	}
+
+	/*
+	 * If we're already inside safe list traversal and have moved
+	 * multiple works to the scheduled queue, the next position
+	 * needs to be updated.
+	 */
+	if (nextp)
+		*nextp = n;
+}
+
+static void cwq_activate_first_delayed(struct cpu_workqueue_struct *cwq)
+{
+	struct work_struct *work = list_first_entry(&cwq->delayed_works,
+						    struct work_struct, entry);
+
+	trace_workqueue_activate_work(work);
+	move_linked_works(work, &cwq->pool->worklist, NULL);
+	__clear_bit(WORK_STRUCT_DELAYED_BIT, work_data_bits(work));
+	cwq->nr_active++;
+}
+
+/**
+ * cwq_dec_nr_in_flight - decrement cwq's nr_in_flight
+ * @cwq: cwq of interest
+ * @color: color of work which left the queue
+ * @delayed: for a delayed work
+ *
+ * A work either has completed or is removed from pending queue,
+ * decrement nr_in_flight of its cwq and handle workqueue flushing.
+ *
+ * CONTEXT:
+ * spin_lock_irq(gcwq->lock).
+ */
+static void cwq_dec_nr_in_flight(struct cpu_workqueue_struct *cwq, int color,
+				 bool delayed)
+{
+	/* ignore uncolored works */
+	if (color == WORK_NO_COLOR)
+		return;
+
+	cwq->nr_in_flight[color]--;
+
+	if (!delayed) {
+		cwq->nr_active--;
+		if (!list_empty(&cwq->delayed_works)) {
+			/* one down, submit a delayed one */
+			if (cwq->nr_active < cwq->max_active)
+				cwq_activate_first_delayed(cwq);
+		}
+	}
+
+	/* is flush in progress and are we at the flushing tip? */
+	if (likely(cwq->flush_color != color))
+		return;
+
+	/* are there still in-flight works? */
+	if (cwq->nr_in_flight[color])
+		return;
+
+	/* this cwq is done, clear flush_color */
+	cwq->flush_color = -1;
+
+	/*
+	 * If this was the last cwq, wake up the first flusher.  It
+	 * will handle the rest.
+	 */
+	if (atomic_dec_and_test(&cwq->wq->nr_cwqs_to_flush))
+		complete(&cwq->wq->first_flusher->done);
+}
+
+/*
+ * Upon a successful return (>= 0), the caller "owns" WORK_STRUCT_PENDING bit,
+ * so this work can't be re-armed in any way.
+ */
+static int try_to_grab_pending(struct work_struct *work)
+{
+	struct global_cwq *gcwq;
+	int ret = -1;
+
+	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)))
+		return 0;
+
+	/*
+	 * The queueing is in progress, or it is already queued. Try to
+	 * steal it from ->worklist without clearing WORK_STRUCT_PENDING.
+	 */
+	gcwq = get_work_gcwq(work);
+	if (!gcwq)
+		return ret;
+
+	spin_lock_irq(&gcwq->lock);
+	if (!list_empty(&work->entry)) {
+		/*
+		 * This work is queued, but perhaps we locked the wrong gcwq.
+		 * In that case we must see the new value after rmb(), see
+		 * insert_work()->wmb().
+		 */
+		smp_rmb();
+		if (gcwq == get_work_gcwq(work)) {
+			debug_work_deactivate(work);
+			list_del_init(&work->entry);
+			cwq_dec_nr_in_flight(get_work_cwq(work),
+				get_work_color(work),
+				*work_data_bits(work) & WORK_STRUCT_DELAYED);
+			ret = 1;
+		}
+	}
+	spin_unlock_irq(&gcwq->lock);
+
+	return ret;
+}
+
+/**
  * insert_work - insert a work into gcwq
  * @cwq: cwq @work belongs to
  * @work: work to insert
@@ -1832,107 +1975,6 @@ static bool manage_workers(struct worker *worker)
 }
 
 /**
- * move_linked_works - move linked works to a list
- * @work: start of series of works to be scheduled
- * @head: target list to append @work to
- * @nextp: out paramter for nested worklist walking
- *
- * Schedule linked works starting from @work to @head.  Work series to
- * be scheduled starts at @work and includes any consecutive work with
- * WORK_STRUCT_LINKED set in its predecessor.
- *
- * If @nextp is not NULL, it's updated to point to the next work of
- * the last scheduled work.  This allows move_linked_works() to be
- * nested inside outer list_for_each_entry_safe().
- *
- * CONTEXT:
- * spin_lock_irq(gcwq->lock).
- */
-static void move_linked_works(struct work_struct *work, struct list_head *head,
-			      struct work_struct **nextp)
-{
-	struct work_struct *n;
-
-	/*
-	 * Linked worklist will always end before the end of the list,
-	 * use NULL for list head.
-	 */
-	list_for_each_entry_safe_from(work, n, NULL, entry) {
-		list_move_tail(&work->entry, head);
-		if (!(*work_data_bits(work) & WORK_STRUCT_LINKED))
-			break;
-	}
-
-	/*
-	 * If we're already inside safe list traversal and have moved
-	 * multiple works to the scheduled queue, the next position
-	 * needs to be updated.
-	 */
-	if (nextp)
-		*nextp = n;
-}
-
-static void cwq_activate_first_delayed(struct cpu_workqueue_struct *cwq)
-{
-	struct work_struct *work = list_first_entry(&cwq->delayed_works,
-						    struct work_struct, entry);
-
-	trace_workqueue_activate_work(work);
-	move_linked_works(work, &cwq->pool->worklist, NULL);
-	__clear_bit(WORK_STRUCT_DELAYED_BIT, work_data_bits(work));
-	cwq->nr_active++;
-}
-
-/**
- * cwq_dec_nr_in_flight - decrement cwq's nr_in_flight
- * @cwq: cwq of interest
- * @color: color of work which left the queue
- * @delayed: for a delayed work
- *
- * A work either has completed or is removed from pending queue,
- * decrement nr_in_flight of its cwq and handle workqueue flushing.
- *
- * CONTEXT:
- * spin_lock_irq(gcwq->lock).
- */
-static void cwq_dec_nr_in_flight(struct cpu_workqueue_struct *cwq, int color,
-				 bool delayed)
-{
-	/* ignore uncolored works */
-	if (color == WORK_NO_COLOR)
-		return;
-
-	cwq->nr_in_flight[color]--;
-
-	if (!delayed) {
-		cwq->nr_active--;
-		if (!list_empty(&cwq->delayed_works)) {
-			/* one down, submit a delayed one */
-			if (cwq->nr_active < cwq->max_active)
-				cwq_activate_first_delayed(cwq);
-		}
-	}
-
-	/* is flush in progress and are we at the flushing tip? */
-	if (likely(cwq->flush_color != color))
-		return;
-
-	/* are there still in-flight works? */
-	if (cwq->nr_in_flight[color])
-		return;
-
-	/* this cwq is done, clear flush_color */
-	cwq->flush_color = -1;
-
-	/*
-	 * If this was the last cwq, wake up the first flusher.  It
-	 * will handle the rest.
-	 */
-	if (atomic_dec_and_test(&cwq->wq->nr_cwqs_to_flush))
-		complete(&cwq->wq->first_flusher->done);
-}
-
-/**
  * process_one_work - process single work
  * @worker: self
  * @work: work to process
@@ -2767,48 +2809,6 @@ bool flush_work_sync(struct work_struct *work)
 }
 EXPORT_SYMBOL_GPL(flush_work_sync);
 
-/*
- * Upon a successful return (>= 0), the caller "owns" WORK_STRUCT_PENDING bit,
- * so this work can't be re-armed in any way.
- */
-static int try_to_grab_pending(struct work_struct *work)
-{
-	struct global_cwq *gcwq;
-	int ret = -1;
-
-	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)))
-		return 0;
-
-	/*
-	 * The queueing is in progress, or it is already queued. Try to
-	 * steal it from ->worklist without clearing WORK_STRUCT_PENDING.
-	 */
-	gcwq = get_work_gcwq(work);
-	if (!gcwq)
-		return ret;
-
-	spin_lock_irq(&gcwq->lock);
-	if (!list_empty(&work->entry)) {
-		/*
-		 * This work is queued, but perhaps we locked the wrong gcwq.
-		 * In that case we must see the new value after rmb(), see
-		 * insert_work()->wmb().
-		 */
-		smp_rmb();
-		if (gcwq == get_work_gcwq(work)) {
-			debug_work_deactivate(work);
-			list_del_init(&work->entry);
-			cwq_dec_nr_in_flight(get_work_cwq(work),
-				get_work_color(work),
-				*work_data_bits(work) & WORK_STRUCT_DELAYED);
-			ret = 1;
-		}
-	}
-	spin_unlock_irq(&gcwq->lock);
-
-	return ret;
-}
-
 static bool __cancel_work_timer(struct work_struct *work,
 				struct timer_list* timer)
 {
-- 
1.7.7.3


* [PATCH 09/15] workqueue: introduce WORK_OFFQ_FLAG_*
  2012-07-27 23:54 [PATCHSET wq/for-3.7] workqueue: implement mod_delayed_work[_on]() Tejun Heo
                   ` (7 preceding siblings ...)
  2012-07-27 23:55 ` [PATCH 08/15] workqueue: move try_to_grab_pending() upwards Tejun Heo
@ 2012-07-27 23:55 ` Tejun Heo
  2012-07-27 23:55 ` [PATCH 10/15] workqueue: factor out __queue_delayed_work() from queue_delayed_work_on() Tejun Heo
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-07-27 23:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang, Tejun Heo

The low WORK_STRUCT_FLAG_BITS bits of work_struct->data contain
WORK_STRUCT_FLAG_* and the flush color.  If the work item is queued,
the rest points to the cpu_workqueue with WORK_STRUCT_CWQ set;
otherwise,
WORK_STRUCT_CWQ is clear and the bits contain the last CPU number -
either a real CPU number or one of WORK_CPU_*.

The scheduled addition of mod_delayed_work[_on]() requires an
additional flag, which is used only while a work item is off queue.
There are more than enough bits to represent the off-queue CPU number
on both 32 and 64 bit.  This patch introduces WORK_OFFQ_FLAG_* which
occupy the lower
part of the @work->data high bits while off queue.  This patch doesn't
define any actual OFFQ flag yet.

The off-queue CPU number is now shifted by WORK_OFFQ_CPU_SHIFT, which
adds the number of bits used by OFFQ flags to WORK_STRUCT_FLAG_BITS,
to make room for the OFFQ flags.

To avoid a shift width warning with large WORK_OFFQ_FLAG_BITS, a ulong
cast is added to WORK_STRUCT_NO_CPU and, just in case, a
BUILD_BUG_ON() is added to check that there are enough bits to
accommodate the off-queue CPU number.

This patch doesn't make any functional difference.
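
A rough picture of the resulting work->data layout (field widths are
illustrative; see the defines in the hunks below):

	/*
	 * while queued (WORK_STRUCT_CWQ set):
	 *	[ cwq pointer                             | flags + color ]
	 *
	 * while off queue (WORK_STRUCT_CWQ clear):
	 *	[ last CPU number | OFFQ flags (none yet) | flags + color ]
	 *	  ^ starts at WORK_OFFQ_CPU_SHIFT
	 *	    = WORK_STRUCT_FLAG_BITS + WORK_OFFQ_FLAG_BITS
	 */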

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/workqueue.h |    8 +++++++-
 kernel/workqueue.c        |   14 +++++++++-----
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index ab95fef..f562674 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -68,9 +68,15 @@ enum {
 	WORK_STRUCT_FLAG_BITS	= WORK_STRUCT_COLOR_SHIFT +
 				  WORK_STRUCT_COLOR_BITS,
 
+	/* data contains off-queue information when !WORK_STRUCT_CWQ */
+	WORK_OFFQ_FLAG_BASE	= WORK_STRUCT_FLAG_BITS,
+	WORK_OFFQ_FLAG_BITS	= 0,
+	WORK_OFFQ_CPU_SHIFT	= WORK_OFFQ_FLAG_BASE + WORK_OFFQ_FLAG_BITS,
+
+	/* convenience constants */
 	WORK_STRUCT_FLAG_MASK	= (1UL << WORK_STRUCT_FLAG_BITS) - 1,
 	WORK_STRUCT_WQ_DATA_MASK = ~WORK_STRUCT_FLAG_MASK,
-	WORK_STRUCT_NO_CPU	= WORK_CPU_NONE << WORK_STRUCT_FLAG_BITS,
+	WORK_STRUCT_NO_CPU	= (unsigned long)WORK_CPU_NONE << WORK_OFFQ_CPU_SHIFT,
 
 	/* bit mask for work_busy() return values */
 	WORK_BUSY_PENDING	= 1 << 0,
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 97d2f71..58b42b2 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -533,9 +533,9 @@ static int work_next_color(int color)
 }
 
 /*
- * A work's data points to the cwq with WORK_STRUCT_CWQ set while the
- * work is on queue.  Once execution starts, WORK_STRUCT_CWQ is
- * cleared and the work data contains the cpu number it was last on.
+ * While queued, %WORK_STRUCT_CWQ is set and non flag bits of a work's data
+ * contain the pointer to the queued cwq.  Once execution starts, the flag
+ * is cleared and the high bits contain OFFQ flags and CPU number.
  *
  * set_work_cwq(), set_work_cpu_and_clear_pending() and clear_work_data()
  * can be used to set the cwq, cpu or clear work->data.  These functions
@@ -565,7 +565,7 @@ static void set_work_cwq(struct work_struct *work,
 static void set_work_cpu_and_clear_pending(struct work_struct *work,
 					   unsigned int cpu)
 {
-	set_work_data(work, cpu << WORK_STRUCT_FLAG_BITS, 0);
+	set_work_data(work, (unsigned long)cpu << WORK_OFFQ_CPU_SHIFT, 0);
 }
 
 static void clear_work_data(struct work_struct *work)
@@ -592,7 +592,7 @@ static struct global_cwq *get_work_gcwq(struct work_struct *work)
 		return ((struct cpu_workqueue_struct *)
 			(data & WORK_STRUCT_WQ_DATA_MASK))->pool->gcwq;
 
-	cpu = data >> WORK_STRUCT_FLAG_BITS;
+	cpu = data >> WORK_OFFQ_CPU_SHIFT;
 	if (cpu == WORK_CPU_NONE)
 		return NULL;
 
@@ -3724,6 +3724,10 @@ static int __init init_workqueues(void)
 	unsigned int cpu;
 	int i;
 
+	/* make sure we have enough bits for OFFQ CPU number */
+	BUILD_BUG_ON((1LU << (BITS_PER_LONG - WORK_OFFQ_CPU_SHIFT)) <
+		     WORK_CPU_LAST);
+
 	cpu_notifier(workqueue_cpu_up_callback, CPU_PRI_WORKQUEUE_UP);
 	cpu_notifier(workqueue_cpu_down_callback, CPU_PRI_WORKQUEUE_DOWN);
 
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 10/15] workqueue: factor out __queue_delayed_work() from queue_delayed_work_on()
  2012-07-27 23:54 [PATCHSET wq/for-3.7] workqueue: implement mod_delayed_work[_on]() Tejun Heo
                   ` (8 preceding siblings ...)
  2012-07-27 23:55 ` [PATCH 09/15] workqueue: introduce WORK_OFFQ_FLAG_* Tejun Heo
@ 2012-07-27 23:55 ` Tejun Heo
  2012-07-27 23:55 ` [PATCH 11/15] workqueue: reorganize try_to_grab_pending() and __cancel_work_timer() Tejun Heo
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-07-27 23:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang, Tejun Heo

This is to prepare for mod_delayed_work[_on]() and doesn't cause any
functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/workqueue.c |   74 ++++++++++++++++++++++++++++-----------------------
 1 files changed, 41 insertions(+), 33 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 58b42b2..358779c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1259,6 +1259,46 @@ void delayed_work_timer_fn(unsigned long __data)
 }
 EXPORT_SYMBOL_GPL(delayed_work_timer_fn);
 
+static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
+				struct delayed_work *dwork, unsigned long delay)
+{
+	struct timer_list *timer = &dwork->timer;
+	struct work_struct *work = &dwork->work;
+	unsigned int lcpu;
+
+	WARN_ON_ONCE(timer->function != delayed_work_timer_fn ||
+		     timer->data != (unsigned long)dwork);
+	BUG_ON(timer_pending(timer));
+	BUG_ON(!list_empty(&work->entry));
+
+	timer_stats_timer_set_start_info(&dwork->timer);
+
+	/*
+	 * This stores cwq for the moment, for the timer_fn.  Note that the
+	 * work's gcwq is preserved to allow reentrance detection for
+	 * delayed works.
+	 */
+	if (!(wq->flags & WQ_UNBOUND)) {
+		struct global_cwq *gcwq = get_work_gcwq(work);
+
+		if (gcwq && gcwq->cpu != WORK_CPU_UNBOUND)
+			lcpu = gcwq->cpu;
+		else
+			lcpu = raw_smp_processor_id();
+	} else {
+		lcpu = WORK_CPU_UNBOUND;
+	}
+
+	set_work_cwq(work, get_cwq(lcpu, wq), 0);
+
+	timer->expires = jiffies + delay;
+
+	if (unlikely(cpu != WORK_CPU_UNBOUND))
+		add_timer_on(timer, cpu);
+	else
+		add_timer(timer);
+}
+
 /**
  * queue_delayed_work_on - queue work on specific CPU after delay
  * @cpu: CPU number to execute work on
@@ -1273,7 +1313,6 @@ EXPORT_SYMBOL_GPL(delayed_work_timer_fn);
 bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			   struct delayed_work *dwork, unsigned long delay)
 {
-	struct timer_list *timer = &dwork->timer;
 	struct work_struct *work = &dwork->work;
 	bool ret = false;
 
@@ -1287,38 +1326,7 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 	preempt_disable();
 
 	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
-		unsigned int lcpu;
-
-		WARN_ON_ONCE(timer->function != delayed_work_timer_fn ||
-			     timer->data != (unsigned long)dwork);
-		BUG_ON(timer_pending(timer));
-		BUG_ON(!list_empty(&work->entry));
-
-		timer_stats_timer_set_start_info(&dwork->timer);
-
-		/*
-		 * This stores cwq for the moment, for the timer_fn.
-		 * Note that the work's gcwq is preserved to allow
-		 * reentrance detection for delayed works.
-		 */
-		if (!(wq->flags & WQ_UNBOUND)) {
-			struct global_cwq *gcwq = get_work_gcwq(work);
-
-			if (gcwq && gcwq->cpu != WORK_CPU_UNBOUND)
-				lcpu = gcwq->cpu;
-			else
-				lcpu = raw_smp_processor_id();
-		} else
-			lcpu = WORK_CPU_UNBOUND;
-
-		set_work_cwq(work, get_cwq(lcpu, wq), 0);
-
-		timer->expires = jiffies + delay;
-
-		if (unlikely(cpu != WORK_CPU_UNBOUND))
-			add_timer_on(timer, cpu);
-		else
-			add_timer(timer);
+		__queue_delayed_work(cpu, wq, dwork, delay);
 		ret = true;
 	}
 
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 11/15] workqueue: reorganize try_to_grab_pending() and __cancel_work_timer()
  2012-07-27 23:54 [PATCHSET wq/for-3.7] workqueue: implement mod_delayed_work[_on]() Tejun Heo
                   ` (9 preceding siblings ...)
  2012-07-27 23:55 ` [PATCH 10/15] workqueue: factor out __queue_delayed_work() from queue_delayed_work_on() Tejun Heo
@ 2012-07-27 23:55 ` Tejun Heo
  2012-07-27 23:55 ` [PATCH 12/15] workqueue: mark a work item being canceled as such Tejun Heo
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-07-27 23:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang, Tejun Heo

* Move timer handling from __cancel_work_timer() to
  try_to_grab_pending().

* Make try_to_grab_pending() use -EAGAIN instead of -1 for
  busy-looping and drop the ret local variable.

* Add proper function comment to try_to_grab_pending().

This makes the code a bit easier to understand and will ease further
changes.  This patch doesn't make any functional change.
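
With the reorganization, the caller-side pattern matches the
__cancel_work_timer() hunk below (sketch):

	do {
		ret = try_to_grab_pending(work, timer);	/* 1, 0 or -EAGAIN */
		wait_on_work(work);
	} while (unlikely(ret < 0));	/* busy-retry only on -EAGAIN */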

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/workqueue.c |   36 +++++++++++++++++++++++++-----------
 1 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 358779c..a0eb033 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1004,15 +1004,29 @@ static void cwq_dec_nr_in_flight(struct cpu_workqueue_struct *cwq, int color,
 		complete(&cwq->wq->first_flusher->done);
 }
 
-/*
- * Upon a successful return (>= 0), the caller "owns" WORK_STRUCT_PENDING bit,
- * so this work can't be re-armed in any way.
+/**
+ * try_to_grab_pending - steal work item from worklist
+ * @work: work item to steal
+ *
+ * Try to grab PENDING bit of @work.  This function can handle @work in any
+ * stable state - idle, on timer or on worklist.  Return values are
+ *
+ *  1		if @work was pending and we successfully stole PENDING
+ *  0		if @work was idle and we claimed PENDING
+ *  -EAGAIN	if PENDING couldn't be grabbed at the moment, safe to busy-retry
+ *
+ * On >= 0 return, the caller owns @work's PENDING bit.
  */
-static int try_to_grab_pending(struct work_struct *work)
+static int try_to_grab_pending(struct work_struct *work,
+			       struct timer_list *timer)
 {
 	struct global_cwq *gcwq;
-	int ret = -1;
 
+	/* try to steal the timer if it exists */
+	if (timer && likely(del_timer(timer)))
+		return 1;
+
+	/* try to claim PENDING the normal way */
 	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)))
 		return 0;
 
@@ -1022,7 +1036,7 @@ static int try_to_grab_pending(struct work_struct *work)
 	 */
 	gcwq = get_work_gcwq(work);
 	if (!gcwq)
-		return ret;
+		return -EAGAIN;
 
 	spin_lock_irq(&gcwq->lock);
 	if (!list_empty(&work->entry)) {
@@ -1038,12 +1052,14 @@ static int try_to_grab_pending(struct work_struct *work)
 			cwq_dec_nr_in_flight(get_work_cwq(work),
 				get_work_color(work),
 				*work_data_bits(work) & WORK_STRUCT_DELAYED);
-			ret = 1;
+
+			spin_unlock_irq(&gcwq->lock);
+			return 1;
 		}
 	}
 	spin_unlock_irq(&gcwq->lock);
 
-	return ret;
+	return -EAGAIN;
 }
 
 /**
@@ -2823,9 +2839,7 @@ static bool __cancel_work_timer(struct work_struct *work,
 	int ret;
 
 	do {
-		ret = (timer && likely(del_timer(timer)));
-		if (!ret)
-			ret = try_to_grab_pending(work);
+		ret = try_to_grab_pending(work, timer);
 		wait_on_work(work);
 	} while (unlikely(ret < 0));
 
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 12/15] workqueue: mark a work item being canceled as such
  2012-07-27 23:54 [PATCHSET wq/for-3.7] workqueue: implement mod_delayed_work[_on]() Tejun Heo
                   ` (10 preceding siblings ...)
  2012-07-27 23:55 ` [PATCH 11/15] workqueue: reorganize try_to_grab_pending() and __cancel_work_timer() Tejun Heo
@ 2012-07-27 23:55 ` Tejun Heo
  2012-07-30 20:11   ` [PATCH v2 " Tejun Heo
  2012-07-27 23:55 ` [PATCH 13/15] workqueue: implement mod_delayed_work[_on]() Tejun Heo
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 25+ messages in thread
From: Tejun Heo @ 2012-07-27 23:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang, Tejun Heo

There can be two reasons try_to_grab_pending() can fail with -EAGAIN.
One is when someone else is queueing or dequeueing the work item.  With
the previous patches, it is guaranteed that PENDING and queued state
will soon agree, making it safe to busy-retry in this case.

The other is if multiple __cancel_work_timer() invocations are racing
one another.  __cancel_work_timer() grabs PENDING and then waits for
running instances of the target work item on all CPUs while holding
PENDING and !queued.  try_to_grab_pending() invoked from another task
will keep returning -EAGAIN while the current owner is waiting.

Not distinguishing the two cases is okay because __cancel_work_timer()
is the only user of try_to_grab_pending() and it invokes wait_on_work()
whenever grabbing fails.  For the first case, busy-looping would be
sufficient, but wait_on_work() doesn't cause any critical problem
either.  For the latter case, the new contender usually waits for the
same condition as the current owner, so no unnecessarily extended
busy-looping happens.  Combined, these make __cancel_work_timer()
technically correct even without preemption protection while grabbing
PENDING or without distinguishing the two cases.

While the current code is technically correct, not distinguishing the
two cases makes it difficult to use try_to_grab_pending() for purposes
other than canceling because it's impossible to tell whether it's safe
to busy-retry grabbing.

This patch adds a mechanism to mark a work item being canceled.
try_to_grab_pending() now requires the caller to have preemption
disabled and returns -EAGAIN to indicate that grabbing failed but that
PENDING and queued states will agree soon, so it's safe to busy-loop.
It returns -ENOENT if the work item is being canceled and may stay
PENDING && !queued for an arbitrary amount of time.

__cancel_work_timer() is modified to mark the work canceling with
WORK_OFFQ_CANCELING after grabbing PENDING, thus making
try_to_grab_pending() fail with -ENOENT instead of -EAGAIN.  Also, it
invokes wait_on_work() iff grabbing failed with -ENOENT.  This isn't
necessary for correctness but makes it consistent with other future
users of try_to_grab_pending().
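
A minimal sketch of how a non-canceling user is expected to drive the
new return values (this mirrors what the next patch does for
mod_delayed_work[_on]()):

	preempt_disable();
	do {
		ret = try_to_grab_pending(work, timer);
	} while (unlikely(ret == -EAGAIN));	/* transient, safe to retry */

	if (likely(ret >= 0)) {
		/* PENDING is owned here - requeue or modify the timer */
	}
	/* -ENOENT means someone else is canceling; leave @work alone */
	preempt_enable();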

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/workqueue.h |    5 +++-
 kernel/workqueue.c        |   67 ++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 61 insertions(+), 11 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index f562674..5f4aeaa 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -70,7 +70,10 @@ enum {
 
 	/* data contains off-queue information when !WORK_STRUCT_CWQ */
 	WORK_OFFQ_FLAG_BASE	= WORK_STRUCT_FLAG_BITS,
-	WORK_OFFQ_FLAG_BITS	= 0,
+
+	WORK_OFFQ_CANCELING	= (1 << WORK_OFFQ_FLAG_BASE),
+
+	WORK_OFFQ_FLAG_BITS	= 1,
 	WORK_OFFQ_CPU_SHIFT	= WORK_OFFQ_FLAG_BASE + WORK_OFFQ_FLAG_BITS,
 
 	/* convenience constants */
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a0eb033..c90f00e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -537,15 +537,20 @@ static int work_next_color(int color)
  * contain the pointer to the queued cwq.  Once execution starts, the flag
  * is cleared and the high bits contain OFFQ flags and CPU number.
  *
- * set_work_cwq(), set_work_cpu_and_clear_pending() and clear_work_data()
- * can be used to set the cwq, cpu or clear work->data.  These functions
- * should only be called while the work is owned - ie. while the PENDING
- * bit is set.
+ * set_work_cwq(), set_work_cpu_and_clear_pending(), mark_work_canceling()
+ * and clear_work_data() can be used to set the cwq, cpu or clear
+ * work->data.  These functions should only be called while the work is
+ * owned - ie. while the PENDING bit is set.
  *
- * get_work_[g]cwq() can be used to obtain the gcwq or cwq
- * corresponding to a work.  gcwq is available once the work has been
- * queued anywhere after initialization.  cwq is available only from
- * queueing until execution starts.
+ * get_work_[g]cwq() can be used to obtain the gcwq or cwq corresponding to
+ * a work.  gcwq is available once the work has been queued anywhere after
+ * initialization until it is sync canceled.  cwq is available only while
+ * the work item is queued.
+ *
+ * %WORK_OFFQ_CANCELING is used to mark a work item which is being
+ * canceled.  While being canceled, a work item may have its PENDING set
+ * but stay off timer and worklist for arbitrarily long and nobody should
+ * try to steal the PENDING bit.
  */
 static inline void set_work_data(struct work_struct *work, unsigned long data,
 				 unsigned long flags)
@@ -600,6 +605,22 @@ static struct global_cwq *get_work_gcwq(struct work_struct *work)
 	return get_gcwq(cpu);
 }
 
+static void mark_work_canceling(struct work_struct *work)
+{
+	struct global_cwq *gcwq = get_work_gcwq(work);
+	unsigned long cpu = gcwq ? gcwq->cpu : WORK_CPU_NONE;
+
+	set_work_data(work, (cpu << WORK_OFFQ_CPU_SHIFT) | WORK_OFFQ_CANCELING,
+		      WORK_STRUCT_PENDING);
+}
+
+static bool work_is_canceling(struct work_struct *work)
+{
+	unsigned long data = atomic_long_read(&work->data);
+
+	return !(data & WORK_STRUCT_CWQ) && (data & WORK_OFFQ_CANCELING);
+}
+
 /*
  * Policy functions.  These define the policies on how the global worker
  * pools are managed.  Unless noted otherwise, these functions assume that
@@ -1014,14 +1035,21 @@ static void cwq_dec_nr_in_flight(struct cpu_workqueue_struct *cwq, int color,
  *  1		if @work was pending and we successfully stole PENDING
  *  0		if @work was idle and we claimed PENDING
  *  -EAGAIN	if PENDING couldn't be grabbed at the moment, safe to busy-retry
+ *  -ENOENT	if someone else is canceling @work, this state may persist
+ *		for arbitrarily long
  *
- * On >= 0 return, the caller owns @work's PENDING bit.
+ * On >= 0 return, the caller owns @work's PENDING bit.  To avoid getting
+ * preempted while holding PENDING and @work off queue, preemption must be
+ * disabled on entry.  This ensures that we don't return -EAGAIN while
+ * another task is preempted in this function.
  */
 static int try_to_grab_pending(struct work_struct *work,
 			       struct timer_list *timer)
 {
 	struct global_cwq *gcwq;
 
+	WARN_ON_ONCE(!preempt_count());
+
 	/* try to steal the timer if it exists */
 	if (timer && likely(del_timer(timer)))
 		return 1;
@@ -1059,6 +1087,10 @@ static int try_to_grab_pending(struct work_struct *work,
 	}
 	spin_unlock_irq(&gcwq->lock);
 
+	if (unlikely(work_is_canceling(work)))
+		return -ENOENT;
+
+	cpu_relax();
 	return -EAGAIN;
 }
 
@@ -2838,11 +2870,26 @@ static bool __cancel_work_timer(struct work_struct *work,
 {
 	int ret;
 
+	preempt_disable();
+
 	do {
 		ret = try_to_grab_pending(work, timer);
-		wait_on_work(work);
+		/*
+		 * If someone else is canceling, wait for the same event it
+		 * would be waiting for before retrying.
+		 */
+		if (unlikely(ret == -ENOENT)) {
+			preempt_enable();
+			wait_on_work(work);
+			preempt_disable();
+		}
 	} while (unlikely(ret < 0));
 
+	/* tell other tasks trying to grab @work to back off */
+	mark_work_canceling(work);
+	preempt_enable();
+
+	wait_on_work(work);
 	clear_work_data(work);
 	return ret;
 }
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 13/15] workqueue: implement mod_delayed_work[_on]()
  2012-07-27 23:54 [PATCHSET wq/for-3.7] workqueue: implement mod_delayed_work[_on]() Tejun Heo
                   ` (11 preceding siblings ...)
  2012-07-27 23:55 ` [PATCH 12/15] workqueue: mark a work item being canceled as such Tejun Heo
@ 2012-07-27 23:55 ` Tejun Heo
  2012-07-27 23:55 ` [PATCH 14/15] workqueue: use mod_delayed_work() instead of cancel + queue Tejun Heo
  2012-07-27 23:55 ` [PATCH 15/15] workqueue: deprecate __cancel_delayed_work() Tejun Heo
  14 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-07-27 23:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang, Tejun Heo

Workqueue was lacking a mechanism to modify the timeout of an already
pending delayed_work.  delayed_work users have been working around
this using several methods - using an explicit timer + work item,
messing directly with delayed_work->timer, and canceling before
re-queueing, all of which are error-prone and/or ugly.

This patch implements mod_delayed_work[_on]() which behaves similarly
to mod_timer() - if the delayed_work is idle, it's queued with the
given delay; otherwise, its timeout is modified to the new value.
Zero @delay guarantees immediate execution.
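
For example, a driver currently doing cancel + queue collapses into a
single call (sketch; @wq and @priv->poll_work are hypothetical):

	/* before */
	cancel_delayed_work(&priv->poll_work);
	queue_delayed_work(wq, &priv->poll_work, delay);

	/* after */
	mod_delayed_work(wq, &priv->poll_work, delay);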

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
---
 include/linux/workqueue.h |    4 +++
 kernel/workqueue.c        |   51 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+), 0 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 5f4aeaa..2000030 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -390,6 +390,10 @@ extern bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			struct delayed_work *work, unsigned long delay);
 extern bool queue_delayed_work(struct workqueue_struct *wq,
 			struct delayed_work *work, unsigned long delay);
+extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
+			struct delayed_work *dwork, unsigned long delay);
+extern bool mod_delayed_work(struct workqueue_struct *wq,
+			struct delayed_work *dwork, unsigned long delay);
 
 extern void flush_workqueue(struct workqueue_struct *wq);
 extern void drain_workqueue(struct workqueue_struct *wq);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c90f00e..3839203 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1399,6 +1399,57 @@ bool queue_delayed_work(struct workqueue_struct *wq,
 EXPORT_SYMBOL_GPL(queue_delayed_work);
 
 /**
+ * mod_delayed_work_on - modify delay of or queue a delayed work on specific CPU
+ * @cpu: CPU number to execute work on
+ * @wq: workqueue to use
+ * @dwork: work to queue
+ * @delay: number of jiffies to wait before queueing
+ *
+ * If @dwork is idle, equivalent to queue_delayed_work_on(); otherwise,
+ * modify @dwork's timer so that it expires after @delay.  If @delay is
+ * zero, @work is guaranteed to be scheduled immediately regardless of its
+ * current state.
+ *
+ * Returns %false if @dwork was idle and queued, %true if @dwork was
+ * pending and its timer was modified.
+ */
+bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
+			 struct delayed_work *dwork, unsigned long delay)
+{
+	int ret;
+
+	preempt_disable();
+
+	do {
+		ret = try_to_grab_pending(&dwork->work, &dwork->timer);
+	} while (unlikely(ret == -EAGAIN));
+
+	if (likely(ret >= 0))
+		__queue_delayed_work(cpu, wq, dwork, delay);
+
+	preempt_enable();
+
+	/* -ENOENT from try_to_grab_pending() becomes %true */
+	return ret;
+}
+EXPORT_SYMBOL_GPL(mod_delayed_work_on);
+
+/**
+ * mod_delayed_work - modify delay of or queue a delayed work
+ * @wq: workqueue to use
+ * @dwork: work to queue
+ * @delay: number of jiffies to wait before queueing
+ *
+ * mod_delayed_work_on() on local CPU.
+ */
+bool mod_delayed_work(struct workqueue_struct *wq, struct delayed_work *dwork,
+		      unsigned long delay)
+{
+	return mod_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay);
+}
+EXPORT_SYMBOL_GPL(mod_delayed_work);
+
+/**
  * worker_enter_idle - enter idle state
  * @worker: worker which is entering idle state
  *
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 14/15] workqueue: use mod_delayed_work() instead of cancel + queue
  2012-07-27 23:54 [PATCHSET wq/for-3.7] workqueue: implement mod_delayed_work[_on]() Tejun Heo
                   ` (12 preceding siblings ...)
  2012-07-27 23:55 ` [PATCH 13/15] workqueue: implement mod_delayed_work[_on]() Tejun Heo
@ 2012-07-27 23:55 ` Tejun Heo
  2012-07-28  1:05   ` Henrique de Moraes Holschuh
                     ` (3 more replies)
  2012-07-27 23:55 ` [PATCH 15/15] workqueue: deprecate __cancel_delayed_work() Tejun Heo
  14 siblings, 4 replies; 25+ messages in thread
From: Tejun Heo @ 2012-07-27 23:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang, Tejun Heo, Jens Axboe,
	Jiri Kosina, David Airlie, Roland Dreier, Dmitry Torokhov,
	John W. Linville, Len Brown, David Howells, J. Bruce Fields,
	Johannes Berg

Convert delayed_work users doing [__]cancel_delayed_work() +
queue_delayed_work() to mod_delayed_work().

Most conversions are straight-forward.  Ones worth mentioning are,

* drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always
  use mod_delayed_work() and cancel loop in
  edac_mc_reset_delay_period() is dropped.

* drivers/platform/x86/thinkpad_acpi.c: No need to remember whether
  watchdog is active or not.  @fan_watchdog_active and related code
  dropped.

* drivers/power/charger-manager.c: Seemingly a lot of
  delayed_work_pending() abuse going on here.
  [delayed_]work_pending() are unsynchronized and racy when used like
  this.  I converted one instance in fullbatt_handler().  Please
  convert the rest so that it invokes workqueue APIs for the intended
  target state rather than trying to game work item pending state
  transitions - e.g. if the timer should be modified, call
  mod_delayed_work(); if it should be canceled, call
  cancel_delayed_work[_sync]().  See the sketch after this list.

* drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling()
  simplified.  Note that the round_jiffies() calls in this function are
  meaningless.  round_jiffies() works on absolute jiffies, not on the
  relative delay used by delayed_work.

* net/core/link_watch.c: linkwatch_schedule_work() was doing quite an
  elaborate dance around its delayed_work.  Collapse it so that
  linkwatch_work is queued for immediate execution if LW_URGENT is set
  and the existing timer is kept otherwise.
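
The charger-manager note above boils down to asking the workqueue API
for the intended target state instead of peeking at the pending state.
A minimal sketch (illustrative only; @wq and @dwork are hypothetical):

	/* racy: pending state may change between the check and the queueing */
	if (!delayed_work_pending(&dwork))
		queue_delayed_work(wq, &dwork, delay);

	/* intended: let the workqueue core resolve the current state */
	mod_delayed_work(wq, &dwork, delay);
	/* or cancel_delayed_work[_sync](&dwork) if it should be stopped */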

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Roland Dreier <roland@kernel.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Henrique de Moraes Holschuh <ibm-acpi@hmh.eng.br>
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Johannes Berg <johannes@sipsolutions.net>
---
 block/blk-core.c                       |    6 ++----
 block/blk-throttle.c                   |    7 +------
 block/genhd.c                          |    6 ++----
 drivers/block/floppy.c                 |    3 +--
 drivers/edac/edac_mc.c                 |   17 +----------------
 drivers/infiniband/core/addr.c         |    4 +---
 drivers/infiniband/core/mad.c          |   14 +++++---------
 drivers/infiniband/hw/nes/nes_hw.c     |    6 ++----
 drivers/infiniband/hw/nes/nes_nic.c    |    5 ++---
 drivers/input/keyboard/qt2160.c        |    3 +--
 drivers/input/mouse/synaptics_i2c.c    |    7 +------
 drivers/net/wireless/ipw2x00/ipw2100.c |    8 +++-----
 drivers/net/wireless/zd1211rw/zd_usb.c |    3 +--
 drivers/platform/x86/thinkpad_acpi.c   |   20 +++++---------------
 drivers/power/charger-manager.c        |    9 +++------
 drivers/power/ds2760_battery.c         |    9 +++------
 drivers/power/jz4740-battery.c         |    6 ++----
 drivers/thermal/thermal_sys.c          |   15 ++++++---------
 fs/afs/callback.c                      |    4 +---
 fs/afs/server.c                        |   10 ++--------
 fs/afs/vlocation.c                     |   14 +++-----------
 fs/nfs/nfs4renewd.c                    |    3 +--
 net/core/dst.c                         |    4 ++--
 net/core/link_watch.c                  |   21 ++++++---------------
 net/rfkill/input.c                     |    3 +--
 25 files changed, 58 insertions(+), 149 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 93eb3e4..43f6c17 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -319,10 +319,8 @@ EXPORT_SYMBOL(__blk_run_queue);
  */
 void blk_run_queue_async(struct request_queue *q)
 {
-	if (likely(!blk_queue_stopped(q))) {
-		__cancel_delayed_work(&q->delay_work);
-		queue_delayed_work(kblockd_workqueue, &q->delay_work, 0);
-	}
+	if (likely(!blk_queue_stopped(q)))
+		mod_delayed_work(kblockd_workqueue, &q->delay_work, 0);
 }
 EXPORT_SYMBOL(blk_run_queue_async);
 
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 5b06595..69516e6 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -930,12 +930,7 @@ throtl_schedule_delayed_work(struct throtl_data *td, unsigned long delay)
 
 	/* schedule work if limits changed even if no bio is queued */
 	if (total_nr_queued(td) || td->limits_changed) {
-		/*
-		 * We might have a work scheduled to be executed in future.
-		 * Cancel that and schedule a new one.
-		 */
-		__cancel_delayed_work(dwork);
-		queue_delayed_work(kthrotld_workqueue, dwork, delay);
+		mod_delayed_work(kthrotld_workqueue, dwork, delay);
 		throtl_log(td, "schedule work. delay=%lu jiffies=%lu",
 				delay, jiffies);
 	}
diff --git a/block/genhd.c b/block/genhd.c
index 9cf5583..ba16f2f 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1524,10 +1524,8 @@ void disk_flush_events(struct gendisk *disk, unsigned int mask)
 
 	spin_lock_irq(&ev->lock);
 	ev->clearing |= mask;
-	if (!ev->block) {
-		cancel_delayed_work(&ev->dwork);
-		queue_delayed_work(system_nrt_freezable_wq, &ev->dwork, 0);
-	}
+	if (!ev->block)
+		mod_delayed_work(system_nrt_freezable_wq, &ev->dwork, 0);
 	spin_unlock_irq(&ev->lock);
 }
 
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 553f43a..aebd1e8 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -671,7 +671,6 @@ static void __reschedule_timeout(int drive, const char *message)
 
 	if (drive == current_reqD)
 		drive = current_drive;
-	__cancel_delayed_work(&fd_timeout);
 
 	if (drive < 0 || drive >= N_DRIVE) {
 		delay = 20UL * HZ;
@@ -679,7 +678,7 @@ static void __reschedule_timeout(int drive, const char *message)
 	} else
 		delay = UDP->timeout;
 
-	queue_delayed_work(floppy_wq, &fd_timeout, delay);
+	mod_delayed_work(floppy_wq, &fd_timeout, delay);
 	if (UDP->flags & FD_DEBUG)
 		DPRINT("reschedule timeout %s\n", message);
 	timeout_message = message;
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index de5ba86e..9dbde5f 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -492,7 +492,7 @@ static void edac_mc_workq_setup(struct mem_ctl_info *mci, unsigned msec)
 		return;
 
 	INIT_DELAYED_WORK(&mci->work, edac_mc_workq_function);
-	queue_delayed_work(edac_workqueue, &mci->work, msecs_to_jiffies(msec));
+	mod_delayed_work(edac_workqueue, &mci->work, msecs_to_jiffies(msec));
 }
 
 /*
@@ -533,21 +533,6 @@ void edac_mc_reset_delay_period(int value)
 
 	mutex_lock(&mem_ctls_mutex);
 
-	/* scan the list and turn off all workq timers, doing so under lock
-	 */
-	list_for_each(item, &mc_devices) {
-		mci = list_entry(item, struct mem_ctl_info, link);
-
-		if (mci->op_state == OP_RUNNING_POLL)
-			cancel_delayed_work(&mci->work);
-	}
-
-	mutex_unlock(&mem_ctls_mutex);
-
-
-	/* re-walk the list, and reset the poll delay */
-	mutex_lock(&mem_ctls_mutex);
-
 	list_for_each(item, &mc_devices) {
 		mci = list_entry(item, struct mem_ctl_info, link);
 
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 28058ae..eaec8d7 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -152,13 +152,11 @@ static void set_timeout(unsigned long time)
 {
 	unsigned long delay;
 
-	cancel_delayed_work(&work);
-
 	delay = time - jiffies;
 	if ((long)delay <= 0)
 		delay = 1;
 
-	queue_delayed_work(addr_wq, &work, delay);
+	mod_delayed_work(addr_wq, &work, delay);
 }
 
 static void queue_req(struct addr_req *req)
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index b0d0bc8..b593814 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2013,13 +2013,11 @@ static void adjust_timeout(struct ib_mad_agent_private *mad_agent_priv)
 		if (time_after(mad_agent_priv->timeout,
 			       mad_send_wr->timeout)) {
 			mad_agent_priv->timeout = mad_send_wr->timeout;
-			__cancel_delayed_work(&mad_agent_priv->timed_work);
 			delay = mad_send_wr->timeout - jiffies;
 			if ((long)delay <= 0)
 				delay = 1;
-			queue_delayed_work(mad_agent_priv->qp_info->
-					   port_priv->wq,
-					   &mad_agent_priv->timed_work, delay);
+			mod_delayed_work(mad_agent_priv->qp_info->port_priv->wq,
+					 &mad_agent_priv->timed_work, delay);
 		}
 	}
 }
@@ -2052,11 +2050,9 @@ static void wait_for_response(struct ib_mad_send_wr_private *mad_send_wr)
 	list_add(&mad_send_wr->agent_list, list_item);
 
 	/* Reschedule a work item if we have a shorter timeout */
-	if (mad_agent_priv->wait_list.next == &mad_send_wr->agent_list) {
-		__cancel_delayed_work(&mad_agent_priv->timed_work);
-		queue_delayed_work(mad_agent_priv->qp_info->port_priv->wq,
-				   &mad_agent_priv->timed_work, delay);
-	}
+	if (mad_agent_priv->wait_list.next == &mad_send_wr->agent_list)
+		mod_delayed_work(mad_agent_priv->qp_info->port_priv->wq,
+				 &mad_agent_priv->timed_work, delay);
 }
 
 void ib_reset_mad_timeout(struct ib_mad_send_wr_private *mad_send_wr,
diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c
index d42c9f4..9e0895b 100644
--- a/drivers/infiniband/hw/nes/nes_hw.c
+++ b/drivers/infiniband/hw/nes/nes_hw.c
@@ -2679,11 +2679,9 @@ static void nes_process_mac_intr(struct nes_device *nesdev, u32 mac_number)
 			}
 		}
 		if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_SFP_D) {
-			if (nesdev->link_recheck)
-				cancel_delayed_work(&nesdev->work);
 			nesdev->link_recheck = 1;
-			schedule_delayed_work(&nesdev->work,
-					      NES_LINK_RECHECK_DELAY);
+			mod_delayed_work(system_wq, &nesdev->work,
+					 NES_LINK_RECHECK_DELAY);
 		}
 	}
 
diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c
index f3a3ecf..e43f6e4 100644
--- a/drivers/infiniband/hw/nes/nes_nic.c
+++ b/drivers/infiniband/hw/nes/nes_nic.c
@@ -243,10 +243,9 @@ static int nes_netdev_open(struct net_device *netdev)
 
 	spin_lock_irqsave(&nesdev->nesadapter->phy_lock, flags);
 	if (nesdev->nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_SFP_D) {
-		if (nesdev->link_recheck)
-			cancel_delayed_work(&nesdev->work);
 		nesdev->link_recheck = 1;
-		schedule_delayed_work(&nesdev->work, NES_LINK_RECHECK_DELAY);
+		mod_delayed_work(system_wq, &nesdev->work,
+				 NES_LINK_RECHECK_DELAY);
 	}
 	spin_unlock_irqrestore(&nesdev->nesadapter->phy_lock, flags);
 
diff --git a/drivers/input/keyboard/qt2160.c b/drivers/input/keyboard/qt2160.c
index e7a5e36..76b7d43 100644
--- a/drivers/input/keyboard/qt2160.c
+++ b/drivers/input/keyboard/qt2160.c
@@ -156,8 +156,7 @@ static irqreturn_t qt2160_irq(int irq, void *_qt2160)
 
 	spin_lock_irqsave(&qt2160->lock, flags);
 
-	__cancel_delayed_work(&qt2160->dwork);
-	schedule_delayed_work(&qt2160->dwork, 0);
+	mod_delayed_work(system_wq, &qt2160->dwork, 0);
 
 	spin_unlock_irqrestore(&qt2160->lock, flags);
 
diff --git a/drivers/input/mouse/synaptics_i2c.c b/drivers/input/mouse/synaptics_i2c.c
index f1467570..063a174 100644
--- a/drivers/input/mouse/synaptics_i2c.c
+++ b/drivers/input/mouse/synaptics_i2c.c
@@ -376,12 +376,7 @@ static void synaptics_i2c_reschedule_work(struct synaptics_i2c *touch,
 
 	spin_lock_irqsave(&touch->lock, flags);
 
-	/*
-	 * If work is already scheduled then subsequent schedules will not
-	 * change the scheduled time that's why we have to cancel it first.
-	 */
-	__cancel_delayed_work(&touch->dwork);
-	schedule_delayed_work(&touch->dwork, delay);
+	mod_delayed_work(system_wq, &touch->dwork, delay);
 
 	spin_unlock_irqrestore(&touch->lock, flags);
 }
diff --git a/drivers/net/wireless/ipw2x00/ipw2100.c b/drivers/net/wireless/ipw2x00/ipw2100.c
index 95aa8e1..8a34202 100644
--- a/drivers/net/wireless/ipw2x00/ipw2100.c
+++ b/drivers/net/wireless/ipw2x00/ipw2100.c
@@ -2180,8 +2180,7 @@ static void isr_indicate_rf_kill(struct ipw2100_priv *priv, u32 status)
 
 	/* Make sure the RF Kill check timer is running */
 	priv->stop_rf_kill = 0;
-	cancel_delayed_work(&priv->rf_kill);
-	schedule_delayed_work(&priv->rf_kill, round_jiffies_relative(HZ));
+	mod_delayed_work(system_wq, &priv->rf_kill, round_jiffies_relative(HZ));
 }
 
 static void send_scan_event(void *data)
@@ -4321,9 +4320,8 @@ static int ipw_radio_kill_sw(struct ipw2100_priv *priv, int disable_radio)
 					  "disabled by HW switch\n");
 			/* Make sure the RF_KILL check timer is running */
 			priv->stop_rf_kill = 0;
-			cancel_delayed_work(&priv->rf_kill);
-			schedule_delayed_work(&priv->rf_kill,
-					      round_jiffies_relative(HZ));
+			mod_delayed_work(system_wq, &priv->rf_kill,
+					 round_jiffies_relative(HZ));
 		} else
 			schedule_reset(priv);
 	}
diff --git a/drivers/net/wireless/zd1211rw/zd_usb.c b/drivers/net/wireless/zd1211rw/zd_usb.c
index af83c43..ef2b171 100644
--- a/drivers/net/wireless/zd1211rw/zd_usb.c
+++ b/drivers/net/wireless/zd1211rw/zd_usb.c
@@ -1164,8 +1164,7 @@ void zd_usb_reset_rx_idle_timer(struct zd_usb *usb)
 {
 	struct zd_usb_rx *rx = &usb->rx;
 
-	cancel_delayed_work(&rx->idle_work);
-	queue_delayed_work(zd_workqueue, &rx->idle_work, ZD_RX_IDLE_INTERVAL);
+	mod_delayed_work(zd_workqueue, &rx->idle_work, ZD_RX_IDLE_INTERVAL);
 }
 
 static inline void init_usb_interrupt(struct zd_usb *usb)
diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c
index d5fd4a1..00d32a7 100644
--- a/drivers/platform/x86/thinkpad_acpi.c
+++ b/drivers/platform/x86/thinkpad_acpi.c
@@ -7683,25 +7683,15 @@ static int fan_set_speed(int speed)
 
 static void fan_watchdog_reset(void)
 {
-	static int fan_watchdog_active;
-
 	if (fan_control_access_mode == TPACPI_FAN_WR_NONE)
 		return;
 
-	if (fan_watchdog_active)
-		cancel_delayed_work(&fan_watchdog_task);
-
 	if (fan_watchdog_maxinterval > 0 &&
-	    tpacpi_lifecycle != TPACPI_LIFE_EXITING) {
-		fan_watchdog_active = 1;
-		if (!queue_delayed_work(tpacpi_wq, &fan_watchdog_task,
-				msecs_to_jiffies(fan_watchdog_maxinterval
-						 * 1000))) {
-			pr_err("failed to queue the fan watchdog, "
-			       "watchdog will not trigger\n");
-		}
-	} else
-		fan_watchdog_active = 0;
+	    tpacpi_lifecycle != TPACPI_LIFE_EXITING)
+		mod_delayed_work(tpacpi_wq, &fan_watchdog_task,
+			msecs_to_jiffies(fan_watchdog_maxinterval * 1000));
+	else
+		cancel_delayed_work(&fan_watchdog_task);
 }
 
 static void fan_watchdog_fire(struct work_struct *ignored)
diff --git a/drivers/power/charger-manager.c b/drivers/power/charger-manager.c
index 86935ec..0db0444 100644
--- a/drivers/power/charger-manager.c
+++ b/drivers/power/charger-manager.c
@@ -512,9 +512,8 @@ static void _setup_polling(struct work_struct *work)
 	if (!delayed_work_pending(&cm_monitor_work) ||
 	    (delayed_work_pending(&cm_monitor_work) &&
 	     time_after(next_polling, _next_polling))) {
-		cancel_delayed_work_sync(&cm_monitor_work);
 		next_polling = jiffies + polling_jiffy;
-		queue_delayed_work(cm_wq, &cm_monitor_work, polling_jiffy);
+		mod_delayed_work(cm_wq, &cm_monitor_work, polling_jiffy);
 	}
 
 out:
@@ -549,10 +548,8 @@ static void fullbatt_handler(struct charger_manager *cm)
 	if (cm_suspended)
 		device_set_wakeup_capable(cm->dev, true);
 
-	if (delayed_work_pending(&cm->fullbatt_vchk_work))
-		cancel_delayed_work(&cm->fullbatt_vchk_work);
-	queue_delayed_work(cm_wq, &cm->fullbatt_vchk_work,
-			   msecs_to_jiffies(desc->fullbatt_vchkdrop_ms));
+	mod_delayed_work(cm_wq, &cm->fullbatt_vchk_work,
+			 msecs_to_jiffies(desc->fullbatt_vchkdrop_ms));
 	cm->fullbatt_vchk_jiffies_at = jiffies + msecs_to_jiffies(
 				       desc->fullbatt_vchkdrop_ms);
 
diff --git a/drivers/power/ds2760_battery.c b/drivers/power/ds2760_battery.c
index 076e211..704e652 100644
--- a/drivers/power/ds2760_battery.c
+++ b/drivers/power/ds2760_battery.c
@@ -355,8 +355,7 @@ static void ds2760_battery_external_power_changed(struct power_supply *psy)
 
 	dev_dbg(di->dev, "%s\n", __func__);
 
-	cancel_delayed_work(&di->monitor_work);
-	queue_delayed_work(di->monitor_wqueue, &di->monitor_work, HZ/10);
+	mod_delayed_work(di->monitor_wqueue, &di->monitor_work, HZ/10);
 }
 
 
@@ -401,8 +400,7 @@ static void ds2760_battery_set_charged(struct power_supply *psy)
 
 	/* postpone the actual work by 20 secs. This is for debouncing GPIO
 	 * signals and to let the current value settle. See AN4188. */
-	cancel_delayed_work(&di->set_charged_work);
-	queue_delayed_work(di->monitor_wqueue, &di->set_charged_work, HZ * 20);
+	mod_delayed_work(di->monitor_wqueue, &di->set_charged_work, HZ * 20);
 }
 
 static int ds2760_battery_get_property(struct power_supply *psy,
@@ -616,8 +614,7 @@ static int ds2760_battery_resume(struct platform_device *pdev)
 	di->charge_status = POWER_SUPPLY_STATUS_UNKNOWN;
 	power_supply_changed(&di->bat);
 
-	cancel_delayed_work(&di->monitor_work);
-	queue_delayed_work(di->monitor_wqueue, &di->monitor_work, HZ);
+	mod_delayed_work(di->monitor_wqueue, &di->monitor_work, HZ);
 
 	return 0;
 }
diff --git a/drivers/power/jz4740-battery.c b/drivers/power/jz4740-battery.c
index 8dbc7bf..ffbed5e 100644
--- a/drivers/power/jz4740-battery.c
+++ b/drivers/power/jz4740-battery.c
@@ -173,16 +173,14 @@ static void jz_battery_external_power_changed(struct power_supply *psy)
 {
 	struct jz_battery *jz_battery = psy_to_jz_battery(psy);
 
-	cancel_delayed_work(&jz_battery->work);
-	schedule_delayed_work(&jz_battery->work, 0);
+	mod_delayed_work(system_wq, &jz_battery->work, 0);
 }
 
 static irqreturn_t jz_battery_charge_irq(int irq, void *data)
 {
 	struct jz_battery *jz_battery = data;
 
-	cancel_delayed_work(&jz_battery->work);
-	schedule_delayed_work(&jz_battery->work, 0);
+	mod_delayed_work(system_wq, &jz_battery->work, 0);
 
 	return IRQ_HANDLED;
 }
diff --git a/drivers/thermal/thermal_sys.c b/drivers/thermal/thermal_sys.c
index 2d7a9fe..a937838 100644
--- a/drivers/thermal/thermal_sys.c
+++ b/drivers/thermal/thermal_sys.c
@@ -694,17 +694,14 @@ thermal_remove_hwmon_sysfs(struct thermal_zone_device *tz)
 static void thermal_zone_device_set_polling(struct thermal_zone_device *tz,
 					    int delay)
 {
-	cancel_delayed_work(&(tz->poll_queue));
-
-	if (!delay)
-		return;
-
 	if (delay > 1000)
-		queue_delayed_work(system_freezable_wq, &(tz->poll_queue),
-				      round_jiffies(msecs_to_jiffies(delay)));
+		mod_delayed_work(system_freezable_wq, &tz->poll_queue,
+				 round_jiffies(msecs_to_jiffies(delay)));
+	else if (delay)
+		mod_delayed_work(system_freezable_wq, &tz->poll_queue,
+				 msecs_to_jiffies(delay));
 	else
-		queue_delayed_work(system_freezable_wq, &(tz->poll_queue),
-				      msecs_to_jiffies(delay));
+		cancel_delayed_work(&tz->poll_queue);
 }
 
 static void thermal_zone_device_passive(struct thermal_zone_device *tz,
diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index 587ef51..7ef637d 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -351,9 +351,7 @@ void afs_dispatch_give_up_callbacks(struct work_struct *work)
  */
 void afs_flush_callback_breaks(struct afs_server *server)
 {
-	cancel_delayed_work(&server->cb_break_work);
-	queue_delayed_work(afs_callback_update_worker,
-			   &server->cb_break_work, 0);
+	mod_delayed_work(afs_callback_update_worker, &server->cb_break_work, 0);
 }
 
 #if 0
diff --git a/fs/afs/server.c b/fs/afs/server.c
index d59b751..f342acf 100644
--- a/fs/afs/server.c
+++ b/fs/afs/server.c
@@ -285,12 +285,7 @@ static void afs_reap_server(struct work_struct *work)
 		expiry = server->time_of_death + afs_server_timeout;
 		if (expiry > now) {
 			delay = (expiry - now) * HZ;
-			if (!queue_delayed_work(afs_wq, &afs_server_reaper,
-						delay)) {
-				cancel_delayed_work(&afs_server_reaper);
-				queue_delayed_work(afs_wq, &afs_server_reaper,
-						   delay);
-			}
+			mod_delayed_work(afs_wq, &afs_server_reaper, delay);
 			break;
 		}
 
@@ -323,6 +318,5 @@ static void afs_reap_server(struct work_struct *work)
 void __exit afs_purge_servers(void)
 {
 	afs_server_timeout = 0;
-	cancel_delayed_work(&afs_server_reaper);
-	queue_delayed_work(afs_wq, &afs_server_reaper, 0);
+	mod_delayed_work(afs_wq, &afs_server_reaper, 0);
 }
diff --git a/fs/afs/vlocation.c b/fs/afs/vlocation.c
index 431984d..57bcb15 100644
--- a/fs/afs/vlocation.c
+++ b/fs/afs/vlocation.c
@@ -561,12 +561,7 @@ static void afs_vlocation_reaper(struct work_struct *work)
 		if (expiry > now) {
 			delay = (expiry - now) * HZ;
 			_debug("delay %lu", delay);
-			if (!queue_delayed_work(afs_wq, &afs_vlocation_reap,
-						delay)) {
-				cancel_delayed_work(&afs_vlocation_reap);
-				queue_delayed_work(afs_wq, &afs_vlocation_reap,
-						   delay);
-			}
+			mod_delayed_work(afs_wq, &afs_vlocation_reap, delay);
 			break;
 		}
 
@@ -614,13 +609,10 @@ void afs_vlocation_purge(void)
 	spin_lock(&afs_vlocation_updates_lock);
 	list_del_init(&afs_vlocation_updates);
 	spin_unlock(&afs_vlocation_updates_lock);
-	cancel_delayed_work(&afs_vlocation_update);
-	queue_delayed_work(afs_vlocation_update_worker,
-			   &afs_vlocation_update, 0);
+	mod_delayed_work(afs_vlocation_update_worker, &afs_vlocation_update, 0);
 	destroy_workqueue(afs_vlocation_update_worker);
 
-	cancel_delayed_work(&afs_vlocation_reap);
-	queue_delayed_work(afs_wq, &afs_vlocation_reap, 0);
+	mod_delayed_work(afs_wq, &afs_vlocation_reap, 0);
 }
 
 /*
diff --git a/fs/nfs/nfs4renewd.c b/fs/nfs/nfs4renewd.c
index 6930bec..1720d32 100644
--- a/fs/nfs/nfs4renewd.c
+++ b/fs/nfs/nfs4renewd.c
@@ -117,8 +117,7 @@ nfs4_schedule_state_renewal(struct nfs_client *clp)
 		timeout = 5 * HZ;
 	dprintk("%s: requeueing work. Lease period = %ld\n",
 			__func__, (timeout + HZ - 1) / HZ);
-	cancel_delayed_work(&clp->cl_renewd);
-	schedule_delayed_work(&clp->cl_renewd, timeout);
+	mod_delayed_work(system_wq, &clp->cl_renewd, timeout);
 	set_bit(NFS_CS_RENEWD, &clp->cl_res_state);
 	spin_unlock(&clp->cl_lock);
 }
diff --git a/net/core/dst.c b/net/core/dst.c
index 069d51d..ed5a0b4 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -214,8 +214,8 @@ void __dst_free(struct dst_entry *dst)
 	if (dst_garbage.timer_inc > DST_GC_INC) {
 		dst_garbage.timer_inc = DST_GC_INC;
 		dst_garbage.timer_expires = DST_GC_MIN;
-		cancel_delayed_work(&dst_gc_work);
-		schedule_delayed_work(&dst_gc_work, dst_garbage.timer_expires);
+		mod_delayed_work(system_wq, &dst_gc_work,
+				 dst_garbage.timer_expires);
 	}
 	spin_unlock_bh(&dst_garbage.lock);
 }
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index c3519c6..8e397a6 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -120,22 +120,13 @@ static void linkwatch_schedule_work(int urgent)
 		delay = 0;
 
 	/*
-	 * This is true if we've scheduled it immeditately or if we don't
-	 * need an immediate execution and it's already pending.
+	 * If urgent, schedule immediate execution; otherwise, don't
+	 * override the existing timer.
 	 */
-	if (schedule_delayed_work(&linkwatch_work, delay) == !delay)
-		return;
-
-	/* Don't bother if there is nothing urgent. */
-	if (!test_bit(LW_URGENT, &linkwatch_flags))
-		return;
-
-	/* It's already running which is good enough. */
-	if (!__cancel_delayed_work(&linkwatch_work))
-		return;
-
-	/* Otherwise we reschedule it again for immediate execution. */
-	schedule_delayed_work(&linkwatch_work, 0);
+	if (test_bit(LW_URGENT, &linkwatch_flags))
+		mod_delayed_work(system_wq, &linkwatch_work, 0);
+	else
+		schedule_delayed_work(&linkwatch_work, delay);
 }
 
 
diff --git a/net/rfkill/input.c b/net/rfkill/input.c
index 24c55c5..c9d931e 100644
--- a/net/rfkill/input.c
+++ b/net/rfkill/input.c
@@ -164,8 +164,7 @@ static void rfkill_schedule_global_op(enum rfkill_sched_op op)
 	rfkill_op_pending = true;
 	if (op == RFKILL_GLOBAL_OP_EPO && !rfkill_is_epo_lock_active()) {
 		/* bypass the limiter for EPO */
-		cancel_delayed_work(&rfkill_op_work);
-		schedule_delayed_work(&rfkill_op_work, 0);
+		mod_delayed_work(system_wq, &rfkill_op_work, 0);
 		rfkill_last_scheduled = jiffies;
 	} else
 		rfkill_schedule_ratelimited();
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 15/15] workqueue: deprecate __cancel_delayed_work()
  2012-07-27 23:54 [PATCHSET wq/for-3.7] workqueue: implement mod_delayed_work[_on]() Tejun Heo
                   ` (13 preceding siblings ...)
  2012-07-27 23:55 ` [PATCH 14/15] workqueue: use mod_delayed_work() instead of cancel + queue Tejun Heo
@ 2012-07-27 23:55 ` Tejun Heo
  2012-07-31 13:05   ` Tomi Valkeinen
  2012-07-31 17:51   ` Tejun Heo
  14 siblings, 2 replies; 25+ messages in thread
From: Tejun Heo @ 2012-07-27 23:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang, Tejun Heo, Jens Axboe,
	Jiri Kosina, Roland Dreier, Tomi Valkeinen

__cancel_delayed_work() is different from cancel_delayed_work() in
that it uses del_timer() instead of del_timer_sync().  This adds
confusion to the already complicated flush / cancel API and, given that
the only thing delayed_work->timer does is queue the work, the
difference between cancel_delayed_work() and __cancel_delayed_work()
isn't anything material.

Furthermore, none of the remaining users are on a hot path racing
against a high-frequency work item, making the chance of actually
waiting for delayed_work_timer_fn() very slim.
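
For reference, the two differ only in how the timer is deleted (rough
sketch; see include/linux/workqueue.h for the exact inlines):

	/* cancel_delayed_work(): may wait for a running delayed_work_timer_fn() */
	ret = del_timer_sync(&dwork->timer);

	/* __cancel_delayed_work(): returns immediately and may race with
	 * delayed_work_timer_fn(), which only queues the work */
	ret = del_timer(&dwork->timer);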

Use cancel_delayed_work() instead of __cancel_delayed_work() and mark
the latter deprecated.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Roland Dreier <roland@kernel.org>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
---
 block/blk-core.c              |    2 +-
 drivers/block/floppy.c        |    2 +-
 drivers/infiniband/core/mad.c |    2 +-
 drivers/video/omap2/dss/dsi.c |    2 +-
 include/linux/workqueue.h     |    2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 43f6c17..7befb9c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -262,7 +262,7 @@ EXPORT_SYMBOL(blk_start_queue);
  **/
 void blk_stop_queue(struct request_queue *q)
 {
-	__cancel_delayed_work(&q->delay_work);
+	cancel_delayed_work(&q->delay_work);
 	queue_flag_set(QUEUE_FLAG_STOPPED, q);
 }
 EXPORT_SYMBOL(blk_stop_queue);
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index aebd1e8..0e1e364 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -889,7 +889,7 @@ static void unlock_fdc(void)
 
 	raw_cmd = NULL;
 	command_status = FD_COMMAND_NONE;
-	__cancel_delayed_work(&fd_timeout);
+	cancel_delayed_work(&fd_timeout);
 	do_floppy = NULL;
 	cont = NULL;
 	clear_bit(0, &fdc_busy);
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index b593814..dc3fd1e 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2004,7 +2004,7 @@ static void adjust_timeout(struct ib_mad_agent_private *mad_agent_priv)
 	unsigned long delay;
 
 	if (list_empty(&mad_agent_priv->wait_list)) {
-		__cancel_delayed_work(&mad_agent_priv->timed_work);
+		cancel_delayed_work(&mad_agent_priv->timed_work);
 	} else {
 		mad_send_wr = list_entry(mad_agent_priv->wait_list.next,
 					 struct ib_mad_send_wr_private,
diff --git a/drivers/video/omap2/dss/dsi.c b/drivers/video/omap2/dss/dsi.c
index 14ce8cc..7869767 100644
--- a/drivers/video/omap2/dss/dsi.c
+++ b/drivers/video/omap2/dss/dsi.c
@@ -4307,7 +4307,7 @@ static void dsi_framedone_irq_callback(void *data, u32 mask)
 	 * and is sending the data.
 	 */
 
-	__cancel_delayed_work(&dsi->framedone_timeout_work);
+	cancel_delayed_work(&dsi->framedone_timeout_work);
 
 	dsi_handle_framedone(dsidev, 0);
 }
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 2000030..feed2bb 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -445,7 +445,7 @@ static inline bool cancel_delayed_work(struct delayed_work *work)
  * if it returns 0 the timer function may be running and the queueing is in
  * progress.
  */
-static inline bool __cancel_delayed_work(struct delayed_work *work)
+static inline bool __deprecated __cancel_delayed_work(struct delayed_work *work)
 {
 	bool ret;
 
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 14/15] workqueue: use mod_delayed_work() instead of cancel + queue
  2012-07-27 23:55 ` [PATCH 14/15] workqueue: use mod_delayed_work() instead of cancel + queue Tejun Heo
@ 2012-07-28  1:05   ` Henrique de Moraes Holschuh
  2012-07-28  1:28   ` Dmitry Torokhov
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 25+ messages in thread
From: Henrique de Moraes Holschuh @ 2012-07-28  1:05 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, torvalds, akpm, padovan, marcel, peterz, mingo,
	davem, dougthompson, ibm-acpi, cbou, rui.zhang, Jens Axboe,
	Jiri Kosina, David Airlie, Roland Dreier, Dmitry Torokhov,
	John W. Linville, Len Brown, David Howells, J. Bruce Fields,
	Johannes Berg

On Fri, 27 Jul 2012, Tejun Heo wrote:
> Convert delayed_work users doing [__]cancel_delayed_work() +
> queue_delayed_work() to mod_delayed_work().
> 
> Most conversions are straight-forward.  Ones worth mentioning are,
> 
> * drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always
>   use mod_delayed_work() and cancel loop in
>   edac_mc_reset_delay_period() is dropped.
> 
> * drivers/platform/x86/thinkpad_acpi.c: No need to remember whether
>   watchdog is active or not.  @fan_watchdog_active and related code
>   dropped.

For the thinkpad_acpi.c part:
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 14/15] workqueue: use mod_delayed_work() instead of cancel + queue
  2012-07-27 23:55 ` [PATCH 14/15] workqueue: use mod_delayed_work() instead of cancel + queue Tejun Heo
  2012-07-28  1:05   ` Henrique de Moraes Holschuh
@ 2012-07-28  1:28   ` Dmitry Torokhov
  2012-07-29 20:55   ` Anton Vorontsov
  2012-07-31 17:55   ` [PATCH v2 " Tejun Heo
  3 siblings, 0 replies; 25+ messages in thread
From: Dmitry Torokhov @ 2012-07-28  1:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, torvalds, akpm, padovan, marcel, peterz, mingo,
	davem, dougthompson, ibm-acpi, cbou, rui.zhang, Jens Axboe,
	Jiri Kosina, David Airlie, Roland Dreier, John W. Linville,
	Len Brown, David Howells, J. Bruce Fields, Johannes Berg

On Fri, Jul 27, 2012 at 04:55:07PM -0700, Tejun Heo wrote:
> Convert delayed_work users doing [__]cancel_delayed_work() +
> queue_delayed_work() to mod_delayed_work().
> 
> Most conversions are straight-forward.  Ones worth mentioning are,
> 
> * drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always
>   use mod_delayed_work() and cancel loop in
>   edac_mc_reset_delay_period() is dropped.
> 
> * drivers/platform/x86/thinkpad_acpi.c: No need to remember whether
>   watchdog is active or not.  @fan_watchdog_active and related code
>   dropped.
> 
> * drivers/power/charger-manager.c: Seemingly a lot of
>   delayed_work_pending() abuse going on here.
>   [delayed_]work_pending() are unsynchronized and racy when used like
>   this.  I converted one instance in fullbatt_handler().  Please
>   convert the rest so that it invokes workqueue APIs for the intended
>   target state rather than trying to game work item pending state
>   transitions.  e.g. if timer should be modified - call
>   mod_delayed_work(), canceled - call cancel_delayed_work[_sync]().
> 
> * drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling()
>   simplified.  Note that round_jiffies() calls in this function are
>   meaningless.  round_jiffies() work on absolute jiffies not delta
>   delay used by delayed_work.
> 
> * net/core/link_watch.c: linkwatch_schedule_work() was doing quite an
>   elaborate dance around its delayed_work.  Collapse it so that
>   linkwatch_work is queued for immediate execution if LW_URGENT and
>   existing timer is kept otherwise.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Jiri Kosina <jkosina@suse.cz>
> Cc: Doug Thompson <dougthompson@xmission.com>
> Cc: David Airlie <airlied@linux.ie>
> Cc: Roland Dreier <roland@kernel.org>
> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>

For the input parts:

Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 14/15] workqueue: use mod_delayed_work() instead of cancel + queue
  2012-07-27 23:55 ` [PATCH 14/15] workqueue: use mod_delayed_work() instead of cancel + queue Tejun Heo
  2012-07-28  1:05   ` Henrique de Moraes Holschuh
  2012-07-28  1:28   ` Dmitry Torokhov
@ 2012-07-29 20:55   ` Anton Vorontsov
  2012-07-31 17:55   ` [PATCH v2 " Tejun Heo
  3 siblings, 0 replies; 25+ messages in thread
From: Anton Vorontsov @ 2012-07-29 20:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, torvalds, akpm, padovan, marcel, peterz, mingo,
	davem, dougthompson, ibm-acpi, rui.zhang, Jens Axboe,
	Jiri Kosina, David Airlie, Roland Dreier, Dmitry Torokhov,
	John W. Linville, Len Brown, David Howells, J. Bruce Fields,
	Johannes Berg

On Fri, Jul 27, 2012 at 04:55:07PM -0700, Tejun Heo wrote:
> Convert delayed_work users doing [__]cancel_delayed_work() +
> queue_delayed_work() to mod_delayed_work().
> 
> Most conversions are straight-forward.  Ones worth mentioning are,
> 
> * drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always
>   use mod_delayed_work() and cancel loop in
>   edac_mc_reset_delay_period() is dropped.
> 
> * drivers/platform/x86/thinkpad_acpi.c: No need to remember whether
>   watchdog is active or not.  @fan_watchdog_active and related code
>   dropped.
> 
> * drivers/power/charger-manager.c: Seemingly a lot of
>   delayed_work_pending() abuse going on here.
>   [delayed_]work_pending() are unsynchronized and racy when used like
>   this.  I converted one instance in fullbatt_handler().  Please
>   convert the rest so that it invokes workqueue APIs for the intended
>   target state rather than trying to game work item pending state
>   transitions.  e.g. if timer should be modified - call
>   mod_delayed_work(), canceled - call cancel_delayed_work[_sync]().
> 
> * drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling()
>   simplified.  Note that round_jiffies() calls in this function are
>   meaningless.  round_jiffies() works on absolute jiffies, not on the
>   relative delay used by delayed_work.
> 
> * net/core/link_watch.c: linkwatch_schedule_work() was doing quite an
>   elaborate dance around its delayed_work.  Collapse it so that
>   linkwatch_work is queued for immediate execution if LW_URGENT and
>   existing timer is kept otherwise.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>

drivers/power/ bits:

Acked-by: Anton Vorontsov <cbouatmailru@gmail.com>

Thanks!

-- 
Anton Vorontsov
Email: cbouatmailru@gmail.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 04/15] workqueue: disable preemption while manipulating PENDING
  2012-07-27 23:54 ` [PATCH 04/15] workqueue: disable preemption while manipulating PENDING Tejun Heo
@ 2012-07-30 20:10   ` Tejun Heo
  0 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-07-30 20:10 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang, Oleg Nesterov

From bdbc04720254d1a84504074a6b25189961030803 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Mon, 30 Jul 2012 13:03:57 -0700

Queueing operations use WORK_STRUCT_PENDING_BIT to synchronize access
to the target work item.  They first try to claim the bit and proceed
with queueing only after that succeeds.  There's a window between
PENDING being set and the actual queueing during which the task can be
preempted.

There's also a similar window in process_one_work() when clearing
PENDING.  A work item is dequeued, gcwq->lock is released and then
PENDING is cleared; the worker might get preempted between releasing
gcwq->lock and clearing PENDING.

cancel[_delayed]_work_sync() tries to claim or steal PENDING.  The
function assumes that a work item with PENDING set is either queued or
in the process of being queued.  In the latter case, it busy-loops
until the work item either loses PENDING or becomes queued.  If
canceling coincides with the preemptions described above, the
canceling task will busy-loop while the queueing or executing task is
preempted.
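
In simplified form, the canceling loop this refers to is the
pre-series __cancel_work_timer() (the exact code appears in the later
patches of this thread):

	do {
		/*
		 * Fails while PENDING is set but @work is neither on the
		 * timer nor on a worklist - ie. exactly the windows above.
		 */
		ret = try_to_grab_pending(work, timer);
		wait_on_work(work);
	} while (unlikely(ret < 0));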

This patch keeps preemption disabled across claiming PENDING and the
actual queueing, and moves PENDING clearing in process_one_work()
inside gcwq->lock so that busy-looping on PENDING && !queued doesn't
end up waiting for preempted tasks.  Note that, in process_one_work(),
setting the last CPU and clearing PENDING got merged into a single
operation.

v2: __queue_work() was testing preempt_count() to ensure that the
    caller has disabled preemption.  This triggers spuriously if
    !CONFIG_PREEMPT_COUNT.  Use preemptible() instead.  Reported by
    Fengguang Wu.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
---
git branch updated accordingly.

 kernel/workqueue.c |   55 +++++++++++++++++++++++++++++++++++++++++----------
 1 files changed, 44 insertions(+), 11 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 5c26d36..f6eaf34 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -537,9 +537,10 @@ static int work_next_color(int color)
  * work is on queue.  Once execution starts, WORK_STRUCT_CWQ is
  * cleared and the work data contains the cpu number it was last on.
  *
- * set_work_{cwq|cpu}() and clear_work_data() can be used to set the
- * cwq, cpu or clear work->data.  These functions should only be
- * called while the work is owned - ie. while the PENDING bit is set.
+ * set_work_cwq(), set_work_cpu_and_clear_pending() and clear_work_data()
+ * can be used to set the cwq, cpu or clear work->data.  These functions
+ * should only be called while the work is owned - ie. while the PENDING
+ * bit is set.
  *
  * get_work_[g]cwq() can be used to obtain the gcwq or cwq
  * corresponding to a work.  gcwq is available once the work has been
@@ -561,9 +562,10 @@ static void set_work_cwq(struct work_struct *work,
 		      WORK_STRUCT_PENDING | WORK_STRUCT_CWQ | extra_flags);
 }
 
-static void set_work_cpu(struct work_struct *work, unsigned int cpu)
+static void set_work_cpu_and_clear_pending(struct work_struct *work,
+					   unsigned int cpu)
 {
-	set_work_data(work, cpu << WORK_STRUCT_FLAG_BITS, WORK_STRUCT_PENDING);
+	set_work_data(work, cpu << WORK_STRUCT_FLAG_BITS, 0);
 }
 
 static void clear_work_data(struct work_struct *work)
@@ -983,6 +985,15 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 	unsigned int work_flags;
 	unsigned long flags;
 
+	/*
+	 * While a work item is PENDING && off queue, a task trying to
+	 * steal the PENDING will busy-loop waiting for it to either get
+	 * queued or lose PENDING, so a task shouldn't be preempted between
+	 * grabbing PENDING and queueing @work.  Make sure the caller has
+	 * preemption disabled.
+	 */
+	WARN_ON_ONCE(preemptible());
+
 	debug_work_activate(work);
 
 	/* if dying, only works from the same workqueue are allowed */
@@ -1068,10 +1079,14 @@ bool queue_work_on(int cpu, struct workqueue_struct *wq,
 {
 	bool ret = false;
 
+	preempt_disable();
+
 	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
 		__queue_work(cpu, wq, work);
 		ret = true;
 	}
+
+	preempt_enable();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(queue_work_on);
@@ -1121,6 +1136,12 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 	struct work_struct *work = &dwork->work;
 	bool ret = false;
 
+	/*
+	 * We shouldn't get preempted between claiming PENDING and adding
+	 * timer.  Read the comment in __queue_work() for details.
+	 */
+	preempt_disable();
+
 	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
 		unsigned int lcpu;
 
@@ -1156,6 +1177,8 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			add_timer(timer);
 		ret = true;
 	}
+
+	preempt_enable();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(queue_delayed_work_on);
@@ -1970,15 +1993,13 @@ __acquires(&gcwq->lock)
 		return;
 	}
 
-	/* claim and process */
+	/* claim and dequeue */
 	debug_work_deactivate(work);
 	hlist_add_head(&worker->hentry, bwh);
 	worker->current_work = work;
 	worker->current_cwq = cwq;
 	work_color = get_work_color(work);
 
-	/* record the current cpu number in the work data and dequeue */
-	set_work_cpu(work, gcwq->cpu);
 	list_del_init(&work->entry);
 
 	/*
@@ -1995,10 +2016,18 @@ __acquires(&gcwq->lock)
 	if ((worker->flags & WORKER_UNBOUND) && need_more_worker(pool))
 		wake_up_worker(pool);
 
-	spin_unlock_irq(&gcwq->lock);
+	/*
+	 * Record the last CPU and clear PENDING.  The following wmb is
+	 * paired with the implied mb in test_and_set_bit(PENDING) and
+	 * ensures all updates to @work made here are visible to and
+	 * precede any updates by the next PENDING owner.  Also, clearing
+	 * PENDING is inside @gcwq->lock so that we don't get preempted
+	 * with PENDING set and @work off queue.
+	 */
+	smp_wmb();
+	set_work_cpu_and_clear_pending(work, gcwq->cpu);
 
-	smp_wmb();	/* paired with test_and_set_bit(PENDING) */
-	work_clear_pending(work);
+	spin_unlock_irq(&gcwq->lock);
 
 	lock_map_acquire_read(&cwq->wq->lockdep_map);
 	lock_map_acquire(&lockdep_map);
@@ -2836,9 +2865,11 @@ EXPORT_SYMBOL_GPL(cancel_work_sync);
  */
 bool flush_delayed_work(struct delayed_work *dwork)
 {
+	preempt_disable();
 	if (del_timer_sync(&dwork->timer))
 		__queue_work(raw_smp_processor_id(),
 			     get_work_cwq(&dwork->work)->wq, &dwork->work);
+	preempt_enable();
 	return flush_work(&dwork->work);
 }
 EXPORT_SYMBOL(flush_delayed_work);
@@ -2857,9 +2888,11 @@ EXPORT_SYMBOL(flush_delayed_work);
  */
 bool flush_delayed_work_sync(struct delayed_work *dwork)
 {
+	preempt_disable();
 	if (del_timer_sync(&dwork->timer))
 		__queue_work(raw_smp_processor_id(),
 			     get_work_cwq(&dwork->work)->wq, &dwork->work);
+	preempt_enable();
 	return flush_work_sync(&dwork->work);
 }
 EXPORT_SYMBOL(flush_delayed_work_sync);
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 12/15] workqueue: mark a work item being canceled as such
  2012-07-27 23:55 ` [PATCH 12/15] workqueue: mark a work item being canceled as such Tejun Heo
@ 2012-07-30 20:11   ` Tejun Heo
  0 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-07-30 20:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang

From 7199d2757a11ab19bf28713490b898a5a7b24d65 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Mon, 30 Jul 2012 13:03:59 -0700

There are two reasons try_to_grab_pending() can fail with -EAGAIN.
One is when someone else is queueing or dequeueing the work item.  With
the previous patches, it is guaranteed that PENDING and queued state
will soon agree, making it safe to busy-retry in this case.

The other is if multiple __cancel_work_timer() invocations are racing
one another.  __cancel_work_timer() grabs PENDING and then waits for
running instances of the target work item on all CPUs while holding
PENDING and !queued.  try_to_grab_pending() invoked from another task
will keep returning -EAGAIN while the current owner is waiting.

Not distinguishing the two cases is okay because __cancel_work_timer()
is the only user of try_to_grab_pending() and it invokes
wait_on_work() whenever grabbing fails.  For the former, busy-looping
would be enough, but calling wait_on_work() anyway doesn't cause any
critical problem.  For the latter, the new contender usually waits for
the same condition as the current owner, so no unnecessarily extended
busy-looping happens.  Combined, these make __cancel_work_timer()
technically correct even without preemption protection while grabbing
PENDING or distinguishing between the two cases.

While the current code is technically correct, not distinguishing the
two cases makes it difficult to use try_to_grab_pending() for other
purposes than canceling because it's impossible to tell whether it's
safe to busy-retry grabbing.

This patch adds a mechanism to mark a work item as being canceled.
try_to_grab_pending() now requires the caller to have preemption
disabled and returns -EAGAIN to indicate that grabbing failed but that
PENDING and queued state will agree soon, so it's safe to busy-retry.
It returns -ENOENT if the work item is being canceled and may stay
PENDING && !queued for an arbitrary amount of time.

__cancel_work_timer() is modified to mark the work as canceling with
WORK_OFFQ_CANCELING after grabbing PENDING, thus making
try_to_grab_pending() fail with -ENOENT instead of -EAGAIN.  Also, it
now invokes wait_on_work() only if grabbing failed with -ENOENT.  This
isn't necessary for correctness but makes it consistent with other
future users of try_to_grab_pending().
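
As a rough sketch (not part of this patch), a non-canceling user such
as the mod_delayed_work_on() implemented later in this series could
consume the new contract along these lines, assuming the
__queue_delayed_work() helper factored out earlier; the function below
is illustrative only:

static bool example_mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
					struct delayed_work *dwork,
					unsigned long delay)
{
	int ret;

	preempt_disable();
	do {
		/* -EAGAIN is transient; PENDING and queued will agree soon */
		ret = try_to_grab_pending(&dwork->work, &dwork->timer);
	} while (unlikely(ret == -EAGAIN));

	if (likely(ret >= 0))
		/* PENDING is now owned; requeue with the new delay */
		__queue_delayed_work(cpu, wq, dwork, delay);

	preempt_enable();

	/* -ENOENT (being canceled) is reported as %true, like a pending work */
	return ret;
}

Disabling preemption around the loop is what keeps the -EAGAIN
busy-retry bounded, per the rule added above.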

v2: try_to_grab_pending() was testing preempt_count() to ensure that
    the caller has disabled preemption.  This triggers spuriously if
    !CONFIG_PREEMPT_COUNT.  Use preemptible() instead.  Reported by
    Fengguang Wu.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
---
Spurious WARN bug fixed.  git branch updated accordingly.

Thanks.

 include/linux/workqueue.h |    5 +++-
 kernel/workqueue.c        |   67 ++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 61 insertions(+), 11 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index f562674..5f4aeaa 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -70,7 +70,10 @@ enum {
 
 	/* data contains off-queue information when !WORK_STRUCT_CWQ */
 	WORK_OFFQ_FLAG_BASE	= WORK_STRUCT_FLAG_BITS,
-	WORK_OFFQ_FLAG_BITS	= 0,
+
+	WORK_OFFQ_CANCELING	= (1 << WORK_OFFQ_FLAG_BASE),
+
+	WORK_OFFQ_FLAG_BITS	= 1,
 	WORK_OFFQ_CPU_SHIFT	= WORK_OFFQ_FLAG_BASE + WORK_OFFQ_FLAG_BITS,
 
 	/* convenience constants */
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 95260c4..9f9ab20 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -537,15 +537,20 @@ static int work_next_color(int color)
  * contain the pointer to the queued cwq.  Once execution starts, the flag
  * is cleared and the high bits contain OFFQ flags and CPU number.
  *
- * set_work_cwq(), set_work_cpu_and_clear_pending() and clear_work_data()
- * can be used to set the cwq, cpu or clear work->data.  These functions
- * should only be called while the work is owned - ie. while the PENDING
- * bit is set.
+ * set_work_cwq(), set_work_cpu_and_clear_pending(), mark_work_canceling()
+ * and clear_work_data() can be used to set the cwq, cpu or clear
+ * work->data.  These functions should only be called while the work is
+ * owned - ie. while the PENDING bit is set.
  *
- * get_work_[g]cwq() can be used to obtain the gcwq or cwq
- * corresponding to a work.  gcwq is available once the work has been
- * queued anywhere after initialization.  cwq is available only from
- * queueing until execution starts.
+ * get_work_[g]cwq() can be used to obtain the gcwq or cwq corresponding to
+ * a work.  gcwq is available once the work has been queued anywhere after
+ * initialization until it is sync canceled.  cwq is available only while
+ * the work item is queued.
+ *
+ * %WORK_OFFQ_CANCELING is used to mark a work item which is being
+ * canceled.  While being canceled, a work item may have its PENDING set
+ * but stay off timer and worklist for arbitrarily long and nobody should
+ * try to steal the PENDING bit.
  */
 static inline void set_work_data(struct work_struct *work, unsigned long data,
 				 unsigned long flags)
@@ -600,6 +605,22 @@ static struct global_cwq *get_work_gcwq(struct work_struct *work)
 	return get_gcwq(cpu);
 }
 
+static void mark_work_canceling(struct work_struct *work)
+{
+	struct global_cwq *gcwq = get_work_gcwq(work);
+	unsigned long cpu = gcwq ? gcwq->cpu : WORK_CPU_NONE;
+
+	set_work_data(work, (cpu << WORK_OFFQ_CPU_SHIFT) | WORK_OFFQ_CANCELING,
+		      WORK_STRUCT_PENDING);
+}
+
+static bool work_is_canceling(struct work_struct *work)
+{
+	unsigned long data = atomic_long_read(&work->data);
+
+	return !(data & WORK_STRUCT_CWQ) && (data & WORK_OFFQ_CANCELING);
+}
+
 /*
  * Policy functions.  These define the policies on how the global worker
  * pools are managed.  Unless noted otherwise, these functions assume that
@@ -1014,14 +1035,21 @@ static void cwq_dec_nr_in_flight(struct cpu_workqueue_struct *cwq, int color,
  *  1		if @work was pending and we successfully stole PENDING
  *  0		if @work was idle and we claimed PENDING
  *  -EAGAIN	if PENDING couldn't be grabbed at the moment, safe to busy-retry
+ *  -ENOENT	if someone else is canceling @work, this state may persist
+ *		for arbitrarily long
  *
- * On >= 0 return, the caller owns @work's PENDING bit.
+ * On >= 0 return, the caller owns @work's PENDING bit.  To avoid getting
+ * preempted while holding PENDING and @work off queue, preemption must be
+ * disabled on entry.  This ensures that we don't return -EAGAIN while
+ * another task is preempted in this function.
  */
 static int try_to_grab_pending(struct work_struct *work,
 			       struct timer_list *timer)
 {
 	struct global_cwq *gcwq;
 
+	WARN_ON_ONCE(preemptible());
+
 	/* try to steal the timer if it exists */
 	if (timer && likely(del_timer(timer)))
 		return 1;
@@ -1059,6 +1087,10 @@ static int try_to_grab_pending(struct work_struct *work,
 	}
 	spin_unlock_irq(&gcwq->lock);
 
+	if (unlikely(work_is_canceling(work)))
+		return -ENOENT;
+
+	cpu_relax();
 	return -EAGAIN;
 }
 
@@ -2838,11 +2870,26 @@ static bool __cancel_work_timer(struct work_struct *work,
 {
 	int ret;
 
+	preempt_disable();
+
 	do {
 		ret = try_to_grab_pending(work, timer);
-		wait_on_work(work);
+		/*
+		 * If someone else is canceling, wait for the same event it
+		 * would be waiting for before retrying.
+		 */
+		if (unlikely(ret == -ENOENT)) {
+			preempt_enable();
+			wait_on_work(work);
+			preempt_disable();
+		}
 	} while (unlikely(ret < 0));
 
+	/* tell other tasks trying to grab @work to back off */
+	mark_work_canceling(work);
+	preempt_enable();
+
+	wait_on_work(work);
 	clear_work_data(work);
 	return ret;
 }
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 15/15] workqueue: deprecate __cancel_delayed_work()
  2012-07-27 23:55 ` [PATCH 15/15] workqueue: deprecate __cancel_delayed_work() Tejun Heo
@ 2012-07-31 13:05   ` Tomi Valkeinen
  2012-07-31 17:11     ` Tejun Heo
  2012-07-31 17:51   ` Tejun Heo
  1 sibling, 1 reply; 25+ messages in thread
From: Tomi Valkeinen @ 2012-07-31 13:05 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, torvalds, akpm, padovan, marcel, peterz, mingo,
	davem, dougthompson, ibm-acpi, cbou, rui.zhang, Jens Axboe,
	Jiri Kosina, Roland Dreier

Hi,

On Fri, 2012-07-27 at 16:55 -0700, Tejun Heo wrote:
> __cancel_delayed_work() is different from cancel_delayed_work() in
> that it uses del_timer() instead of del_timer_sync().  This adds
> confusion to the already complicated flush / cancel API, and given
> that the only thing delayed_work->timer does is queue the work, the
> difference between cancel_delayed_work() and __cancel_delayed_work()
> isn't anything material.
> 
> Furthermore, none of the remaining users are on a hot path racing
> against a high-frequency work item, so the chance of actually waiting
> for delayed_work_timer_fn() is very slim.
> 
> Use cancel_delayed_work() instead of __cancel_delayed_work() and mark
> the latter deprecated.

I used __cancel_delayed_work() in drivers/video/omap2/dss/dsi.c as the
cancel is done in an interrupt handler. Is it safe to use
cancel_delayed_work() in atomic context? I presume not, as it uses
del_timer_sync().

 Tomi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 15/15] workqueue: deprecate __cancel_delayed_work()
  2012-07-31 13:05   ` Tomi Valkeinen
@ 2012-07-31 17:11     ` Tejun Heo
  0 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-07-31 17:11 UTC (permalink / raw)
  To: Tomi Valkeinen
  Cc: linux-kernel, torvalds, akpm, padovan, marcel, peterz, mingo,
	davem, dougthompson, ibm-acpi, cbou, rui.zhang, Jens Axboe,
	Jiri Kosina, Roland Dreier

On Tue, Jul 31, 2012 at 04:05:39PM +0300, Tomi Valkeinen wrote:
> > Use cancel_delayed_work() instead of __cancel_delayed_work() and mark
> > the latter deprecated.
> 
> I used __cancel_delayed_work() in drivers/video/omap2/dss/dsi.c as the
> cancel is done in an interrupt handler. Is it safe to use
> cancel_delayed_work() in atomic context? I presume not, as it uses
> del_timer_sync().

Ah, you're right.  __cancel_delayed_work() invoked from irq context
can't be converted to cancel_delayed_work() or mod_delayed_work().
I'll skip those from the previous patch and drop this patch.  I really
hope this could be solved somehow tho.  :(

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 15/15] workqueue: deprecate __cancel_delayed_work()
  2012-07-27 23:55 ` [PATCH 15/15] workqueue: deprecate __cancel_delayed_work() Tejun Heo
  2012-07-31 13:05   ` Tomi Valkeinen
@ 2012-07-31 17:51   ` Tejun Heo
  1 sibling, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-07-31 17:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang, Jens Axboe, Jiri Kosina,
	Roland Dreier, Tomi Valkeinen

This patch is dropped for now as cancel_delayed_work() cannot be
called from irq context, as pointed out by Tomi Valkeinen.  The git
branch has been updated accordingly.  I'm working on another patchset
to remove the restriction.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 14/15] workqueue: use mod_delayed_work() instead of cancel + queue
  2012-07-27 23:55 ` [PATCH 14/15] workqueue: use mod_delayed_work() instead of cancel + queue Tejun Heo
                     ` (2 preceding siblings ...)
  2012-07-29 20:55   ` Anton Vorontsov
@ 2012-07-31 17:55   ` Tejun Heo
  3 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-07-31 17:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang, Jens Axboe, Jiri Kosina,
	David Airlie, Roland Dreier, Dmitry Torokhov, John W. Linville,
	Len Brown, David Howells, J. Bruce Fields, Johannes Berg

From 18fe17017c7de052f116ef771dc8c4088f0d1ebb Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Tue, 31 Jul 2012 10:53:21 -0700

Convert delayed_work users doing cancel_delayed_work() followed by
queue_delayed_work() to mod_delayed_work().

Most conversions are straightforward.  Ones worth mentioning are:

* drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always
  use mod_delayed_work() and cancel loop in
  edac_mc_reset_delay_period() is dropped.

* drivers/platform/x86/thinkpad_acpi.c: No need to remember whether
  watchdog is active or not.  @fan_watchdog_active and related code
  dropped.

* drivers/power/charger-manager.c: Seemingly a lot of
  delayed_work_pending() abuse going on here.
  [delayed_]work_pending() are unsynchronized and racy when used like
  this.  I converted one instance in fullbatt_handler().  Please
  convert the rest so that they invoke workqueue APIs for the intended
  target state rather than trying to game work item pending state
  transitions - e.g. if the timer should be modified, call
  mod_delayed_work(); if it should be canceled, call
  cancel_delayed_work[_sync]().  A generic before/after sketch follows
  this list.

* drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling()
  simplified.  Note that round_jiffies() calls in this function are
  meaningless.  round_jiffies() works on absolute jiffies, not the
  relative delay used by delayed_work.
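
For illustration, the generic shape of these conversions, with
placeholder names rather than code from any particular driver:

	/* before: racy - pending state can change between test and queue */
	if (delayed_work_pending(&foo->poll_work))
		cancel_delayed_work(&foo->poll_work);
	queue_delayed_work(foo_wq, &foo->poll_work, msecs_to_jiffies(ms));

	/* after: simply state the intended target state */
	mod_delayed_work(foo_wq, &foo->poll_work, msecs_to_jiffies(ms));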

v2: Tomi pointed out that __cancel_delayed_work() users can't be
    safely converted to mod_delayed_work().  They could be calling it
    from irq context, and if that happens while delayed_work_timer_fn()
    is running, it could deadlock.  __cancel_delayed_work() users are
    dropped from this patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Anton Vorontsov <cbouatmailru@gmail.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Roland Dreier <roland@kernel.org>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
---
 block/genhd.c                          |    6 ++----
 drivers/edac/edac_mc.c                 |   17 +----------------
 drivers/infiniband/core/addr.c         |    4 +---
 drivers/infiniband/hw/nes/nes_hw.c     |    6 ++----
 drivers/infiniband/hw/nes/nes_nic.c    |    5 ++---
 drivers/net/wireless/ipw2x00/ipw2100.c |    8 +++-----
 drivers/net/wireless/zd1211rw/zd_usb.c |    3 +--
 drivers/platform/x86/thinkpad_acpi.c   |   20 +++++---------------
 drivers/power/charger-manager.c        |    9 +++------
 drivers/power/ds2760_battery.c         |    9 +++------
 drivers/power/jz4740-battery.c         |    6 ++----
 drivers/thermal/thermal_sys.c          |   15 ++++++---------
 fs/afs/callback.c                      |    4 +---
 fs/afs/server.c                        |   10 ++--------
 fs/afs/vlocation.c                     |   14 +++-----------
 fs/nfs/nfs4renewd.c                    |    3 +--
 net/core/dst.c                         |    4 ++--
 net/rfkill/input.c                     |    3 +--
 18 files changed, 41 insertions(+), 105 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 9cf5583..ba16f2f 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1524,10 +1524,8 @@ void disk_flush_events(struct gendisk *disk, unsigned int mask)
 
 	spin_lock_irq(&ev->lock);
 	ev->clearing |= mask;
-	if (!ev->block) {
-		cancel_delayed_work(&ev->dwork);
-		queue_delayed_work(system_nrt_freezable_wq, &ev->dwork, 0);
-	}
+	if (!ev->block)
+		mod_delayed_work(system_nrt_freezable_wq, &ev->dwork, 0);
 	spin_unlock_irq(&ev->lock);
 }
 
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index de5ba86e..9dbde5f 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -492,7 +492,7 @@ static void edac_mc_workq_setup(struct mem_ctl_info *mci, unsigned msec)
 		return;
 
 	INIT_DELAYED_WORK(&mci->work, edac_mc_workq_function);
-	queue_delayed_work(edac_workqueue, &mci->work, msecs_to_jiffies(msec));
+	mod_delayed_work(edac_workqueue, &mci->work, msecs_to_jiffies(msec));
 }
 
 /*
@@ -533,21 +533,6 @@ void edac_mc_reset_delay_period(int value)
 
 	mutex_lock(&mem_ctls_mutex);
 
-	/* scan the list and turn off all workq timers, doing so under lock
-	 */
-	list_for_each(item, &mc_devices) {
-		mci = list_entry(item, struct mem_ctl_info, link);
-
-		if (mci->op_state == OP_RUNNING_POLL)
-			cancel_delayed_work(&mci->work);
-	}
-
-	mutex_unlock(&mem_ctls_mutex);
-
-
-	/* re-walk the list, and reset the poll delay */
-	mutex_lock(&mem_ctls_mutex);
-
 	list_for_each(item, &mc_devices) {
 		mci = list_entry(item, struct mem_ctl_info, link);
 
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 28058ae..eaec8d7 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -152,13 +152,11 @@ static void set_timeout(unsigned long time)
 {
 	unsigned long delay;
 
-	cancel_delayed_work(&work);
-
 	delay = time - jiffies;
 	if ((long)delay <= 0)
 		delay = 1;
 
-	queue_delayed_work(addr_wq, &work, delay);
+	mod_delayed_work(addr_wq, &work, delay);
 }
 
 static void queue_req(struct addr_req *req)
diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c
index d42c9f4..9e0895b 100644
--- a/drivers/infiniband/hw/nes/nes_hw.c
+++ b/drivers/infiniband/hw/nes/nes_hw.c
@@ -2679,11 +2679,9 @@ static void nes_process_mac_intr(struct nes_device *nesdev, u32 mac_number)
 			}
 		}
 		if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_SFP_D) {
-			if (nesdev->link_recheck)
-				cancel_delayed_work(&nesdev->work);
 			nesdev->link_recheck = 1;
-			schedule_delayed_work(&nesdev->work,
-					      NES_LINK_RECHECK_DELAY);
+			mod_delayed_work(system_wq, &nesdev->work,
+					 NES_LINK_RECHECK_DELAY);
 		}
 	}
 
diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c
index f3a3ecf..e43f6e4 100644
--- a/drivers/infiniband/hw/nes/nes_nic.c
+++ b/drivers/infiniband/hw/nes/nes_nic.c
@@ -243,10 +243,9 @@ static int nes_netdev_open(struct net_device *netdev)
 
 	spin_lock_irqsave(&nesdev->nesadapter->phy_lock, flags);
 	if (nesdev->nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_SFP_D) {
-		if (nesdev->link_recheck)
-			cancel_delayed_work(&nesdev->work);
 		nesdev->link_recheck = 1;
-		schedule_delayed_work(&nesdev->work, NES_LINK_RECHECK_DELAY);
+		mod_delayed_work(system_wq, &nesdev->work,
+				 NES_LINK_RECHECK_DELAY);
 	}
 	spin_unlock_irqrestore(&nesdev->nesadapter->phy_lock, flags);
 
diff --git a/drivers/net/wireless/ipw2x00/ipw2100.c b/drivers/net/wireless/ipw2x00/ipw2100.c
index 95aa8e1..8a34202 100644
--- a/drivers/net/wireless/ipw2x00/ipw2100.c
+++ b/drivers/net/wireless/ipw2x00/ipw2100.c
@@ -2180,8 +2180,7 @@ static void isr_indicate_rf_kill(struct ipw2100_priv *priv, u32 status)
 
 	/* Make sure the RF Kill check timer is running */
 	priv->stop_rf_kill = 0;
-	cancel_delayed_work(&priv->rf_kill);
-	schedule_delayed_work(&priv->rf_kill, round_jiffies_relative(HZ));
+	mod_delayed_work(system_wq, &priv->rf_kill, round_jiffies_relative(HZ));
 }
 
 static void send_scan_event(void *data)
@@ -4321,9 +4320,8 @@ static int ipw_radio_kill_sw(struct ipw2100_priv *priv, int disable_radio)
 					  "disabled by HW switch\n");
 			/* Make sure the RF_KILL check timer is running */
 			priv->stop_rf_kill = 0;
-			cancel_delayed_work(&priv->rf_kill);
-			schedule_delayed_work(&priv->rf_kill,
-					      round_jiffies_relative(HZ));
+			mod_delayed_work(system_wq, &priv->rf_kill,
+					 round_jiffies_relative(HZ));
 		} else
 			schedule_reset(priv);
 	}
diff --git a/drivers/net/wireless/zd1211rw/zd_usb.c b/drivers/net/wireless/zd1211rw/zd_usb.c
index af83c43..ef2b171 100644
--- a/drivers/net/wireless/zd1211rw/zd_usb.c
+++ b/drivers/net/wireless/zd1211rw/zd_usb.c
@@ -1164,8 +1164,7 @@ void zd_usb_reset_rx_idle_timer(struct zd_usb *usb)
 {
 	struct zd_usb_rx *rx = &usb->rx;
 
-	cancel_delayed_work(&rx->idle_work);
-	queue_delayed_work(zd_workqueue, &rx->idle_work, ZD_RX_IDLE_INTERVAL);
+	mod_delayed_work(zd_workqueue, &rx->idle_work, ZD_RX_IDLE_INTERVAL);
 }
 
 static inline void init_usb_interrupt(struct zd_usb *usb)
diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c
index d5fd4a1..00d32a7 100644
--- a/drivers/platform/x86/thinkpad_acpi.c
+++ b/drivers/platform/x86/thinkpad_acpi.c
@@ -7683,25 +7683,15 @@ static int fan_set_speed(int speed)
 
 static void fan_watchdog_reset(void)
 {
-	static int fan_watchdog_active;
-
 	if (fan_control_access_mode == TPACPI_FAN_WR_NONE)
 		return;
 
-	if (fan_watchdog_active)
-		cancel_delayed_work(&fan_watchdog_task);
-
 	if (fan_watchdog_maxinterval > 0 &&
-	    tpacpi_lifecycle != TPACPI_LIFE_EXITING) {
-		fan_watchdog_active = 1;
-		if (!queue_delayed_work(tpacpi_wq, &fan_watchdog_task,
-				msecs_to_jiffies(fan_watchdog_maxinterval
-						 * 1000))) {
-			pr_err("failed to queue the fan watchdog, "
-			       "watchdog will not trigger\n");
-		}
-	} else
-		fan_watchdog_active = 0;
+	    tpacpi_lifecycle != TPACPI_LIFE_EXITING)
+		mod_delayed_work(tpacpi_wq, &fan_watchdog_task,
+			msecs_to_jiffies(fan_watchdog_maxinterval * 1000));
+	else
+		cancel_delayed_work(&fan_watchdog_task);
 }
 
 static void fan_watchdog_fire(struct work_struct *ignored)
diff --git a/drivers/power/charger-manager.c b/drivers/power/charger-manager.c
index 86935ec..0db0444 100644
--- a/drivers/power/charger-manager.c
+++ b/drivers/power/charger-manager.c
@@ -512,9 +512,8 @@ static void _setup_polling(struct work_struct *work)
 	if (!delayed_work_pending(&cm_monitor_work) ||
 	    (delayed_work_pending(&cm_monitor_work) &&
 	     time_after(next_polling, _next_polling))) {
-		cancel_delayed_work_sync(&cm_monitor_work);
 		next_polling = jiffies + polling_jiffy;
-		queue_delayed_work(cm_wq, &cm_monitor_work, polling_jiffy);
+		mod_delayed_work(cm_wq, &cm_monitor_work, polling_jiffy);
 	}
 
 out:
@@ -549,10 +548,8 @@ static void fullbatt_handler(struct charger_manager *cm)
 	if (cm_suspended)
 		device_set_wakeup_capable(cm->dev, true);
 
-	if (delayed_work_pending(&cm->fullbatt_vchk_work))
-		cancel_delayed_work(&cm->fullbatt_vchk_work);
-	queue_delayed_work(cm_wq, &cm->fullbatt_vchk_work,
-			   msecs_to_jiffies(desc->fullbatt_vchkdrop_ms));
+	mod_delayed_work(cm_wq, &cm->fullbatt_vchk_work,
+			 msecs_to_jiffies(desc->fullbatt_vchkdrop_ms));
 	cm->fullbatt_vchk_jiffies_at = jiffies + msecs_to_jiffies(
 				       desc->fullbatt_vchkdrop_ms);
 
diff --git a/drivers/power/ds2760_battery.c b/drivers/power/ds2760_battery.c
index 076e211..704e652 100644
--- a/drivers/power/ds2760_battery.c
+++ b/drivers/power/ds2760_battery.c
@@ -355,8 +355,7 @@ static void ds2760_battery_external_power_changed(struct power_supply *psy)
 
 	dev_dbg(di->dev, "%s\n", __func__);
 
-	cancel_delayed_work(&di->monitor_work);
-	queue_delayed_work(di->monitor_wqueue, &di->monitor_work, HZ/10);
+	mod_delayed_work(di->monitor_wqueue, &di->monitor_work, HZ/10);
 }
 
 
@@ -401,8 +400,7 @@ static void ds2760_battery_set_charged(struct power_supply *psy)
 
 	/* postpone the actual work by 20 secs. This is for debouncing GPIO
 	 * signals and to let the current value settle. See AN4188. */
-	cancel_delayed_work(&di->set_charged_work);
-	queue_delayed_work(di->monitor_wqueue, &di->set_charged_work, HZ * 20);
+	mod_delayed_work(di->monitor_wqueue, &di->set_charged_work, HZ * 20);
 }
 
 static int ds2760_battery_get_property(struct power_supply *psy,
@@ -616,8 +614,7 @@ static int ds2760_battery_resume(struct platform_device *pdev)
 	di->charge_status = POWER_SUPPLY_STATUS_UNKNOWN;
 	power_supply_changed(&di->bat);
 
-	cancel_delayed_work(&di->monitor_work);
-	queue_delayed_work(di->monitor_wqueue, &di->monitor_work, HZ);
+	mod_delayed_work(di->monitor_wqueue, &di->monitor_work, HZ);
 
 	return 0;
 }
diff --git a/drivers/power/jz4740-battery.c b/drivers/power/jz4740-battery.c
index 8dbc7bf..ffbed5e 100644
--- a/drivers/power/jz4740-battery.c
+++ b/drivers/power/jz4740-battery.c
@@ -173,16 +173,14 @@ static void jz_battery_external_power_changed(struct power_supply *psy)
 {
 	struct jz_battery *jz_battery = psy_to_jz_battery(psy);
 
-	cancel_delayed_work(&jz_battery->work);
-	schedule_delayed_work(&jz_battery->work, 0);
+	mod_delayed_work(system_wq, &jz_battery->work, 0);
 }
 
 static irqreturn_t jz_battery_charge_irq(int irq, void *data)
 {
 	struct jz_battery *jz_battery = data;
 
-	cancel_delayed_work(&jz_battery->work);
-	schedule_delayed_work(&jz_battery->work, 0);
+	mod_delayed_work(system_wq, &jz_battery->work, 0);
 
 	return IRQ_HANDLED;
 }
diff --git a/drivers/thermal/thermal_sys.c b/drivers/thermal/thermal_sys.c
index 2d7a9fe..a937838 100644
--- a/drivers/thermal/thermal_sys.c
+++ b/drivers/thermal/thermal_sys.c
@@ -694,17 +694,14 @@ thermal_remove_hwmon_sysfs(struct thermal_zone_device *tz)
 static void thermal_zone_device_set_polling(struct thermal_zone_device *tz,
 					    int delay)
 {
-	cancel_delayed_work(&(tz->poll_queue));
-
-	if (!delay)
-		return;
-
 	if (delay > 1000)
-		queue_delayed_work(system_freezable_wq, &(tz->poll_queue),
-				      round_jiffies(msecs_to_jiffies(delay)));
+		mod_delayed_work(system_freezable_wq, &tz->poll_queue,
+				 round_jiffies(msecs_to_jiffies(delay)));
+	else if (delay)
+		mod_delayed_work(system_freezable_wq, &tz->poll_queue,
+				 msecs_to_jiffies(delay));
 	else
-		queue_delayed_work(system_freezable_wq, &(tz->poll_queue),
-				      msecs_to_jiffies(delay));
+		cancel_delayed_work(&tz->poll_queue);
 }
 
 static void thermal_zone_device_passive(struct thermal_zone_device *tz,
diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index 587ef51..7ef637d 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -351,9 +351,7 @@ void afs_dispatch_give_up_callbacks(struct work_struct *work)
  */
 void afs_flush_callback_breaks(struct afs_server *server)
 {
-	cancel_delayed_work(&server->cb_break_work);
-	queue_delayed_work(afs_callback_update_worker,
-			   &server->cb_break_work, 0);
+	mod_delayed_work(afs_callback_update_worker, &server->cb_break_work, 0);
 }
 
 #if 0
diff --git a/fs/afs/server.c b/fs/afs/server.c
index d59b751..f342acf 100644
--- a/fs/afs/server.c
+++ b/fs/afs/server.c
@@ -285,12 +285,7 @@ static void afs_reap_server(struct work_struct *work)
 		expiry = server->time_of_death + afs_server_timeout;
 		if (expiry > now) {
 			delay = (expiry - now) * HZ;
-			if (!queue_delayed_work(afs_wq, &afs_server_reaper,
-						delay)) {
-				cancel_delayed_work(&afs_server_reaper);
-				queue_delayed_work(afs_wq, &afs_server_reaper,
-						   delay);
-			}
+			mod_delayed_work(afs_wq, &afs_server_reaper, delay);
 			break;
 		}
 
@@ -323,6 +318,5 @@ static void afs_reap_server(struct work_struct *work)
 void __exit afs_purge_servers(void)
 {
 	afs_server_timeout = 0;
-	cancel_delayed_work(&afs_server_reaper);
-	queue_delayed_work(afs_wq, &afs_server_reaper, 0);
+	mod_delayed_work(afs_wq, &afs_server_reaper, 0);
 }
diff --git a/fs/afs/vlocation.c b/fs/afs/vlocation.c
index 431984d..57bcb15 100644
--- a/fs/afs/vlocation.c
+++ b/fs/afs/vlocation.c
@@ -561,12 +561,7 @@ static void afs_vlocation_reaper(struct work_struct *work)
 		if (expiry > now) {
 			delay = (expiry - now) * HZ;
 			_debug("delay %lu", delay);
-			if (!queue_delayed_work(afs_wq, &afs_vlocation_reap,
-						delay)) {
-				cancel_delayed_work(&afs_vlocation_reap);
-				queue_delayed_work(afs_wq, &afs_vlocation_reap,
-						   delay);
-			}
+			mod_delayed_work(afs_wq, &afs_vlocation_reap, delay);
 			break;
 		}
 
@@ -614,13 +609,10 @@ void afs_vlocation_purge(void)
 	spin_lock(&afs_vlocation_updates_lock);
 	list_del_init(&afs_vlocation_updates);
 	spin_unlock(&afs_vlocation_updates_lock);
-	cancel_delayed_work(&afs_vlocation_update);
-	queue_delayed_work(afs_vlocation_update_worker,
-			   &afs_vlocation_update, 0);
+	mod_delayed_work(afs_vlocation_update_worker, &afs_vlocation_update, 0);
 	destroy_workqueue(afs_vlocation_update_worker);
 
-	cancel_delayed_work(&afs_vlocation_reap);
-	queue_delayed_work(afs_wq, &afs_vlocation_reap, 0);
+	mod_delayed_work(afs_wq, &afs_vlocation_reap, 0);
 }
 
 /*
diff --git a/fs/nfs/nfs4renewd.c b/fs/nfs/nfs4renewd.c
index 6930bec..1720d32 100644
--- a/fs/nfs/nfs4renewd.c
+++ b/fs/nfs/nfs4renewd.c
@@ -117,8 +117,7 @@ nfs4_schedule_state_renewal(struct nfs_client *clp)
 		timeout = 5 * HZ;
 	dprintk("%s: requeueing work. Lease period = %ld\n",
 			__func__, (timeout + HZ - 1) / HZ);
-	cancel_delayed_work(&clp->cl_renewd);
-	schedule_delayed_work(&clp->cl_renewd, timeout);
+	mod_delayed_work(system_wq, &clp->cl_renewd, timeout);
 	set_bit(NFS_CS_RENEWD, &clp->cl_res_state);
 	spin_unlock(&clp->cl_lock);
 }
diff --git a/net/core/dst.c b/net/core/dst.c
index 069d51d..ed5a0b4 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -214,8 +214,8 @@ void __dst_free(struct dst_entry *dst)
 	if (dst_garbage.timer_inc > DST_GC_INC) {
 		dst_garbage.timer_inc = DST_GC_INC;
 		dst_garbage.timer_expires = DST_GC_MIN;
-		cancel_delayed_work(&dst_gc_work);
-		schedule_delayed_work(&dst_gc_work, dst_garbage.timer_expires);
+		mod_delayed_work(system_wq, &dst_gc_work,
+				 dst_garbage.timer_expires);
 	}
 	spin_unlock_bh(&dst_garbage.lock);
 }
diff --git a/net/rfkill/input.c b/net/rfkill/input.c
index 24c55c5..c9d931e 100644
--- a/net/rfkill/input.c
+++ b/net/rfkill/input.c
@@ -164,8 +164,7 @@ static void rfkill_schedule_global_op(enum rfkill_sched_op op)
 	rfkill_op_pending = true;
 	if (op == RFKILL_GLOBAL_OP_EPO && !rfkill_is_epo_lock_active()) {
 		/* bypass the limiter for EPO */
-		cancel_delayed_work(&rfkill_op_work);
-		schedule_delayed_work(&rfkill_op_work, 0);
+		mod_delayed_work(system_wq, &rfkill_op_work, 0);
 		rfkill_last_scheduled = jiffies;
 	} else
 		rfkill_schedule_ratelimited();
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2012-07-31 17:55 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-27 23:54 [PATCHSET wq/for-3.7] workqueue: implement mod_delayed_work[_on]() Tejun Heo
2012-07-27 23:54 ` [PATCH 01/15] workqueue: reorder queueing functions so that _on() variants are on top Tejun Heo
2012-07-27 23:54 ` [PATCH 02/15] workqueue: make queueing functions return bool Tejun Heo
2012-07-27 23:54 ` [PATCH 03/15] workqueue: add missing smp_wmb() in process_one_work() Tejun Heo
2012-07-27 23:54 ` [PATCH 04/15] workqueue: disable preemption while manipulating PENDING Tejun Heo
2012-07-30 20:10   ` [PATCH v2 " Tejun Heo
2012-07-27 23:54 ` [PATCH 05/15] workqueue: set delayed_work->timer function on initialization Tejun Heo
2012-07-27 23:54 ` [PATCH 06/15] workqueue: unify local CPU queueing handling Tejun Heo
2012-07-27 23:55 ` [PATCH 07/15] workqueue: fix zero @delay handling of queue_delayed_work_on() Tejun Heo
2012-07-27 23:55 ` [PATCH 08/15] workqueue: move try_to_grab_pending() upwards Tejun Heo
2012-07-27 23:55 ` [PATCH 09/15] workqueue: introduce WORK_OFFQ_FLAG_* Tejun Heo
2012-07-27 23:55 ` [PATCH 10/15] workqueue: factor out __queue_delayed_work() from queue_delayed_work_on() Tejun Heo
2012-07-27 23:55 ` [PATCH 11/15] workqueue: reorganize try_to_grab_pending() and __cancel_timer_work() Tejun Heo
2012-07-27 23:55 ` [PATCH 12/15] workqueue: mark a work item being canceled as such Tejun Heo
2012-07-30 20:11   ` [PATCH v2 " Tejun Heo
2012-07-27 23:55 ` [PATCH 13/15] workqueue: implement mod_delayed_work[_on]() Tejun Heo
2012-07-27 23:55 ` [PATCH 14/15] workqueue: use mod_delayed_work() instead of cancel + queue Tejun Heo
2012-07-28  1:05   ` Henrique de Moraes Holschuh
2012-07-28  1:28   ` Dmitry Torokhov
2012-07-29 20:55   ` Anton Vorontsov
2012-07-31 17:55   ` [PATCH v2 " Tejun Heo
2012-07-27 23:55 ` [PATCH 15/15] workqueue: deprecate __cancel_delayed_work() Tejun Heo
2012-07-31 13:05   ` Tomi Valkeinen
2012-07-31 17:11     ` Tejun Heo
2012-07-31 17:51   ` Tejun Heo
