* Is adding requeue_delayed_work() a good idea
@ 2009-08-20 21:51 Roland Dreier
  2009-08-21 11:55 ` Oleg Nesterov
  0 siblings, 1 reply; 14+ messages in thread
From: Roland Dreier @ 2009-08-20 21:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Oleg Nesterov

I have the following patch queued up for 2.6.32.  It fixes a
(theoretical, lockdep-reported) deadlock involving the del_timer_sync()
inside cancel_delayed_work(); the code in question really wants to
reschedule delayed work if the timeout it's keeping track of changes,
but the only way to do this now with delayed work is to cancel the work
and then queue it again.

My patch goes back to an open-coded version of delayed work that splits
the timer and the work struct.  However it occurs to me that an API like
requeue_delayed_work() that does a mod_timer() on the delayed work
struct might be useful -- OTOH making this fully general and keeping
track of the work's pending bit etc. seems perhaps a bit dicey.

Thoughts?

 - Roland


IB/mad: Fix possible lock-lock-timer deadlock

Lockdep reported a possible deadlock with cm_id_priv->lock,
mad_agent_priv->lock and mad_agent_priv->timed_work.timer; this
happens because the mad module does

	cancel_delayed_work(&mad_agent_priv->timed_work);

while holding mad_agent_priv->lock.  cancel_delayed_work() internally
does del_timer_sync(&mad_agent_priv->timed_work.timer).

This can turn into a deadlock because mad_agent_priv->lock is taken
inside cm_id_priv->lock, so we can get the following set of contexts
that deadlock each other:

 A: holding cm_id_priv->lock, waiting for mad_agent_priv->lock
 B: holding mad_agent_priv->lock, waiting for del_timer_sync()
 C: interrupt during mad_agent_priv->timed_work.timer that takes
    cm_id_priv->lock

Fix this by splitting the delayed work struct into a normal work
struct and a timer, and using mod_timer() instead of
cancel_delayed_work() plus queue_delayed_work() (which internally
becomes del_timer_sync() plus add_timer()).

Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=13757
Reported-by: Bart Van Assche <bart.vanassche@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
---
 drivers/infiniband/core/mad.c      |   51 +++++++++++++++++------------------
 drivers/infiniband/core/mad_priv.h |    3 +-
 2 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index de922a0..e8c05b2 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -175,6 +175,15 @@ int ib_response_mad(struct ib_mad *mad)
 }
 EXPORT_SYMBOL(ib_response_mad);
 
+static void timeout_callback(unsigned long data)
+{
+	struct ib_mad_agent_private *mad_agent_priv =
+		(struct ib_mad_agent_private *) data;
+
+	queue_work(mad_agent_priv->qp_info->port_priv->wq,
+		   &mad_agent_priv->timeout_work);
+}
+
 /*
  * ib_register_mad_agent - Register to send/receive MADs
  */
@@ -306,7 +315,9 @@ struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device,
 	INIT_LIST_HEAD(&mad_agent_priv->wait_list);
 	INIT_LIST_HEAD(&mad_agent_priv->done_list);
 	INIT_LIST_HEAD(&mad_agent_priv->rmpp_list);
-	INIT_DELAYED_WORK(&mad_agent_priv->timed_work, timeout_sends);
+	INIT_WORK(&mad_agent_priv->timeout_work, timeout_sends);
+	setup_timer(&mad_agent_priv->timeout_timer, timeout_callback,
+		    (unsigned long) mad_agent_priv);
 	INIT_LIST_HEAD(&mad_agent_priv->local_list);
 	INIT_WORK(&mad_agent_priv->local_work, local_completions);
 	atomic_set(&mad_agent_priv->refcount, 1);
@@ -513,7 +524,8 @@ static void unregister_mad_agent(struct ib_mad_agent_private *mad_agent_priv)
 	 */
 	cancel_mads(mad_agent_priv);
 	port_priv = mad_agent_priv->qp_info->port_priv;
-	cancel_delayed_work(&mad_agent_priv->timed_work);
+	del_timer_sync(&mad_agent_priv->timeout_timer);
+	cancel_work_sync(&mad_agent_priv->timeout_work);
 
 	spin_lock_irqsave(&port_priv->reg_lock, flags);
 	remove_mad_reg_req(mad_agent_priv);
@@ -1971,10 +1983,9 @@ out:
 static void adjust_timeout(struct ib_mad_agent_private *mad_agent_priv)
 {
 	struct ib_mad_send_wr_private *mad_send_wr;
-	unsigned long delay;
 
 	if (list_empty(&mad_agent_priv->wait_list)) {
-		cancel_delayed_work(&mad_agent_priv->timed_work);
+		del_timer(&mad_agent_priv->timeout_timer);
 	} else {
 		mad_send_wr = list_entry(mad_agent_priv->wait_list.next,
 					 struct ib_mad_send_wr_private,
@@ -1983,13 +1994,8 @@ static void adjust_timeout(struct ib_mad_agent_private *mad_agent_priv)
 		if (time_after(mad_agent_priv->timeout,
 			       mad_send_wr->timeout)) {
 			mad_agent_priv->timeout = mad_send_wr->timeout;
-			cancel_delayed_work(&mad_agent_priv->timed_work);
-			delay = mad_send_wr->timeout - jiffies;
-			if ((long)delay <= 0)
-				delay = 1;
-			queue_delayed_work(mad_agent_priv->qp_info->
-					   port_priv->wq,
-					   &mad_agent_priv->timed_work, delay);
+			mod_timer(&mad_agent_priv->timeout_timer,
+				  mad_send_wr->timeout);
 		}
 	}
 }
@@ -2016,17 +2022,14 @@ static void wait_for_response(struct ib_mad_send_wr_private *mad_send_wr)
 				       temp_mad_send_wr->timeout))
 				break;
 		}
-	}
-	else
+	} else
 		list_item = &mad_agent_priv->wait_list;
 	list_add(&mad_send_wr->agent_list, list_item);
 
 	/* Reschedule a work item if we have a shorter timeout */
-	if (mad_agent_priv->wait_list.next == &mad_send_wr->agent_list) {
-		cancel_delayed_work(&mad_agent_priv->timed_work);
-		queue_delayed_work(mad_agent_priv->qp_info->port_priv->wq,
-				   &mad_agent_priv->timed_work, delay);
-	}
+	if (mad_agent_priv->wait_list.next == &mad_send_wr->agent_list)
+		mod_timer(&mad_agent_priv->timeout_timer,
+			  mad_send_wr->timeout);
 }
 
 void ib_reset_mad_timeout(struct ib_mad_send_wr_private *mad_send_wr,
@@ -2470,10 +2473,10 @@ static void timeout_sends(struct work_struct *work)
 	struct ib_mad_agent_private *mad_agent_priv;
 	struct ib_mad_send_wr_private *mad_send_wr;
 	struct ib_mad_send_wc mad_send_wc;
-	unsigned long flags, delay;
+	unsigned long flags;
 
 	mad_agent_priv = container_of(work, struct ib_mad_agent_private,
-				      timed_work.work);
+				      timeout_work);
 	mad_send_wc.vendor_err = 0;
 
 	spin_lock_irqsave(&mad_agent_priv->lock, flags);
@@ -2483,12 +2486,8 @@ static void timeout_sends(struct work_struct *work)
 					 agent_list);
 
 		if (time_after(mad_send_wr->timeout, jiffies)) {
-			delay = mad_send_wr->timeout - jiffies;
-			if ((long)delay <= 0)
-				delay = 1;
-			queue_delayed_work(mad_agent_priv->qp_info->
-					   port_priv->wq,
-					   &mad_agent_priv->timed_work, delay);
+			mod_timer(&mad_agent_priv->timeout_timer,
+				  mad_send_wr->timeout);
 			break;
 		}
 
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index 05ce331..1526fa2 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -99,7 +99,8 @@ struct ib_mad_agent_private {
 	struct list_head send_list;
 	struct list_head wait_list;
 	struct list_head done_list;
-	struct delayed_work timed_work;
+	struct work_struct timeout_work;
+	struct timer_list timeout_timer;
 	unsigned long timeout;
 	struct list_head local_list;
 	struct work_struct local_work;
-- 
1.6.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: Is adding requeue_delayed_work() a good idea
  2009-08-20 21:51 Is adding requeue_delayed_work() a good idea Roland Dreier
@ 2009-08-21 11:55 ` Oleg Nesterov
  2009-08-21 21:53   ` Roland Dreier
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2009-08-21 11:55 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-kernel

On 08/20, Roland Dreier wrote:
>
> My patch goes back to an open-coded version of delayed work that splits
> the timer and the work struct.  However it occurs to me that an API like
> requeue_delayed_work() that does a mod_timer() on the delayed work
> struct might be useful -- OTOH making this fully general and keeping
> track of the work's pending bit etc. seems perhaps a bit dicey.

Completely agreed.

We need some simple changes in timer.c. __mod_timer() already has
pending_only, but requeue_delayed_work() needs another flag to prevent
migrating to another CPU. Again, this is simple, let's suppose we have
requeue_timer(timer) which works like mod_timer(pending_only => true)
but never changes timer->base.

The main question is: what should requeue_delayed_work(dwork) do when
dwork->timer is not pending but dwork->work is queued or running?
Should it cancel dwork->work in this case?

If yes, then I don't understand how this new helper (which is good
anyway) can help with this particular problem:

> happens because the mad module does
>
> 	cancel_delayed_work(&mad_agent_priv->timed_work);
>
> while holding mad_agent_priv->lock.  cancel_delayed_work() internally
> does del_timer_sync(&mad_agent_priv->timed_work.timer).
>
> This can turn into a deadlock because mad_agent_priv->lock is taken
> inside cm_id_priv->lock, so we can get the following set of contexts
> that deadlock each other:
>
>  A: holding cm_id_priv->lock, waiting for mad_agent_priv->lock
>  B: holding mad_agent_priv->lock, waiting for del_timer_sync()
>  C: interrupt during mad_agent_priv->timed_work.timer that takes
>     cm_id_priv->lock

OK, suppose that we s/cancel_delayed_work/requeue_delayed_work/,
then we seem to have the same deadlock

  A: holding cm_id_priv->lock, waiting for mad_agent_priv->lock
  B: holding mad_agent_priv->lock, waiting for requeue_delayed_work()
     which found !timer_pending() && queued work
  C: interrupt during work->func() that takes cm_id_priv->lock

Perhaps, requeue_delayed_work() should cancel the pending work, but do
not wait_on_work(). This is not trivial, we have to avoid livelocks if
cancel_work_no_sync() races with queue_work()/etc. Perhaps,
requeue_delayed_work() could return the error if it can't update the
timer and can't cancel the work without spinning ?

What do you think?

Oleg.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Is adding requeue_delayed_work() a good idea
  2009-08-21 11:55 ` Oleg Nesterov
@ 2009-08-21 21:53   ` Roland Dreier
  2009-08-22 10:35     ` Stefan Richter
  2009-08-24 18:01     ` Oleg Nesterov
  0 siblings, 2 replies; 14+ messages in thread
From: Roland Dreier @ 2009-08-21 21:53 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel


 > We need some simple changes in timer.c. __mod_timer() already has
 > pending_only, but requeue_delayed_work() needs another flag to prevent
 > migrating to another CPU. Again, this is simple, let's suppose we have
 > requeue_timer(timer) which works like mod_timer(pending_only => true)
 > but never changes timer->base.

Yes... in my case I don't particularly care about which CPU the timer or
work runs on, so I ignored that.

 > The main question is: what should requeue_delayed_work(dwork) do when
 > dwork->timer is not pending but dwork->work is queued or running?
 > Should it cancel dwork->work is this case?

In my particular case it doesn't really matter.  In the queued case it
could leave it to run whenever it gets to the head of the workqueue.  In
the already-running case I think the timer should be reset.  The
main point is that if I do requeue_delayed_work() I want to make sure
the work runs all the way through from the beginning at some point in
the future.  The pattern I have in mind is something like:

	spin_lock_irqsave(&mydata_lock, flags);
	new_timeout = add_item_to_timeout_list();
	requeue_delayed_work(wq, &process_timeout_list_work, new_timeout);
	spin_unlock_irqrestore(&mydata_lock, flags);

so if the process_timeout_list_work runs early or twice it doesn't
matter; I just want to make sure that the work runs from the beginning
and sees the new item I added to the list at some point after the
requeue.

 > OK, suppose that we s/cancel_delayed_work/requeue_delayed_work/,
 > then we seem to have the same deadlock
 > 
 >   A: holding cm_id_priv->lock, waiting for mad_agent_priv->lock
 >   B: holding mad_agent_priv->lock, waiting for requeue_delayed_work()
 >      which found !timer_pending() && queued work
 >   C: interrupt during work->func() that takes cm_id_priv->lock

Yes, I agree that if requeue_delayed_work() ever waits then we run into
the same deadlock as before.  It only works if requeue_delayed_work() is
the rough equivalent of mod_timer(), which never waits.

 > Perhaps, requeue_delayed_work() should cancel the pending work, but do
 > not wait_on_work(). This is not trivial, we have to avoid livelocks if
 > cancel_work_no_sync() races with queue_work()/etc. Perhaps,
 > requeue_delayed_work() could return the error if it can't update the
 > timer and can't cancel the work without spinning ?

I guess returning an error is possible ... although I wonder what the
caller would do to handle the error?

Perhaps the semantics are sufficiently fuzzy and not general enough, so
that the best answer is my special-case open coded change for my
specific case.  I don't know whether other places would even want a
requeue_delayed_work() ... I simply raise this point because when I find
myself reimplementing the structure of work_struct + timer because
delayed_work API is lacking, then it seems prudent to consider extending
delayed_work API instead.

Thanks,
  Roland

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Is adding requeue_delayed_work() a good idea
  2009-08-21 21:53   ` Roland Dreier
@ 2009-08-22 10:35     ` Stefan Richter
  2009-08-24 18:01     ` Oleg Nesterov
  1 sibling, 0 replies; 14+ messages in thread
From: Stefan Richter @ 2009-08-22 10:35 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Oleg Nesterov, linux-kernel

Roland Dreier wrote:
[Oleg Nesterov wrote]
> Perhaps the semantics are sufficiently fuzzy and not general enough, so
> that the best answer is my special-case open coded change for my
> specific case.  I don't know whether other places would even want a
> requeue_delayed_work() ... I simply raise this point because when I find
> myself reimplementing the structure of work_struct + timer because
> delayed_work API is lacking, then it seems prudent to consider extending
> delayed_work API instead.

There are two or three use cases of rescheduling a delayed work in 
drivers/firewire, but in one regard they are simpler than your case:

The work only needs to be pushed back in time, never scheduled earlier 
than it was originally scheduled.  This is easily implemented with the 
existing delayed work API by letting the work check whether it is 
entered too early; if yes, reschedule itself.

Further requirements of this use case:
   - It doesn't matter which CPU this kind of work runs on.
   - If the event which necessitates rescheduling happens when the
     work is already running, then it might depend on the outcome of
     the work whether it needs to be scheduled again.  I think the
     requeue_delayed_work should do nothing in that case and the
     work needs to test how it went and rearm if necessary.

Well, considering the latter point, there is no harm in keeping all of 
the requeue_delayed_work logic concentrated in the work itself in these 
FireWire use cases.

[These use cases are high-level protocols for resource acquisition over 
the FireWire bus.  Cooperative fairness schemes require them to be timed 
after a grace period for resource re-acquisition by holders of older 
resources, following after bus reset events which may happen any time.]

So, the question is whether there is any user besides IB which needs to 
pull a delayed work forward in time.

Or another thought:  Would it hurt if you ignored any shortening of 
timeouts, i.e. reducing your use case also to only ever _deferring_ work,
never rescheduling to an earlier point in time?
-- 
Stefan Richter
-=====-==--= =--- =-==-
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Is adding requeue_delayed_work() a good idea
  2009-08-21 21:53   ` Roland Dreier
  2009-08-22 10:35     ` Stefan Richter
@ 2009-08-24 18:01     ` Oleg Nesterov
  2009-08-24 21:11       ` Roland Dreier
  1 sibling, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2009-08-24 18:01 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-kernel

On 08/21, Roland Dreier wrote:
>
>  > The main question is: what should requeue_delayed_work(dwork) do when
>  > dwork->timer is not pending but dwork->work is queued or running?
>  > Should it cancel dwork->work is this case?
>
> In my particular case it doesn't really matter.  In the queued case it
> could leave it to run whenever it gets to the head of the workqueue.  In
> the already running case then I think the timer should be reset.  The
> main point is that if I do requeue_delayed_work() I want to make sure
> the work runs all the way through from the beginning at some point in
> the future.  The pattern I have in mind is something like:
>
> 	spin_lock_irqsave(&mydata_lock, flags);
> 	new_timeout = add_item_to_timeout_list();
> 	requeue_delayed_work(wq, &process_timeout_list_work, new_timeout);
> 	spin_unlock_irqrestore(&mydata_lock, flags);
>
> so if the process_timeout_list_work runs early or twice it doesn't
> matter; I just want to make sure that the work runs from the beginning
> and sees the new item I added to the list at some point after the
> requeue.

Hmm. But, assuming that process_timeout_list_work->func() takes mydata_lock
too, you can just use queue_delayed_work()?

process_timeout_list_work can't miss new items, queue_delayed_work()
can only fail if dwork is pending and its ->func has not started yet.

No?

> Perhaps the semantics are sufficiently fuzzy and not general enough, so
> that the best answer is my special-case open coded change for my
> specific case.  I don't know whether other places would even want a
> requeue_delayed_work() ... I simply raise this point because when I find
> myself reimplementing the structure of work_struct + timer because
> delayed_work API is lacking, then it seems prudent to consider extending
> delayed_work API instead.

Yes, I completely agree with you.

I think we need requeue_xxx helper(s), but the exact semantics is not
clear to me.

Oleg.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Is adding requeue_delayed_work() a good idea
  2009-08-24 18:01     ` Oleg Nesterov
@ 2009-08-24 21:11       ` Roland Dreier
  2009-08-25  9:39         ` Oleg Nesterov
  0 siblings, 1 reply; 14+ messages in thread
From: Roland Dreier @ 2009-08-24 21:11 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel


 > > In my particular case it doesn't really matter.  In the queued case it
 > > could leave it to run whenever it gets to the head of the workqueue.  In
 > > the already-running case I think the timer should be reset.  The
 > > main point is that if I do requeue_delayed_work() I want to make sure
 > > the work runs all the way through from the beginning at some point in
 > > the future.  The pattern I have in mind is something like:
 > >
 > > 	spin_lock_irqsave(&mydata_lock, flags);
 > > 	new_timeout = add_item_to_timeout_list();
 > > 	requeue_delayed_work(wq, &process_timeout_list_work, new_timeout);
 > > 	spin_unlock_irqrestore(&mydata_lock, flags);
 > >
 > > so if the process_timeout_list_work runs early or twice it doesn't
 > > matter; I just want to make sure that the work runs from the beginning
 > > and sees the new item I added to the list at some point after the
 > > requeue.
 > 
 > Hmm. But, assuming that process_timeout_list_work->func() takes mydata_lock
 > too, you can just use queue_delayed_work()?
 > 
 > process_timeout_list_work can't miss new items, queue_delayed_work()
 > can only fail if dwork is pending and its ->func has not started yet.

Maybe I misunderstand the code or misunderstand you, but looking at it:

	int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
				struct delayed_work *dwork, unsigned long delay)
	{
		int ret = 0;
		struct timer_list *timer = &dwork->timer;
		struct work_struct *work = &dwork->work;
	
		if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work))) {
			BUG_ON(timer_pending(timer));
			BUG_ON(!list_empty(&work->entry));
			//...
		}
		return ret;
	}

if I have dwork already queued and try to requeue it for earlier, it
seems that this will just silently return with the later expiration
still in effect.

Maybe I wasn't totally clear on what I want to happen; an example would
be if, say, I have delayed work queued with timer set to expire in 100
seconds and then a new piece of work is added that needs to expire in 1
second.  So I want to requeue delayed work to run in 1 second, not wait
the 100 seconds for the original timeout work.

 > I think we need requeue_xxx helper(s), but the exact semantics is not
 > clear to me.

Not clear to me either unfortunately.

 - R.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Is adding requeue_delayed_work() a good idea
  2009-08-24 21:11       ` Roland Dreier
@ 2009-08-25  9:39         ` Oleg Nesterov
  2009-08-26 18:42           ` Roland Dreier
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2009-08-25  9:39 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-kernel, Dmitry Torokhov

Now I noticed I forgot to CC Dmitry yesterday...

On 08/24, Roland Dreier wrote:
>
>  > > In my particular case it doesn't really matter.  In the queued case it
>  > > could leave it to run whenever it gets to the head of the workqueue.  In
>  > > the already-running case I think the timer should be reset.  The
>  > > main point is that if I do requeue_delayed_work() I want to make sure
>  > > the work runs all the way through from the beginning at some point in
>  > > the future.  The pattern I have in mind is something like:
>  > >
>  > > 	spin_lock_irqsave(&mydata_lock, flags);
>  > > 	new_timeout = add_item_to_timeout_list();
>  > > 	requeue_delayed_work(wq, &process_timeout_list_work, new_timeout);
>  > > 	spin_unlock_irqrestore(&mydata_lock, flags);
>  > >
>  > > so if the process_timeout_list_work runs early or twice it doesn't
>  > > matter; I just want to make sure that the work runs from the beginning
>  > > and sees the new item I added to the list at some point after the
>  > > requeue.
>  >
>  > Hmm. But, assuming that process_timeout_list_work->func() takes mydata_lock
>  > too, you can just use queue_delayed_work()?
>  >
>  > process_timeout_list_work can't miss new items, queue_delayed_work()
>  > can only fail if dwork is pending and its ->func has not started yet.
>
> Maybe I misunderstand the code or misunderstand you,

No, sorry. I misunderstood you (and sorry for delays btw).

I have read "I just want to make sure" above but forgot you also need
to shorten the timeout.

OK, in this case I think we have a simple solution,

	// like cancel_delayed_work, but uses del_timer().
	// this means, if it returns 0 the timer function may be
	// running and the queueing is in progress. The caller
	// can't rely on flush_workqueue/etc
	static inline int __cancel_delayed_work(struct delayed_work *work)
	{
		int ret;

		ret = del_timer(&work->timer);
		if (ret)
			work_clear_pending(&work->work);
		return ret;
	}

Now, you can do

	spin_lock_irqsave(&mydata_lock, flags);
	new_timeout = add_item_to_timeout_list();

	__cancel_delayed_work(&process_timeout_list_work);
	queue_delayed_work(wq, &process_timeout_list_work, new_timeout);

	spin_unlock_irqrestore(&mydata_lock, flags);

If queue_delayed_work() fails, this means that WORK_STRUCT_PENDING is set,
dwork->work is already queued or the queueing is in progress. In both
cases it will run "soon" as if we just called queue_work(&dwork->work).

But this assumes nobody else does queue_delayed_work(dwork, HUGE_DELAY) in
parallel, otherwise we can lose the race and another caller can setup
HUGE_DELAY timeout.

In particular, if process_timeout_list_work->func() itself uses
queue_delayed_work() to re-arm itself we can race. But I think it is always
possible to do something to synchronize with work->func, for example
work->func() can re-arm itself _before_ it scans timeout_list (under the
same lock). This way, if re-queue code above fails because work->func()
wins, work->func() must see the new additions to timeout_list.

Can this work for you?

Oleg.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Is adding requeue_delayed_work() a good idea
  2009-08-25  9:39         ` Oleg Nesterov
@ 2009-08-26 18:42           ` Roland Dreier
  2009-08-28 17:59             ` [PATCH 0/1] introduce __cancel_delayed_work() Oleg Nesterov
  2009-09-01  0:44             ` Is adding requeue_delayed_work() a good idea Dmitry Torokhov
  0 siblings, 2 replies; 14+ messages in thread
From: Roland Dreier @ 2009-08-26 18:42 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, Dmitry Torokhov


 > OK, in this case I think we have a simple solution,
 > 
 > 	// like cancel_delayed_work, but uses del_timer().
 > 	// this means, if it returns 0 the timer function may be
 > 	// running and the queueing is in progress. The caller
 > 	// can't rely on flush_workqueue/etc
 > 	static inline int __cancel_delayed_work(struct delayed_work *work)
 > 	{
 > 		int ret;
 > 
 > 		ret = del_timer(&work->timer);
 > 		if (ret)
 > 			work_clear_pending(&work->work);
 > 		return ret;
 > 	}
 > 
 > Now, you can do
 > 
 > 	spin_lock_irqsave(&mydata_lock, flags);
 > 	new_timeout = add_item_to_timeout_list();
 > 
 > 	__cancel_delayed_work(&process_timeout_list_work);
 > 	queue_delayed_work(wq, &process_timeout_list_work, new_timeout);
 > 
 > 	spin_unlock_irqrestore(&mydata_lock, flags);
 > 
 > If queue_delayed_work() fails, this means that WORK_STRUCT_PENDING is set,
 > dwork->work is already queued or the queueing is in progress. In both
 > cases it will run "soon" as if we just called queue_work(&dwork->work).

This looks like it would work well.  If we can get this into 2.6.32 then
I will drop my patch and switch to this approach instead.

 > But this assumes nobody else does queue_delayed_work(dwork, HUGE_DELAY) in
 > parallel, otherwise we can lose the race and another caller can setup
 > HUGE_DELAY timeout.

In my case this is fine -- all uses of queue_delayed_work() are
synchronized with the same lock.  So any place that tries to shorten the
timeout will succeed.

 > In particular, if process_timeout_list_work->func() itself uses
 > queue_delayed_work() to re-arm itself we can race. But I think it is always
 > possible to do something to synchronize with work->func, for example
 > work->func() can re-arm itself _before_ it scans timeout_list (under the
 > same lock). This way, if re-queue code above fails because work->func()
 > wins, work->func() must see the new additions to timeout_list.

In my case, work function does do queue_delayed_work(), but with the
same lock as everyone else held.  So there should be no race.

Thanks,
  Roland


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 0/1] introduce __cancel_delayed_work()
  2009-08-26 18:42           ` Roland Dreier
@ 2009-08-28 17:59             ` Oleg Nesterov
  2009-08-28 18:00               ` [PATCH 1/1] " Oleg Nesterov
  2009-09-01  0:44             ` Is adding requeue_delayed_work() a good idea Dmitry Torokhov
  1 sibling, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2009-08-28 17:59 UTC (permalink / raw)
  To: Andrew Morton, Roland Dreier
  Cc: linux-kernel, Dmitry Torokhov, Stefan Richter

On 08/26, Roland Dreier wrote:
>
>  > OK, in this case I think we have a simple solution,
>  >
>  > 	// like cancel_delayed_work, but uses del_timer().
>  > 	// this means, if it returns 0 the timer function may be
>  > 	// running and the queueing is in progress. The caller
>  > 	// can't rely on flush_workqueue/etc
>  > 	static inline int __cancel_delayed_work(struct delayed_work *work)
>  > 	{
>  > 		int ret;
>  >
>  > 		ret = del_timer(&work->timer);
>  > 		if (ret)
>  > 			work_clear_pending(&work->work);
>  > 		return ret;
>  > 	}
>
> This looks like it would work well.  If we can get this into 2.6.32 then
> I will drop my patch and switch to this approach instead.

I am not sure how I can push this patch into 2.6.32, so I am just sending
it to Andrew.

Or, please feel free to embed this change in the mad.c fixes and send the patch
yourself (if it is not too late; I have to apologize for the delay again).


As for requeue, perhaps we should add two helpers. The first is simple,

	if (!mod_timer_pending(dwork->timer, new_delay))
		queue_delayed_work(dwork, new_delay);

and I think this is what we need for mad.c. Another one is not "atomic"
but also cancels dwork. But since we can't decide what exactly these
helpers should do, it is "safer" to add the trivial __cancel_delayed_work()
which hopefully can have other users.

Oleg.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/1] introduce __cancel_delayed_work()
  2009-08-28 17:59             ` [PATCH 0/1] introduce __cancel_delayed_work() Oleg Nesterov
@ 2009-08-28 18:00               ` Oleg Nesterov
  2009-09-01 16:09                 ` Roland Dreier
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2009-08-28 18:00 UTC (permalink / raw)
  To: Andrew Morton, Roland Dreier
  Cc: linux-kernel, Dmitry Torokhov, Stefan Richter

cancel_delayed_work() has to use del_timer_sync() to guarantee the
timer function is not running after return. But most users don't
actually need this, and del_timer_sync() has problems: it is not
usable from interrupt, and it depends on every lock which could
be taken from irq.

Introduce __cancel_delayed_work() which calls del_timer() instead.

The immediate reason for this patch is
http://bugzilla.kernel.org/show_bug.cgi?id=13757
but hopefully this helper makes sense anyway.

As for the 13757 bug, actually we need requeue_delayed_work(), but its
semantics is not clear yet.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>

--- MINI/include/linux/workqueue.h~CDW	2009-08-27 20:17:08.000000000 +0200
+++ MINI/include/linux/workqueue.h	2009-08-27 20:22:48.000000000 +0200
@@ -240,6 +240,21 @@ static inline int cancel_delayed_work(st
 	return ret;
 }
 
+/*
+ * Like above, but uses del_timer() instead of del_timer_sync(). This means,
+ * if it returns 0 the timer function may be running and the queueing is in
+ * progress.
+ */
+static inline int __cancel_delayed_work(struct delayed_work *work)
+{
+	int ret;
+
+	ret = del_timer(&work->timer);
+	if (ret)
+		work_clear_pending(&work->work);
+	return ret;
+}
+
 extern int cancel_delayed_work_sync(struct delayed_work *work);
 
 /* Obsolete. use cancel_delayed_work_sync() */


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Is adding requeue_delayed_work() a good idea
  2009-08-26 18:42           ` Roland Dreier
  2009-08-28 17:59             ` [PATCH 0/1] introduce __cancel_delayed_work() Oleg Nesterov
@ 2009-09-01  0:44             ` Dmitry Torokhov
  1 sibling, 0 replies; 14+ messages in thread
From: Dmitry Torokhov @ 2009-09-01  0:44 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Oleg Nesterov, linux-kernel

On Wed, Aug 26, 2009 at 11:42:36AM -0700, Roland Dreier wrote:
> 
>  > OK, in this case I think we have a simple solution,
>  > 
>  > 	// like cancel_delayed_work, but uses del_timer().
>  > 	// this means, if it returns 0 the timer function may be
>  > 	// running and the queueing is in progress. The caller
>  > 	// can't rely on flush_workqueue/etc
>  > 	static inline int __cancel_delayed_work(struct delayed_work *work)
>  > 	{
>  > 		int ret;
>  > 
>  > 		ret = del_timer(&work->timer);
>  > 		if (ret)
>  > 			work_clear_pending(&work->work);
>  > 		return ret;
>  > 	}
>  > 
>  > Now, you can do
>  > 
>  > 	spin_lock_irqsave(&mydata_lock, flags);
>  > 	new_timeout = add_item_to_timeout_list();
>  > 
>  > 	__cancel_delayed_work(&process_timeout_list_work);
>  > 	queue_delayed_work(wq, &process_timeout_list_work, new_timeout);
>  > 
>  > 	spin_unlock_irqrestore(&mydata_lock, flags);
>  > 
>  > If queue_delayed_work() fails, this means that WORK_STRUCT_PENDING is set:
>  > dwork->work is either already queued or the queueing is in progress. In both
>  > cases it will run "soon" as if we just called queue_work(&dwork->work).
> 
> This looks like it would work well.  If we can get this into 2.6.32 then
> I will drop my patch and switch to this approach instead.
> 
>  > But this assumes nobody else does queue_delayed_work(dwork, HUGE_DELAY) in
>  > parallel; otherwise we can lose the race and another caller can set up a
>  > HUGE_DELAY timeout.
> 
> In my case this is fine -- all uses of queue_delayed_work() are
> synchronized with the same lock.  So any place that tries to shorten the
> timeout will succeed.
> 

I can add the necessary locking in my case - I think it is actually needed
there even without this particular change.

-- 
Dmitry


* Re: [PATCH 1/1] introduce __cancel_delayed_work()
  2009-08-28 18:00               ` [PATCH 1/1] " Oleg Nesterov
@ 2009-09-01 16:09                 ` Roland Dreier
  2009-09-01 16:40                   ` Dmitry Torokhov
  0 siblings, 1 reply; 14+ messages in thread
From: Roland Dreier @ 2009-09-01 16:09 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, linux-kernel, Dmitry Torokhov, Stefan Richter


 > cancel_delayed_work() has to use del_timer_sync() to guarantee the
 > timer function is not running after return. But most users don't
 > actually need this, and del_timer_sync() has problems: it is not
 > usable from interrupt context, and it creates a dependency on every
 > lock which could be taken from irq.
 > 
 > Introduce __cancel_delayed_work() which calls del_timer() instead.
 > 
 > The immediate reason for this patch is
 > http://bugzilla.kernel.org/show_bug.cgi?id=13757
 > but hopefully this helper makes sense anyway.

Thanks, Oleg!

Andrew -- how do you want to handle this?  This seems to be useful for
the bug with IB that Oleg linked to, as well as for Dmitry's work in
input, so I'm not sure what the best way to merge all this into 2.6.32 is.

I could take Oleg's patch and the corresponding fix to
drivers/infiniband through my tree, and merge as soon as I see Linus
open 2.6.32.  That leaves Dmitry to wait on it (and possibly causes
problems in -next with tree ordering) though.  But I don't see any way
to get the number of cross-tree dependencies below 1... (unless maybe
Dmitry can take the identical workqueue patch into his tree and trust
git to sort it out?)

 - R.


* Re: [PATCH 1/1] introduce __cancel_delayed_work()
  2009-09-01 16:09                 ` Roland Dreier
@ 2009-09-01 16:40                   ` Dmitry Torokhov
  2009-09-01 22:29                     ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Torokhov @ 2009-09-01 16:40 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Oleg Nesterov, Andrew Morton, linux-kernel, Stefan Richter

On Tuesday 01 September 2009 09:09:36 am Roland Dreier wrote:
> 
>  > cancel_delayed_work() has to use del_timer_sync() to guarantee the
>  > timer function is not running after return. But most users don't
>  > actually need this, and del_timer_sync() has problems: it is not
>  > usable from interrupt context, and it creates a dependency on every
>  > lock which could be taken from irq.
>  > 
>  > Introduce __cancel_delayed_work() which calls del_timer() instead.
>  > 
>  > The immediate reason for this patch is
>  > http://bugzilla.kernel.org/show_bug.cgi?id=13757
>  > but hopefully this helper makes sense anyway.
> 
> Thanks, Oleg!
> 
> Andrew -- how do you want to handle this?  This seems to be useful for
> the bug with IB that Oleg linked to, as well as for Dmitry's work in
> input, so I'm not sure what the best way to merge all this into 2.6.32 is.
> 
> I could take Oleg's patch and the corresponding fix to
> drivers/infiniband through my tree, and merge as soon as I see Linus
> open 2.6.32.  That leaves Dmitry to wait on it (and possibly causes
> problems in -next with tree ordering) though.  But I don't see any way
> to get the number of cross-tree dependencies below 1... (unless maybe
> Dmitry can take the identical workqueue patch into his tree and trust
> git to sort it out?)

I wonder if Linus would not just take it in 2.6.31 - it is a completely
new function with no current users (but users will surely follow), so
introducing a regression is highly unlikely... That would resolve all
inter-tree dependencies.

Otherwise we'll have to leave our fate in the hands of git ;)

-- 
Dmitry


* Re: [PATCH 1/1] introduce __cancel_delayed_work()
  2009-09-01 16:40                   ` Dmitry Torokhov
@ 2009-09-01 22:29                     ` Andrew Morton
  0 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2009-09-01 22:29 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: rdreier, oleg, linux-kernel, stefanr

On Tue, 1 Sep 2009 09:40:33 -0700
Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote:

> > Andrew -- how do you want to handle this?  This seems to be useful for
> > the bug with IB that Oleg linked to, as well as for Dmitry's work in
> > input, so I'm not sure what the best way to merge all this into 2.6.32 is.
> > 
> > I could take Oleg's patch and the corresponding fix to
> > drivers/infiniband through my tree, and merge as soon as I see Linus
> > open 2.6.32.  That leaves Dmitry to wait on it (and possibly causes
> > problems in -next with tree ordering) though.  But I don't see any way
> > to get the number of cross-tree dependencies below 1... (unless maybe
> > Dmitry can take the identical workqueue patch into his tree and trust
> > git to sort it out?)
> 
> I wonder if Linus would not just take it in 2.6.31 - it is a completely
> new function with no current users (but users will surely follow), so
> introducing a regression is highly unlikely... That would resolve all
> inter-tree dependencies.
> 
> Otherwise we'll have to leave our fate in the hands of git ;)

I'll try to sneak it into 2.6.31 once Linus returns, but it'll need
good timing.

Or someone else can merge it into their tree, in which case I'll
promptly forget about it ;)


end of thread, other threads:[~2009-09-01 22:29 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-20 21:51 Is adding requeue_delayed_work() a good idea Roland Dreier
2009-08-21 11:55 ` Oleg Nesterov
2009-08-21 21:53   ` Roland Dreier
2009-08-22 10:35     ` Stefan Richter
2009-08-24 18:01     ` Oleg Nesterov
2009-08-24 21:11       ` Roland Dreier
2009-08-25  9:39         ` Oleg Nesterov
2009-08-26 18:42           ` Roland Dreier
2009-08-28 17:59             ` [PATCH 0/1] introduce __cancel_delayed_work() Oleg Nesterov
2009-08-28 18:00               ` [PATCH 1/1] " Oleg Nesterov
2009-09-01 16:09                 ` Roland Dreier
2009-09-01 16:40                   ` Dmitry Torokhov
2009-09-01 22:29                     ` Andrew Morton
2009-09-01  0:44             ` Is adding requeue_delayed_work() a good idea Dmitry Torokhov
