linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] workqueue: always pass cascading responsibility to the next flusher
@ 2014-05-26 11:58 Lai Jiangshan
  2014-05-26 11:58 ` [PATCH 2/2] workqueue: remove the unneeded "while(true)" loop and adjust the indent Lai Jiangshan
  2014-05-27 14:24 ` [PATCH 1/2] workqueue: always pass cascading responsibility to the next flusher Tejun Heo
  0 siblings, 2 replies; 4+ messages in thread
From: Lai Jiangshan @ 2014-05-26 11:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

In the end of the cascading code of flush_workqueue(), the current
first-flusher will hand over the cascading-responsibility to the next
flusher and make the next flusher to be the new first-flusher.

But when the current first-flusher finds out that the color of the next
flusher is already done, it will deprive first-flusher role from
the next, and repeat cascading by itself for optimization.

This optimization saves one mutex_lock(&wq->mutex) due to the current
first-flusher already held it.

This optimization reduces the cascading-latency, because the next flusher
is not running currently, it will delay a little when we keep the next's
responsibility for cascading.

This optimization may also have other benefits. However, it is slow-path
and low-probability-hit case, and it is not good at these aspects:
    1) it adds a special case and makes the code complex, bad for review.
    2) it adds a special state for the first-flusher which is allowed to
       be deprived. It causes a race and we have to check wq->first_flusher
       again with mutex held: 4ce48b37bfed ("workqueue: fix race condition
       in flush_workqueue()").

Since it is slow-path, we remove this optimization and pass the cascading
responsibility to the next flusher unconditionally. And the race condition
mentioned above is gone, so we also revert the 4ce48b37bfed by removing
the checking.

And since it always passes the cascading responsibility to the next flusher,
it doesn't need to repeat to cascade, and the "while(true)" can be removed.
But if we remove the "while(true)" loop and adjust the indent, "git diff"
will result a complex patch which is hard to review. So we will remove the
"while(true)" loop in the next patch which makes two patches are good
for review. This patch just uses a "break" to disable repeating cascading
at the end of the loop.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   14 ++------------
 1 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index bc3c188..948a84f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2572,10 +2572,6 @@ void flush_workqueue(struct workqueue_struct *wq)
 
 	mutex_lock(&wq->mutex);
 
-	/* we might have raced, check again with mutex held */
-	if (wq->first_flusher != &this_flusher)
-		goto out_unlock;
-
 	wq->first_flusher = NULL;
 
 	WARN_ON_ONCE(!list_empty(&this_flusher.list));
@@ -2631,14 +2627,8 @@ void flush_workqueue(struct workqueue_struct *wq)
 		list_del_init(&next->list);
 		wq->first_flusher = next;
 
-		if (flush_workqueue_prep_pwqs(wq, wq->flush_color, -1))
-			break;
-
-		/*
-		 * Meh... this color is already done, clear first
-		 * flusher and repeat cascading.
-		 */
-		wq->first_flusher = NULL;
+		flush_workqueue_prep_pwqs(wq, wq->flush_color, -1);
+		break;
 	}
 
 out_unlock:
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/2] workqueue: remove the unneeded "while(true)" loop and adjust the indent
  2014-05-26 11:58 [PATCH 1/2] workqueue: always pass cascading responsibility to the next flusher Lai Jiangshan
@ 2014-05-26 11:58 ` Lai Jiangshan
  2014-05-27 14:24 ` [PATCH 1/2] workqueue: always pass cascading responsibility to the next flusher Tejun Heo
  1 sibling, 0 replies; 4+ messages in thread
From: Lai Jiangshan @ 2014-05-26 11:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

Since the current first-flusher always passes the cascading responsibility
to the next flusher, the body of "while(true)" loop is running exactly once,
"while(true)" loop is unneeded. So we remove the "while(true)" loop and
adjust the indent.

Note, there is no "continue" inside the "while(true)" loop, but there are
three "break" inside the loop. The first one is in the inner loop, we don't
touch it except adjusting the indent. The second one is changed to
"goto out_unlock". The third one is in the end of the loop, it is just
removed.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   84 ++++++++++++++++++++++++---------------------------
 1 files changed, 40 insertions(+), 44 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 948a84f..3d8bcbe 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2507,6 +2507,7 @@ void flush_workqueue(struct workqueue_struct *wq)
 		.flush_color = -1,
 		.done = COMPLETION_INITIALIZER_ONSTACK(this_flusher.done),
 	};
+	struct wq_flusher *next, *tmp;
 	int next_color;
 
 	lock_map_acquire(&wq->lockdep_map);
@@ -2577,59 +2578,54 @@ void flush_workqueue(struct workqueue_struct *wq)
 	WARN_ON_ONCE(!list_empty(&this_flusher.list));
 	WARN_ON_ONCE(wq->flush_color != this_flusher.flush_color);
 
-	while (true) {
-		struct wq_flusher *next, *tmp;
-
-		/* complete all the flushers sharing the current flush color */
-		list_for_each_entry_safe(next, tmp, &wq->flusher_queue, list) {
-			if (next->flush_color != wq->flush_color)
-				break;
-			list_del_init(&next->list);
-			complete(&next->done);
-		}
+	/* complete all the flushers sharing the current flush color */
+	list_for_each_entry_safe(next, tmp, &wq->flusher_queue, list) {
+		if (next->flush_color != wq->flush_color)
+			break;
+		list_del_init(&next->list);
+		complete(&next->done);
+	}
 
-		WARN_ON_ONCE(!list_empty(&wq->flusher_overflow) &&
-			     wq->flush_color != work_next_color(wq->work_color));
+	WARN_ON_ONCE(!list_empty(&wq->flusher_overflow) &&
+		     wq->flush_color != work_next_color(wq->work_color));
 
-		/* this flush_color is finished, advance by one */
-		wq->flush_color = work_next_color(wq->flush_color);
+	/* this flush_color is finished, advance by one */
+	wq->flush_color = work_next_color(wq->flush_color);
 
-		/* one color has been freed, handle overflow queue */
-		if (!list_empty(&wq->flusher_overflow)) {
-			/*
-			 * Assign the same color to all overflowed
-			 * flushers, advance work_color and append to
-			 * flusher_queue.  This is the start-to-wait
-			 * phase for these overflowed flushers.
-			 */
-			list_for_each_entry(tmp, &wq->flusher_overflow, list)
-				tmp->flush_color = wq->work_color;
+	/* one color has been freed, handle overflow queue */
+	if (!list_empty(&wq->flusher_overflow)) {
+		/*
+		 * Assign the same color to all overflowed
+		 * flushers, advance work_color and append to
+		 * flusher_queue.  This is the start-to-wait
+		 * phase for these overflowed flushers.
+		 */
+		list_for_each_entry(tmp, &wq->flusher_overflow, list)
+			tmp->flush_color = wq->work_color;
 
-			wq->work_color = work_next_color(wq->work_color);
+		wq->work_color = work_next_color(wq->work_color);
 
-			list_splice_tail_init(&wq->flusher_overflow,
-					      &wq->flusher_queue);
-			flush_workqueue_prep_pwqs(wq, -1, wq->work_color);
-		}
+		list_splice_tail_init(&wq->flusher_overflow,
+				      &wq->flusher_queue);
+		flush_workqueue_prep_pwqs(wq, -1, wq->work_color);
+	}
 
-		if (list_empty(&wq->flusher_queue)) {
-			WARN_ON_ONCE(wq->flush_color != wq->work_color);
-			break;
-		}
+	if (list_empty(&wq->flusher_queue)) {
+		WARN_ON_ONCE(wq->flush_color != wq->work_color);
+		goto out_unlock;
+	}
 
-		/*
-		 * Need to flush more colors.  Make the next flusher
-		 * the new first flusher and arm pwqs.
-		 */
-		WARN_ON_ONCE(wq->flush_color == wq->work_color);
-		WARN_ON_ONCE(wq->flush_color != next->flush_color);
+	/*
+	 * Need to flush more colors.  Make the next flusher
+	 * the new first flusher and arm pwqs.
+	 */
+	WARN_ON_ONCE(wq->flush_color == wq->work_color);
+	WARN_ON_ONCE(wq->flush_color != next->flush_color);
 
-		list_del_init(&next->list);
-		wq->first_flusher = next;
+	list_del_init(&next->list);
+	wq->first_flusher = next;
 
-		flush_workqueue_prep_pwqs(wq, wq->flush_color, -1);
-		break;
-	}
+	flush_workqueue_prep_pwqs(wq, wq->flush_color, -1);
 
 out_unlock:
 	mutex_unlock(&wq->mutex);
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] workqueue: always pass cascading responsibility to the next flusher
  2014-05-26 11:58 [PATCH 1/2] workqueue: always pass cascading responsibility to the next flusher Lai Jiangshan
  2014-05-26 11:58 ` [PATCH 2/2] workqueue: remove the unneeded "while(true)" loop and adjust the indent Lai Jiangshan
@ 2014-05-27 14:24 ` Tejun Heo
  2014-05-27 14:27   ` Tejun Heo
  1 sibling, 1 reply; 4+ messages in thread
From: Tejun Heo @ 2014-05-27 14:24 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

Hello, Lai.

On Mon, May 26, 2014 at 07:58:12PM +0800, Lai Jiangshan wrote:
> This optimization saves one mutex_lock(&wq->mutex) due to the current
> first-flusher already held it.
> 
> This optimization reduces the cascading-latency, because the next flusher
> is not running currently, it will delay a little when we keep the next's
> responsibility for cascading.
> 
> This optimization may also have other benefits. However, it is slow-path
> and low-probability-hit case, and it is not good at these aspects:
>     1) it adds a special case and makes the code complex, bad for review.
>     2) it adds a special state for the first-flusher which is allowed to
>        be deprived. It causes a race and we have to check wq->first_flusher
>        again with mutex held: 4ce48b37bfed ("workqueue: fix race condition
>        in flush_workqueue()").

The original goal was replicating the wake up behavior of the existing
implementation.  It doesn't matter whether we have a couple more
lockings or somewhat more complex logic there but it *does* make
noticeable difference when it starts involving scheduling latencies.
They are multiple orders of magnitude longer after all.  Not quite the
same but synchronize_rcu() is similar and we've had several cases
where blind synchronize_rcu() invocations in userland visible paths
causing crippling latencies (e.g. SCSI scanning through non-existent
LUNs ending up taking tens of seconds in pathological cases).

So, I think hiding latencies which can easily in millisecs range is
important.  It isn't a performance optimization.  It almost becomes a
correctness issue when the problem is severely hit and the amount of
code simplification that we get from dropping this doesn't seem like
much.  As such, I'm not quite convinced I wanna apply this one.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] workqueue: always pass cascading responsibility to the next flusher
  2014-05-27 14:24 ` [PATCH 1/2] workqueue: always pass cascading responsibility to the next flusher Tejun Heo
@ 2014-05-27 14:27   ` Tejun Heo
  0 siblings, 0 replies; 4+ messages in thread
From: Tejun Heo @ 2014-05-27 14:27 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Tue, May 27, 2014 at 10:24:43AM -0400, Tejun Heo wrote:
> So, I think hiding latencies which can easily in millisecs range is
> important.  It isn't a performance optimization.  It almost becomes a
> correctness issue when the problem is severely hit and the amount of

Another way to think about it is considering flushing an event
notification mechanism.  Doing the above can reliably induce tens of
millisecs of latency in event delivery on the right combination of
work items and flush patterns, which isn't acceptable.  We sure still
can fail to such state when the flush color space is exhausted but
this is a lot easier to trigger.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2014-05-27 14:27 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-26 11:58 [PATCH 1/2] workqueue: always pass cascading responsibility to the next flusher Lai Jiangshan
2014-05-26 11:58 ` [PATCH 2/2] workqueue: remove the unneeded "while(true)" loop and adjust the indent Lai Jiangshan
2014-05-27 14:24 ` [PATCH 1/2] workqueue: always pass cascading responsibility to the next flusher Tejun Heo
2014-05-27 14:27   ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).