linux-kernel.vger.kernel.org archive mirror
From: Tejun Heo <tj@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: torvalds@linux-foundation.org, awalls@radix.net,
	linux-kernel@vger.kernel.org, jeff@garzik.org, mingo@elte.hu,
	akpm@linux-foundation.org, jens.axboe@oracle.com,
	rusty@rustcorp.com.au, cl@linux-foundation.org,
	dhowells@redhat.com, arjan@linux.intel.com, avi@redhat.com,
	johannes@sipsolutions.net, andi@firstfloor.org
Subject: Re: workqueue thing
Date: Mon, 21 Dec 2009 12:04:05 +0900
Message-ID: <4B2EE5A5.2030208@kernel.org>
In-Reply-To: <1261143924.20899.169.camel@laptop>

Hello,

On 12/18/2009 10:45 PM, Peter Zijlstra wrote:
>>    r1. The first design goal of cmwq is solving the issues the current
>>        workqueue implementation has including hard to detect
>>        deadlocks, 
> 
> lockdep is quite proficient at finding these these days.

I had thought there were cases which the current lockdep annotations
can't detect, but I can't think of one right now.  Still, when such a
possible deadlock is detected, the only solution we currently have is
to move one of the works into a separate workqueue.  As creating a
multithread workqueue for that single work is often overkill, it
usually ends up being a singlethread one, which often isn't optimal.
With cmwq, this problem is gone.
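
To make the deadlock case concrete, here is a minimal userspace sketch
in Python.  It is a toy model, not kernel code: the function name
runs_to_completion and its arguments are made up for illustration.  It
models works queued in FIFO order where one work waits for another to
finish, and checks whether the queue can drain with a given number of
workers.

```python
def runs_to_completion(order, waits_on, workers):
    """Return True if every work can finish, False on deadlock.

    order    -- work names in queueing (FIFO) order
    waits_on -- maps a work to the work it waits for, if any
    workers  -- number of worker threads serving the queue
    """
    pending, running, done = list(order), [], set()
    while pending or running:
        # Hand pending works to free workers, oldest first.
        while pending and len(running) < workers:
            running.append(pending.pop(0))
        # A running work finishes once the work it waits for is done.
        finished = [w for w in running
                    if w not in waits_on or waits_on[w] in done]
        if not finished:
            return False  # every worker is stuck waiting: deadlock
        for w in finished:
            running.remove(w)
            done.add(w)
    return True


# "a" waits for "b", which is queued behind it.
print(runs_to_completion(["a", "b"], {"a": "b"}, workers=1))
print(runs_to_completion(["a", "b"], {"a": "b"}, workers=2))
```

With a single worker, "a" occupies the only thread while waiting for
"b", which can never start.  With a second worker available, "b" runs
and "a" can finish.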

>> 	 unexpectedly long latencies caused by long running
>>        works which share the workqueue and excessive number of worker
>>        threads necessitated by each workqueue having its own workers.
> 
> works shouldn't be long running to begin with

Let's discuss this below.

>>        cmwq is cpu affine because its target workloads are not cpu
>>        intensive.  Most works are context hungry not cpu cycle hungry
>>        and as such providing the necessary context (or concurrency)
>>        from the local CPU is the most efficient way to serve them.
> 
> Things cannot be not cpu intensive and long running.

What are you talking about?  There's a huge world outside of CPUs and
RAMs where taking seconds or even tens of seconds isn't too strange.
SCSI/ATA exception conditions can easily take tens of seconds, which
in turn means anything which depends on IO may take tens of seconds.

> And this design is patently unsuited for cpu intensive tasks, hence they
> should not be long running.

Burning a lot of CPU cycles in the kernel is, and must remain, a very
rare exception.  Waiting for IO or other external events is far less
so.  Seriously, take a look at the other async mechanisms we have -
async, long works, SCSI EHs.  They're there to provide concurrency so
that they can wait for *EVENTS*, not to burn CPU cycles.

> The only way something can be not cpu intensive and long 'running' is if
> it got blocked that long, and the right solution is to fix that
> contention, things should not be blocked for seconds.

IOs and events.  Not CPUs or RAMs.

>>        The second design goal is to unify different async mechanisms
>>        in kernel.  Although cmwq wouldn't be able to serve CPU cycle
>>        intensive workload, most in-kernel async mechanisms are there
>>        to provide context and concurrency and they all can be
>>        converted to use cmwq.
> 
> Which specifically, the ones I'm aware of are mostly cpu intensive.

Async which currently is only used to make ATA probing parallel.  Long
works which are used for fscache.  SCSI/ATA EHs.  ATA polling PIO and
probing helper workers.  To-be-implemented in-kernel media presence
polling.  xfs or other fs IO threads.

The ones you're aware of are very different from the ones I'm aware
of.  The difference probably comes from the areas we usually work on.
CPU intensive ones require a pretty different solution from the ones
which just need contexts to wait for events.  For the latter, things
can be made very mechanical: the optimal level of concurrency is
easily defined as the minimal level which is just enough to avoid
blocking other works, because the resource they're competing for - the
contexts - isn't scarce.  And for this, the work interface is very
well suited.
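
As a rough illustration of "minimal level which is just enough to
avoid blocking other works", here is a toy userspace model in Python.
It is a sketch under stated assumptions, not the cmwq implementation:
each work is a (cpu, wait) pair which uses the CPU for cpu ticks and
then waits wait ticks on an external event, all works are queued at
time 0 on one CPU, and the function names are made up.

```python
def single_worker(works):
    """One thread: each work runs and waits before the next one starts.

    Returns the completion time of each work.
    """
    done, now = [], 0
    for cpu, wait in works:
        now += cpu + wait
        done.append(now)
    return done


def concurrency_managed(works):
    """Start the next work as soon as the current one stops using the CPU.

    Only one work uses the CPU at a time; waiting works hold a worker
    (a context) but no CPU.  Returns (completion times, peak workers).
    """
    done, now, peak = [], 0, 0
    for cpu, wait in works:
        # Workers whose works are still waiting on events at `now`.
        busy = sum(1 for t in done if t > now)
        peak = max(peak, busy + 1)
        now += cpu               # CPU phase of this work
        done.append(now + wait)  # event wait overlaps with later works
    return done, peak


# One work hits a 30-tick IO wait; two short works are queued behind it.
works = [(1, 30), (1, 0), (1, 0)]
print(single_worker(works))        # [31, 32, 33]
print(concurrency_managed(works))  # ([31, 2, 3], 2)
```

In this model the short works no longer inherit the 30-tick latency,
and the cost is one extra context rather than one thread per
workqueue.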

For CPU intensive ones, it's more difficult as CPU cycles are in
contention and things like fairness need to be considered.  It's a
different class of problem.  From what I can see, this class of
problems is still smaller in volume than the event waiting class.
The increasing popularity of FS end-to-end verification and maybe
encryption is likely to increase pressure here tho.  That is going to
require a different solution where the scheduler plays the core role.

>>    r2. The only thing necessary to support long running works is the
>>        ability to rebind workers to the cpu if it comes back online
>>        and allowing long running works will allow most existing worker
>>        pools to be served by cmwq and also make CPU down/up latencies
>>        more predictable.
> 
> That's not necessary at all, and introduces quite a lot of ugly code.
>
> Furthermore, let me restate that having long running works is the
> problem.

I guess I explained this enough.  When IO goes wrong, in extreme
cases, it can easily take over thirty seconds to recover, and that's
required by the hardware specifications, so anything which ends up
waiting on IO can take a pretty long time.  The only piece of code
which is necessary to support that is the code which migrates tasks
back to their CPUs when they come online again.  It's not a lot of
ugly code.

>>    r3. I don't think there is any way to implement shared worker pool
>>        without forking when more concurrency is required and the
>>        actual amount of forking would be low as cmwq scales the number
>>        of idle workers to keep according to the current concurrency
>>        level and uses rather long timeout (5min) for idlers.
> 
> I'm still not convinced more concurrency is required.

Hope my explanations helped convince you.
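
For reference, the idle-worker policy from r3 above (keep idlers
around, expire them only after a long timeout) can be sketched in
userspace Python as follows.  This is a toy model: the class IdlePool,
its methods and the LIFO reuse order are assumptions made for
illustration, and only the 5min timeout comes from the text above.

```python
IDLE_TIMEOUT = 300  # seconds, the 5min idle timeout mentioned in r3


class IdlePool:
    """Count how often a new worker must be forked."""

    def __init__(self):
        self.idle = []   # times at which idle workers went idle
        self.forks = 0   # number of workers created so far

    def reap(self, now):
        # Drop workers which have been idle for the full timeout.
        self.idle = [t for t in self.idle if now - t < IDLE_TIMEOUT]

    def get_worker(self, now):
        self.reap(now)
        if self.idle:
            self.idle.pop()   # reuse the most recently idled worker
        else:
            self.forks += 1   # nothing idle: fork a new worker

    def put_worker(self, now):
        self.idle.append(now)


pool = IdlePool()
pool.get_worker(0)     # forks the first worker
pool.put_worker(10)
pool.get_worker(100)   # reuses the idle worker, no fork
pool.put_worker(110)
pool.get_worker(500)   # idle for 390s, already expired: forks again
print(pool.forks)      # 2
```

As long as requests arrive more often than the timeout, the same
workers keep getting reused and the fork count stays flat.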

Thanks.

-- 
tejun

