Date: Mon, 21 Dec 2009 12:04:05 +0900
From: Tejun Heo <tj@kernel.org>
To: Peter Zijlstra
Cc: torvalds@linux-foundation.org, awalls@radix.net,
    linux-kernel@vger.kernel.org, jeff@garzik.org, mingo@elte.hu,
    akpm@linux-foundation.org, jens.axboe@oracle.com,
    rusty@rustcorp.com.au, cl@linux-foundation.org, dhowells@redhat.com,
    arjan@linux.intel.com, avi@redhat.com, johannes@sipsolutions.net,
    andi@firstfloor.org
Subject: Re: workqueue thing
Message-ID: <4B2EE5A5.2030208@kernel.org>
In-Reply-To: <1261143924.20899.169.camel@laptop>
References: <1261141088-2014-1-git-send-email-tj@kernel.org>
 <1261143924.20899.169.camel@laptop>

Hello,

On 12/18/2009 10:45 PM, Peter Zijlstra wrote:
>> r1. The first design goal of cmwq is solving the issues the current
>>     workqueue implementation has including hard to detect
>>     deadlocks,
>
> lockdep is quite proficient at finding these these days.

I've been thinking there may be cases which the current lockdep
annotations can't detect, but I can't come up with a concrete one.
Still, even when such a possible deadlock is detected, the only
solution we currently have is to move one of the works into a separate
workqueue.  As creating a multithreaded workqueue for a single work
item is usually overkill, it typically ends up being a singlethreaded
one, which often isn't optimal.  With cmwq, this problem is gone.
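Just for illustration, the current workaround looks roughly like the
minimal sketch below (the identifiers are made up; the calls are the
existing workqueue API):

#include <linux/init.h>
#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *my_wq;
static struct work_struct my_work;

static void my_work_fn(struct work_struct *work)
{
	/* may block on resources held by works on a shared workqueue */
}

static int __init my_init(void)
{
	/* dedicate a whole kernel thread to this single work item */
	my_wq = create_singlethread_workqueue("my_wq");
	if (!my_wq)
		return -ENOMEM;

	INIT_WORK(&my_work, my_work_fn);
	queue_work(my_wq, &my_work);
	return 0;
}
module_init(my_init);

A whole thread sitting mostly idle for one work item is wasteful, and
that's exactly the kind of thing the shared pool in cmwq avoids.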
>> unexpectedly long latencies caused by long running works which
>> share the workqueue and excessive number of worker threads
>> necessitated by each workqueue having its own workers.
>
> works shouldn't be long running to begin with

Let's discuss this below.

>> cmwq is cpu affine because its target workloads are not cpu
>> intensive.  Most works are context hungry not cpu cycle hungry and
>> as such providing the necessary context (or concurrency) from the
>> local CPU is the most efficient way to serve them.
>
> Things cannot be not cpu intensive and long running.

What are you talking about?  There's a huge world outside of CPUs and
RAM where taking seconds or even tens of seconds isn't strange at all.
SCSI/ATA exception conditions can easily take tens of seconds, which
in turn means anything that may depend on IO can take tens of seconds.

> And this design is patently unsuited for cpu intensive tasks, hence
> they should not be long running.

Burning a lot of CPU cycles in the kernel is, and must remain, a very
rare exception.  Waiting for IO or other external events is far less
so.  Seriously, take a look at the other async mechanisms we have -
async, long works, SCSI EHs.  They are there to provide concurrency so
that code can wait for *EVENTS*, not burn CPU cycles.

> The only way something can be not cpu intensive and long 'running'
> is if it got blocked that long, and the right solution is to fix
> that contention, things should not be blocked for seconds.

IO and events.  Not CPUs or RAM.

>> The second design goal is to unify different async mechanisms in
>> kernel.  Although cmwq wouldn't be able to serve CPU cycle
>> intensive workload, most in-kernel async mechanisms are there to
>> provide context and concurrency and they all can be converted to
>> use cmwq.
>
> Which specifically, the ones I'm aware of are mostly cpu intensive.

* async, which is currently only used to parallelize ATA probing.

* Long works, which are used for fscache.

* SCSI/ATA EHs.

* ATA polling PIO and probing helper workers.

* To-be-implemented in-kernel media presence polling.

* xfs and other filesystem IO threads.

The ones you're aware of are very different from the ones I'm aware
of.  The difference probably comes from the areas we usually work in.

CPU intensive works require a pretty different solution from works
which just need a context to wait for events in.  For the latter,
things can be made very mechanical because the optimal level of
concurrency is easy to define - the minimal level which is just enough
to avoid blocking other works - as the resource they're competing for,
the contexts, isn't scarce.  For this, the work interface is very well
suited.

For CPU intensive works, it's more difficult as CPU cycles are under
contention and things like fairness need to be considered.  It's a
different class of problem.  From what I can see, this class of
problems is still smaller in volume than the event-waiting class,
though the growing popularity of filesystem end-to-end verification
and maybe encryption is likely to increase the pressure here.  That is
going to require a different solution, one in which the scheduler
plays the core role.

>> r2. The only thing necessary to support long running works is the
>>     ability to rebind workers to the cpu if it comes back online
>>     and allowing long running works will allow most existing worker
>>     pools to be served by cmwq and also make CPU down/up latencies
>>     more predictable.
>
> That's not necessary at all, and introduces quite a lot of ugly
> code.
>
> Furthermore, let me restate that having long running works is the
> problem.

I guess I've explained this enough.  When IO goes wrong, in extreme
cases, it can easily take over thirty seconds to recover, and that's
required by the hardware specifications, so anything which ends up
waiting on IO can take a pretty long time.  The only code necessary to
support that is the code which migrates workers back to their CPU when
it comes online again.  It's not a lot of ugly code.

>> r3. I don't think there is any way to implement shared worker pool
>>     without forking when more concurrency is required and the
>>     actual amount of forking would be low as cmwq scales the number
>>     of idle workers it keeps according to the current concurrency
>>     level and uses a rather long timeout (5min) for idlers.
>
> I'm still not convinced more concurrency is required.

I hope my explanations have helped convince you.

Thanks.

-- 
tejun