Message-ID: <4B2F7879.2080901@kernel.org>
Date: Mon, 21 Dec 2009 22:30:33 +0900
From: Tejun Heo
To: Peter Zijlstra
CC: torvalds@linux-foundation.org, awalls@radix.net, linux-kernel@vger.kernel.org, jeff@garzik.org, mingo@elte.hu, akpm@linux-foundation.org, jens.axboe@oracle.com, rusty@rustcorp.com.au, cl@linux-foundation.org, dhowells@redhat.com, arjan@linux.intel.com, avi@redhat.com, johannes@sipsolutions.net, andi@firstfloor.org
Subject: Re: workqueue thing
In-Reply-To: <1261387377.4314.37.camel@laptop>

Hello, Peter.

On 12/21/2009 06:22 PM, Peter Zijlstra wrote:
> On Mon, 2009-12-21 at 12:04 +0900, Tejun Heo wrote:
>> When IO goes wrong, in extreme
>> cases, it can easily take over thirty secs to recover and that's
>> required by the hardware specifications, so anything which ends up
>> waiting on IO can take a pretty long time. The only piece of code
>> which is necessary to support that is the code necessary to migrate
>> back tasks to CPUs when they come online again. It's not a lot of
>> ugly code.
>
> Why does it need to get migrated back, there are no affinity promises if
> you allow hotplug to continue, so it might as well complete and continue
> on the other cpu.
>
> And yes, it is a lot of very ugly code.

Migrating to an online but !active CPU is necessary to call rescuers
during CPU_DOWN_PREPARE. That, in turn, is necessary to guarantee forward
progress during a CPU down operation. Given that, the only extra code
needed purely for migrating back when a CPU comes back online is a few
tens of lines handling the TRUSTEE_RELEASE case. That's not a lot. If we
do it differently (i.e. stop unbound workers from processing new works,
just drain them and let them die), it will take more code.

I think you're primarily concerned with the scheduler modifications and
think that the choose-between-two-masks on migration is ugly. I agree
it's not the prettiest thing in this world, but then again it's not a
lot of code. It looks ugly mostly because of the way migration is
implemented and the way the parameter is passed in. API-wise, I think
making kthread_bind() synchronized against CPU onliness should be
pretty clean.

Thanks.

--
tejun
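The "choose between two masks on migration" idea discussed above can be modeled in a few lines. What follows is a hypothetical user-space sketch, not kernel code. The names `model_cpumask`, `model_cpu_bit()` and `model_pick_dest_cpu()` are invented for illustration, and the real cpumask API and scheduler migration path look different. The sketch assumes that ordinary tasks may only be migrated to active CPUs. A task flagged as allowed to use inactive CPUs stands in for a workqueue rescuer during CPU_DOWN_PREPARE, and it may also land on a CPU that is online but !active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of a cpumask: bit n set means CPU n. */
typedef uint64_t model_cpumask;

#define MODEL_NR_CPUS 64

/* Return a mask with only CPU `cpu` set. */
static model_cpumask model_cpu_bit(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    return (model_cpumask)1 << cpu;
}

/*
 * Choose a migration destination for a task allowed on `allowed`.
 * Ordinary tasks may only land on active CPUs. A task flagged
 * `may_use_inactive` (standing in for a rescuer during CPU_DOWN_PREPARE)
 * may also use CPUs that are online but no longer active.
 * Returns the lowest qualifying CPU, or -1 if none qualifies. The real
 * scheduler's fallback in that case (breaking affinity) is not modeled.
 */
static int model_pick_dest_cpu(model_cpumask allowed, model_cpumask online,
                               model_cpumask active, bool may_use_inactive)
{
    /* The "choose between two masks" step. */
    model_cpumask usable = may_use_inactive ? online : active;
    model_cpumask candidates = allowed & usable;

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (candidates & model_cpu_bit(cpu))
            return cpu;
    return -1;
}
```

For example, suppose CPUs 0-3 are online (`0xF`) and CPU 2 is going down, so the active mask is `0xB`. A task bound only to CPU 2 gets -1 when `may_use_inactive` is false and CPU 2 when it is true. The extra cost of this approach is a single conditional choice of mask at the migration point.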