From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932855AbXBERCf (ORCPT ); Mon, 5 Feb 2007 12:02:35 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S932846AbXBERCf (ORCPT ); Mon, 5 Feb 2007 12:02:35 -0500 Received: from agminet01.oracle.com ([141.146.126.228]:53697 "EHLO agminet01.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932838AbXBERCe (ORCPT ); Mon, 5 Feb 2007 12:02:34 -0500 In-Reply-To: <20070202222110.GA1212@elte.hu> References: <20070201083611.GC18233@elte.hu> <20070202104900.GA13941@elte.hu> <20070202222110.GA1212@elte.hu> Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <87DE673C-92A0-4401-8DE5-BDC2C08B5F41@oracle.com> Cc: Linus Torvalds , linux-kernel@vger.kernel.org, linux-aio@kvack.org, Suparna Bhattacharya , Benjamin LaHaise Content-Transfer-Encoding: 7bit From: Zach Brown Subject: Re: [PATCH 2 of 4] Introduce i386 fibril scheduling Date: Mon, 5 Feb 2007 12:02:05 -0500 To: Ingo Molnar X-Mailer: Apple Mail (2.752.3) X-Brightmail-Tracker: AAAAAQAAAAI= X-Whitelist: TRUE Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org > ok, i think i noticed another misunderstanding. The kernel thread > based > scheme i'm suggesting would /not/ 'switch' to another kernel thread in > the cached case, by default. It would just execute in the original > context (as if it were a synchronous syscall), and the switch to a > kernel thread from the pool would only occur /if/ the context is about > to block. (this 'switch' thing would be done by the scheduler) Yeah, this is what I imagined when you described doing this with threads instead of these 'fibril' things. It sounds like you're suggesting that we keep the 1:1 relationship between task_struct and thread_info. That would avoid the risks that the current fibril approach brings. It insists that all of task_struct is shared between concurrent fibrils (even if only between blocking points). As I understand what Ingo is suggesting, we'd instead only explicitly share the fields that we migrate (copy or get a reference) as we move the stack from the submitting task_struct to a waiting_task struct as the submission blocks. We trade initial effort to make things safe in the presence of universal sharing for effort to introduce sharing as people notice deficient behaviour. If that's the way we prefer to go, I'm cool with that. I might have gone slightly nuts in preferring *identical* sync and async behaviour. The fast path would look almost identical to the existing fibril switch. We'd just have a few more fields to sync up between the two task_structs. Ingo, am I getting this right? This sounds pretty straight forward to prototype from the current patches. I can certainly give it a try. > it's quite cheap to 'flip' it to under any arbitrary user-space > context: > change its thread_info->task pointer to the user-space context's task > struct, copy the mm pointer, the fs pointer to the "worker thread", > switch the thread_info, update ptregs - done. Hm? Or maybe you're talking about having concurrent executing thread_info's pointing to the user-space submitting task_struct? That really does sound like the current fibril approach, with even more sharing of thread_info's that might be executing on other cpus? Either way, I want to give it a try. If we can measure it performing reasonably in the cached case then I think everyone's happy? > is not part of the signal set. (Although it might make sense to make > such async syscalls interruptible, just like any syscall.) I think we all agree that they have to be interruptible by now, right? If for no other reason than to interrupt pending poll with no timeout, say, as the task exits.. > The 'pool' of kernel threads doesnt even have to be per-task, it > can be > a natural per-CPU thing Yeah, absolutely. - z