From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933253AbXBEUjo (ORCPT ); Mon, 5 Feb 2007 15:39:44 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S933428AbXBEUjo (ORCPT ); Mon, 5 Feb 2007 15:39:44 -0500
Received: from smtp.osdl.org ([65.172.181.24]:60659 "EHLO smtp.osdl.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S933253AbXBEUjn (ORCPT ); Mon, 5 Feb 2007 15:39:43 -0500
Date: Mon, 5 Feb 2007 12:39:14 -0800 (PST)
From: Linus Torvalds
To: Davide Libenzi
cc: Zach Brown , Ingo Molnar , Linux Kernel Mailing List ,
	linux-aio@kvack.org, Suparna Bhattacharya , Benjamin LaHaise
Subject: Re: [PATCH 2 of 4] Introduce i386 fibril scheduling
In-Reply-To:
Message-ID:
References: <20070201083611.GC18233@elte.hu> <20070202104900.GA13941@elte.hu>
	<20070202222110.GA1212@elte.hu> <20070202235531.GA18904@elte.hu>
	<20070203082308.GA6748@elte.hu>
	<8CF4BE18-8EEF-4ACA-A4B4-B627ED3B4831@oracle.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 5 Feb 2007, Davide Libenzi wrote:
>
> No, that's *really* it ;)
>
> The cookie you pass, and the return code of the syscall.
> If there other data transfered? Sure, but that data transfered during the
> syscall processing, and handled by the syscall (filling up a sys_read
> buffer just for example).

Indeed. One word is *exactly* what a normal system call returns too.

That said, normally we have a user-space library layer to turn that into
the "errno + return value" thing, and in the case of async() calls we
very basically wouldn't have that.
So either:

 - we'd need to do it in the kernel (which is actually nasty, since
   different system calls have slightly different semantics - some don't
   return any error value at all, and negative numbers are real numbers)

 - we'd have to teach user space about the "negative errno" mechanism, in
   which case one word really is always enough.

Quite frankly, I much prefer the second alternative. The "negative errno"
thing has not only worked really really well inside the kernel, it's so
obviously 100% superior to the standard UNIX "-1 + errno" approach that
it's not even funny.

To see why "negative errno" is better, just look at any threaded program,
or look at any program that does multiple calls and needs to save the
errno not from the last one, but from some earlier one (eg, it does a
"close()" in between returning *its* error, and the real operation that we
care about).

> Did I miss something? The async() syscall will allow (with few
> restrictions) to execute whatever syscall in an async fashion. An syscall
> returns a result code (long). Plus, you need to pass back the
> userspace-provided cookie of course.

HOWEVER, they get returned differently. The cookie gets returned
immediately, the system call result gets returned in-memory only after the
async thing has actually completed.

I would actually argue that it's not the kernel that should generate any
cookie, but that user-space should *pass*in* the cookie it wants to, and
the kernel should consider it a pointer to a 64-bit entity which is the
return code.

In other words, the only "cookie" we need is literally the pointer to the
results. And that's obviously something that the user space has to set up
anyway.

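
As an aside on the "negative errno" convention discussed above, here is a minimal user-space sketch of how a single result word could be decoded without touching a global errno. The names `raw_is_error`, `decode_raw` and `struct call_result` are hypothetical, and the sketch assumes the Linux convention that raw values in the range [-4095, -1] encode an error; as noted above, some calls return legitimate negative numbers, so a real decoder would have to know which calls this applies to.

```c
#include <assert.h>
#include <errno.h>

/* Assumption: raw results in [-MAX_ERRNO, -1] encode -errno,
 * anything else is a real return value. */
#define MAX_ERRNO 4095

/* Hypothetical helper: nonzero if the raw word encodes an error. */
static int raw_is_error(long raw)
{
	return (unsigned long)raw >= (unsigned long)-MAX_ERRNO;
}

/* The decoded result lives with the call, not in a shared errno. */
struct call_result {
	long value;	/* valid when err == 0 */
	int err;	/* 0 on success, else a positive errno value */
};

/* Hypothetical helper: split one raw word into value and error. */
static struct call_result decode_raw(long raw)
{
	struct call_result r = { 0, 0 };

	if (raw_is_error(raw)) {
		assert(-raw <= MAX_ERRNO);	/* sanity check on the range */
		r.err = (int)-raw;
	} else {
		r.value = raw;
	}
	return r;
}
```

Because each result carries its own error, a later call such as a "close()" on the cleanup path cannot overwrite the error from the operation we actually care about, and threads never share the error state.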
So how about making the async interface be:

	// returns negative for error
	// zero for "synchronous"
	// positive kernel "wait for me" cookie for success
	long sys_async_submit(
		unsigned long flags,
		long *user_result_ptr,
		long syscall,
		unsigned long *args);

and the "args" thing would literally just fill up the registers.

The real downside here is that it's very architecture-specific this way,
and that means that x86-64 (and other 64-bit ones) would need to have
emulation layers for the 32-bit ones, but they likely need to do that
*anyway*, so it's probably not a huge downside. The alternative is to:

 - make a new architecture-independent system call enumeration for the
   async interface

 - make everything use 64-bit values.

Now, making an architecture-independent system call enumeration may
actually make sense regardless, because it would allow sys_async() to have
its own system call table and put the limitations and rules for those
system calls there, instead of depending on the per-architecture system
call table that tends to have some really architecture-specific details.

Hmm?

		Linus
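
For illustration only, the return convention of the proposed sys_async_submit() can be mocked up in user space roughly as follows. Everything prefixed `mock_` is hypothetical, including the tiny private call table (loosely in the spirit of the separate async system call table suggested above). This mock never blocks, so it only exercises the "negative for error" and "zero for synchronous" cases; the positive "wait for me" cookie path is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a table entry: receives the raw args. */
typedef long (*mock_syscall_fn)(const unsigned long *args);

#define MOCK_NR_SYSCALLS 2

/* Mock call 0: succeeds and returns args[0] + args[1]. */
static long mock_add(const unsigned long *args)
{
	return (long)(args[0] + args[1]);
}

/* Mock call 1: always fails, using the negative errno convention. */
static long mock_fail(const unsigned long *args)
{
	(void)args;
	return -EIO;
}

static const mock_syscall_fn mock_table[MOCK_NR_SYSCALLS] = {
	mock_add,
	mock_fail,
};

/*
 * Same shape as the proposed prototype. A negative return means the
 * submission itself was rejected. Zero means the call completed
 * synchronously and its one-word result (value or negative errno)
 * has already been stored through user_result_ptr.
 */
static long mock_async_submit(unsigned long flags, long *user_result_ptr,
			      long syscall, unsigned long *args)
{
	(void)flags;	/* unused in this sketch */

	if (user_result_ptr == NULL || args == NULL)
		return -EFAULT;
	if (syscall < 0 || syscall >= MOCK_NR_SYSCALLS)
		return -ENOSYS;

	*user_result_ptr = mock_table[syscall](args);
	return 0;
}
```

Note that the submission status and the call's own result stay separate: a submit that returns zero can still leave a negative errno in the result word, which is why one word per call is enough.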