From: Andrea Arcangeli
Subject: Re: [Qemu-devel] [RFC] Replace posix-aio with custom thread pool
Date: Fri, 12 Dec 2008 15:13:33 +0100
Message-ID: <20081212141333.GJ6809@random.random>
References: <20081211131222.GA14908@random.random> <494130B5.2080800@redhat.com>
 <20081211155335.GE14908@random.random> <49413B9C.3030703@redhat.com>
 <20081211164947.GD6809@random.random> <49414BC9.5090905@redhat.com>
 <20081211181116.GE6809@random.random> <20081212082309.GI23742@kernel.dk>
 <20081212115133.GI6809@random.random> <20081212115420.GR23742@kernel.dk>
In-Reply-To: <20081212115420.GR23742@kernel.dk>
To: Jens Axboe
Cc: qemu-devel@nongnu.org, Gerd Hoffmann, kvm-devel

On Fri, Dec 12, 2008 at 12:54:21PM +0100, Jens Axboe wrote:
> I agree completely. The buffered aio patches got pretty involved though,
> it wasn't real pretty in the end. So it never got merged. Looks like the
> most realistic way forward is some variant of syslet (or the acall stuff
> that Zach has been working on), which is largely a cop out and will
> never perform as well.

It'll at least perform better than a brand new userland pool of threads
for each task that needs aio functionality, and it can be optimized later
if we want ;).

But I'm surprised: the aio patches in 2.4 were very clean, we didn't have
to break filesystems, and it was really nicely done work, enterprise
quality, as demonstrated by the several databases that ran on it for
years. Ironically, the O_DIRECT part didn't work at the time... because
the O_DIRECT part is effectively the more difficult one. So 2.6 has the
hard stuff done and is missing the simpler stuff.
I guess the simpler stuff is harder to merge because it has more users.

Well, I hope it'll be fixed... for kvm/qemu we definitely need aio for
buffered reads too (buffered writes aren't a big deal, but reads are).
It makes sense to run the parent images in buffered mode even on servers
using O_DIRECT, so basically we can't use linux-aio until this is fixed
somehow. In the meantime I think it'd be better for the kernel to return
-EINVAL (so userland can fall back to its own thread pool) instead of
silently behaving synchronously, which can break GUI and interactive
behavior...

> I added CLONE_IO some time ago to avoid that, so it's perfectly possible
> to share cfq io contexts with threads or processes even in userspace!

It's available in recent kernels, I see! So the fix is easy. The only
problem is how to pass CLONE_IO to pthread_create()... We'll have to make
a Linux-only change and call clone() by hand under some #ifdef CLONE_IO.
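A minimal sketch of that Linux-only change, assuming a glibc system: instead of pthread_create(), the worker is created with clone(), and CLONE_IO is added only when <sched.h> defines it, so the worker shares the caller's I/O context. The worker does one buffered pread() on the caller's behalf. The names struct read_req, read_worker() and run_read_on_worker() are hypothetical illustrations, not qemu code. To keep it short, the worker is a one-shot task that exits with SIGCHLD and is reaped with waitpid(); a real pool would keep workers alive, feed them from a queue, and take the pthread_create() path in an #else branch.

```c
#define _GNU_SOURCE            /* needed for clone() and the CLONE_* flags */
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define WORKER_STACK_SIZE (64 * 1024)

/* Hypothetical request: one buffered read handed off to a worker. */
struct read_req {
    int fd;
    void *buf;
    size_t len;
    off_t offset;
    ssize_t result;            /* bytes read, or -errno on failure */
};

/* Worker body: a plain blocking pread(), run in the clone()d task.
 * Without CLONE_SETTLS the worker shares the caller's TLS, so errno
 * here is the caller's errno; tolerable only because the caller is
 * blocked in waitpid() until the worker exits. */
static int read_worker(void *arg)
{
    struct read_req *req = arg;
    ssize_t n = pread(req->fd, req->buf, req->len, req->offset);
    req->result = n < 0 ? -errno : n;
    return 0;
}

/* Run req on a clone()d worker and wait for it; returns 0 or -errno. */
static int run_read_on_worker(struct read_req *req)
{
    char *stack = malloc(WORKER_STACK_SIZE);
    if (!stack)
        return -ENOMEM;

    /* Share memory, filesystem info and the fd table, like a thread, but
     * report termination with SIGCHLD so waitpid() can reap the worker. */
    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD;
#ifdef CLONE_IO
    /* Share the caller's I/O context (Linux >= 2.6.25), so the CFQ
     * scheduler sees caller and worker as one I/O stream. */
    flags |= CLONE_IO;
#endif

    /* The stack grows down on the common Linux targets: pass its top. */
    pid_t pid = clone(read_worker, stack + WORKER_STACK_SIZE, flags, req);
    if (pid < 0) {
        int err = errno;
        free(stack);
        return -err;
    }

    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);

    int err = r < 0 ? errno : 0;
    /* Free the stack only once the worker is known to have exited;
     * if waitpid() failed, leak it rather than pull it out from under
     * a possibly still running worker. */
    if (r >= 0)
        free(stack);
    return -err;
}
```

CLONE_THREAD is deliberately left out of the flags: a raw CLONE_THREAD child gets none of glibc's per-thread setup and cannot be reaped with waitpid(), which is exactly why having no way to pass CLONE_IO through pthread_create() is the awkward part.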