From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrea Arcangeli
Subject: Re: [Qemu-devel] [RFC] Replace posix-aio with custom thread pool
Date: Fri, 12 Dec 2008 18:09:16 +0100
Message-ID: <20081212170916.GO6809@random.random>
References: <493E965E.5050701@us.ibm.com> <20081210164401.GF18814@random.random> <493FFAB6.2000106@codemonkey.ws> <493FFC8E.9080802@redhat.com> <49400F69.8080707@codemonkey.ws> <20081210190810.GG18814@random.random> <20081212142435.GL6809@random.random> <494276CD.6060904@codemonkey.ws> <20081212154418.GM6809@random.random> <49429629.20309@codemonkey.ws>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Gerd Hoffmann , qemu-devel@nongnu.org, kvm-devel
To: Anthony Liguori
Return-path:
Received: from mx2.redhat.com ([66.187.237.31]:52102 "EHLO mx2.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757773AbYLLRJW (ORCPT ); Fri, 12 Dec 2008 12:09:22 -0500
Content-Disposition: inline
In-Reply-To: <49429629.20309@codemonkey.ws>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Fri, Dec 12, 2008 at 10:49:45AM -0600, Anthony Liguori wrote:
> I meant, if you wanted to pass a file descriptor as a raw device. So:
>
> qemu -hda raw:fd=4
>
> Or something like that. We don't support this today.

Ah, OK.

> I think bouncing the iov and just using pread/pwrite may be our best bet.
> It means memory allocation but we can cap it. Since we're using threads,

It's already capped. Currently it generates an iovec, but we simply have to check whether iovcnt is 1; if it is, we pread directly into iov.iov_base, iov.iov_len. The DMA API will take care of enforcing iovcnt == 1 if preadv/pwritev isn't detected at compile time.

> we just can force a thread to sleep until memory becomes available so it's
> actually pretty straight forward.

There's no way to detect that and wait for memory: the process would be SIGKILLed before you could check... at least with the default overcommit.
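A minimal sketch of the iovcnt == 1 fast path, with Anthony's bounce-buffer idea as the fallback for longer iovecs. `emulated_preadv` is a hypothetical name, not QEMU code; it assumes POSIX pread() and a seekable file descriptor. The comment on malloc() reflects the overcommit point above: a NULL check cannot be used to wait for memory.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper, not QEMU code: emulate preadv() on top of pread().
 * Fast path: a single-element iovec is read in place with no copy.
 * Slow path: read into one linear bounce buffer, then scatter it. */
ssize_t emulated_preadv(int fd, const struct iovec *iov, int iovcnt,
                        off_t offset)
{
    assert(iovcnt >= 1);  /* an empty request is a caller bug */

    if (iovcnt == 1)
        return pread(fd, iov[0].iov_base, iov[0].iov_len, offset);

    size_t total = 0;
    for (int i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;

    /* With Linux's default overcommit this rarely returns NULL; the process
     * may be OOM-killed later instead, so this check alone cannot be used
     * to sleep until memory becomes available. */
    char *bounce = malloc(total ? total : 1);
    if (bounce == NULL)
        return -1;  /* malloc has set errno to ENOMEM */

    ssize_t got = pread(fd, bounce, total, offset);
    int saved_errno = errno;

    /* Scatter only the bytes actually read into the caller's buffers. */
    size_t left = got > 0 ? (size_t)got : 0, pos = 0;
    for (int i = 0; i < iovcnt && left > 0; i++) {
        size_t n = iov[i].iov_len < left ? iov[i].iov_len : left;
        memcpy(iov[i].iov_base, bounce + pos, n);
        pos += n;
        left -= n;
    }

    free(bounce);
    errno = saved_errno;
    return got;
}
```

If the DMA layer already guarantees iovcnt == 1 whenever preadv is missing, only the first branch is ever taken and no bounce allocation happens at all.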
The way the DMA API works is that it doesn't submit one huge writev; it sends the data in pieces capped by the maximum buffer size, as many iovecs with iovcnt = 1.

> We can use libaio on older Linux's to simulate preadv/pwritev. Use the
> proper syscalls on newer kernels, on BSDs, and bounce everything else.

Given that the READV/WRITEV iocb commands aren't available in older kernels, and given that without O_DIRECT each iocb becomes synchronous, we can't use libaio. Also, if we went that way, the iocb logic would need to be largely refactored once linux-aio is fixed. So I'm not sure it's worth it, as it can't handle 2.6.16-18 when O_DIRECT is disabled (when O_DIRECT is enabled we could just build an array of linear iocbs).
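A minimal sketch of the capped, one-buffer-per-request submission the DMA API performs, assuming a caller-chosen cap and plain pwrite(). `split_sg` and `write_pieces` are hypothetical names, not QEMU's DMA API. With O_DIRECT, a piece array like this is roughly what one would turn into an array of linear iocbs; that libaio step is not shown.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Split a scatter-gather list into pieces of at most `cap` bytes.
 * Each output entry describes one iovcnt == 1 request.
 * Returns the number of pieces, or -1 if `out` is too small. */
int split_sg(const struct iovec *sg, int sgcnt, size_t cap,
             struct iovec *out, int outmax)
{
    assert(cap > 0);
    int n = 0;
    for (int i = 0; i < sgcnt; i++) {
        char *base = sg[i].iov_base;
        size_t len = sg[i].iov_len;
        while (len > 0) {
            size_t piece = len < cap ? len : cap;
            if (n == outmax)
                return -1;
            out[n].iov_base = base;
            out[n].iov_len = piece;
            n++;
            base += piece;
            len -= piece;
        }
    }
    return n;
}

/* Issue the pieces one at a time with plain pwrite(), never one large
 * writev. Returns the total bytes written, or -1 on error. */
ssize_t write_pieces(int fd, const struct iovec *p, int n, off_t off)
{
    ssize_t done = 0;
    for (int i = 0; i < n; i++) {
        const char *base = p[i].iov_base;
        size_t left = p[i].iov_len;
        while (left > 0) {
            ssize_t w = pwrite(fd, base, left, off + done);
            if (w < 0) {
                if (errno == EINTR)
                    continue;  /* interrupted: retry the same piece */
                return -1;
            }
            if (w == 0) {
                errno = EIO;   /* no progress: give up */
                return -1;
            }
            /* pwrite may be short; advance by what was written. */
            base += w;
            left -= (size_t)w;
            done += w;
        }
    }
    return done;
}
```

For example, a 6000-byte segment followed by a 4000-byte segment with a 4096-byte cap becomes three pieces of 4096, 1904 and 4000 bytes.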