From mboxrd@z Thu Jan  1 00:00:00 1970
From: Avi Kivity
Subject: Re: [Qemu-devel] Re: [PATCH 2 of 5] add can_dma/post_dma for direct IO
Date: Sun, 14 Dec 2008 21:49:11 +0200
Message-ID: <49456337.4000000@redhat.com>
References: <4942B841.6010900@codemonkey.ws> <4942BDEE.7020003@codemonkey.ws>
 <49437EC8.6020506@redhat.com> <4943E68E.3030400@codemonkey.ws>
 <4944117C.6030404@redhat.com> <49442410.7020608@codemonkey.ws>
 <4944A1B5.5080300@redhat.com> <49455A33.207@codemonkey.ws>
In-Reply-To: <49455A33.207@codemonkey.ws>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
To: Anthony Liguori
Cc: qemu-devel@nongnu.org, Andrea Arcangeli, chrisw@redhat.com,
 Gerd Hoffmann, kvm@vger.kernel.org

Anthony Liguori wrote:
> I've thought quite a bit about it, and I'm becoming less convinced
> that this sort of API is going to be helpful.
>
> I was thinking that we need to make one minor change to the map API I
> proposed. It should return a mapped size as an output parameter and
> take a flag as to whether partial mappings can be handled. The effect
> would be that you never bounce to RAM which means that you can also
> quite accurately determine the maximum amount of bouncing (it should
> be proportional to the amount of MMIO memory that's registered).

That's pointless; cirrus, for example, has 8MB of mmio while a cpu-to-vram
blit is in progress, and some random device we'll add tomorrow could easily
introduce more. Our APIs shouldn't depend on properties of emulated
hardware, at least as much as possible.

Of course you don't need the API if you replicate its functionality in 3
block devices (+ out of tree), 4+ network devices (+1 out of tree), and any
other new dma client that comes along. But I think that coding the bouncing
and rescheduling logic in all of those devices is a mistake, for reasons I
don't even wish to point out.

Just as an example: as presented, the maximum bounce memory allocated is
equal to the number of concurrent requests issued by all dma-capable devices
multiplied by the maximum dma buffer size. Should we consider this to be a
deficiency (and I do), it is easy to modify dma.c to limit the amount of
allocated memory and defer requests when we are out. It would be a horrible
pain to do this using map()/unmap() alone.

The logic in dma.c cannot be made to go away. You can either put it in the
dma clients, or in map()/unmap(), or leave it in dma.c. While putting it in
map()/unmap() is possible (though undesirable IMO), letting the dma clients
handle these details is a big mistake.

I'll enumerate the functions that dma.c provides:

- convert guest physical addresses to host virtual addresses
- construct an iovec for scatter/gather
- handle guest physical addresses for which no host virtual addresses
  exist, while controlling memory use
- take care of the dirty bit
- provide a place for adding hooks to hardware that can modify dma
  operations generically (emulated iommus, transforming dma engines)

I believe that a dma api that fails to address all of these requirements is
trying to solve too few problems at the same time, and will either cause dma
clients to be unduly complicated, or will require rewriting.
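
To make that enumeration concrete, here is a rough sketch of the kind of
central helper dma.c could provide. It is illustrative only: the names
(dma_map_sg, gpa_to_hva, bounce_alloc, mark_dirty) and types are invented
for this example and are not existing qemu interfaces, and copy-in/copy-out
for bounce buffers plus completion-time cleanup are omitted.

#include <stdint.h>
#include <stddef.h>
#include <sys/uio.h>

typedef uint64_t target_phys_addr_t;      /* guest physical address */

struct dma_sg {                           /* one guest-physical segment */
    target_phys_addr_t addr;
    size_t len;
};

/* Helpers assumed to live elsewhere in dma.c (hypothetical): */
void *gpa_to_hva(target_phys_addr_t addr, size_t len);  /* NULL for MMIO */
void *bounce_alloc(size_t len);           /* NULL once the pool is exhausted */
void  mark_dirty(target_phys_addr_t addr, size_t len);

/*
 * Build a host iovec for a guest scatter/gather list.  Directly mappable
 * segments are referenced in place; segments backed by MMIO fall back to
 * bounce buffers drawn from one bounded pool, so total bounce memory is
 * capped centrally instead of per device.  Returns the number of iovec
 * entries filled, or -1 to tell the caller to defer the request until
 * bounce memory is freed again.
 */
int dma_map_sg(const struct dma_sg *sg, int nsg,
               struct iovec *iov, int is_write)
{
    for (int i = 0; i < nsg; i++) {
        void *hva = gpa_to_hva(sg[i].addr, sg[i].len);

        if (!hva) {                       /* MMIO: no direct mapping */
            hva = bounce_alloc(sg[i].len);
            if (!hva)
                return -1;                /* pool empty: defer, don't fail */
        }

        iov[i].iov_base = hva;
        iov[i].iov_len  = sg[i].len;

        if (is_write)                     /* device will write guest memory */
            mark_dirty(sg[i].addr, sg[i].len);
    }
    return nsg;
}

Because bounce_alloc() draws from a single bounded pool, the limit-and-defer
policy described above lives in one place instead of being reimplemented in
every block and network device.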
--
I have a truly marvellous patch that fixes the bug which this signature is
too narrow to contain.