From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Michael S. Tsirkin" Subject: Re: [PATCH RFC] dma_rw.h (was Re: [PATCH 0/7] AMD IOMMU emulation patchset v4) Date: Thu, 16 Sep 2010 11:20:43 +0200 Message-ID: <20100916092043.GB20864@redhat.com> References: <1283007298-10942-1-git-send-email-eduard.munteanu@linux360.ro> <20100913200120.GA20918@redhat.com> <20100916070616.GA7811@localhost> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: joro@8bytes.org, blauwirbel@gmail.com, paul@codesourcery.com, avi@redhat.com, anthony@codemonkey.ws, av1474@comtv.ru, yamahata@valinux.co.jp, kvm@vger.kernel.org, qemu-devel@nongnu.org To: Eduard - Gabriel Munteanu Return-path: Received: from mx1.redhat.com ([209.132.183.28]:58615 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751768Ab0IPJ1Y (ORCPT ); Thu, 16 Sep 2010 05:27:24 -0400 Content-Disposition: inline In-Reply-To: <20100916070616.GA7811@localhost> Sender: kvm-owner@vger.kernel.org List-ID: On Thu, Sep 16, 2010 at 10:06:16AM +0300, Eduard - Gabriel Munteanu wrote: > On Mon, Sep 13, 2010 at 10:01:20PM +0200, Michael S. Tsirkin wrote: > > So I think the following will give the idea of what an API > > might look like that will let us avoid the scary hacks in > > e.g. the ide layer and other generic layers that need to do DMA, > > without either binding us to pci, adding more complexity with > > callbacks, or losing type safety with casts and void*. > > > > Basically we have DMADevice that we can use container_of on > > to get a PCIDevice from, and DMAMmu that will get instanciated > > in a specific MMU. > > > > This is not complete code - just a header - I might complete > > this later if/when there's interest or hopefully someone interested > > in iommu emulation will. > > Hi, > > I personally like this approach better. It also seems to make poisoning > cpu_physical_memory_*() easier if we convert every device to this API. > We could then ban cpu_physical_memory_*(), perhaps by requiring a > #define and #ifdef-ing those declarations. > > > Notes: > > the IOMMU_PERM_RW code seem unused, so I replaced > > this with plain is_write. Is it ever useful? > > The original idea made provisions for stuff like full R/W memory maps. > In that case cpu_physical_memory_map() would call the translation / > checking function with perms == IOMMU_PERM_RW. That's not there yet so > it can be removed at the moment, especially since it only affects these > helpers. > > Also, I'm not sure if there are other sorts of accesses besides reads > and writes we want to check or translate. > > > It seems that invalidate callback should be able to > > get away with just a device, so I switched to that > > from a void pointer for type safety. > > Seems enough for the users I saw. > > I think this makes matters too complicated. Normally, a single DMADevice > should be embedded within a Device, No, DMADevice is a device that does DMA. So e.g. a PCI device would embed one. Remember, traslations are per device, right? DMAMmu is part of the iommu object. > so doing this makes it really > hard to invalidate a specific map when there are more of them. It forces > device code to act as a bus, provide fake 'DMADevice's for each map and > dispatch translation to the real DMATranslateFunc. I see no other way. > > If you really want more type-safety (although I think this is a case of > a true opaque identifying something only device code understands), I > have another proposal: have a DMAMap embedded in the opaque. Example > from dma-helpers.c: > > typedef struct { > DMADevice *owner; > [...] > } DMAMap; > > typedef struct { > [...] > DMAMap map; > [...] > } DMAAIOCB; > > /* The callback. */ > static void dma_bdrv_cancel(DMAMap *map) > { > DMAAIOCB *dbs = container_of(map, DMAAIOCB, map); > > [...] > } > > The upside is we only need to pass the DMAMap. That can also contain > details of the actual map in case the device wants to release only the > relevant range and remap the rest. Fine. Or maybe DMAAIOCB (just make some letters lower case: DMAIocb?). Everyone will use it anyway, right? > > I saw devices do stl_le_phys and such, these > > might need to be wrapped as well. > > stl_le_phys() is defined and used only by hw/eepro100.c. That's already > dealt with by converting the device. > I see. Need to get around to adding some prefix to it to make this clear. > Thanks, > Eduard > > > Signed-off-by: Michael S. Tsirkin > > > > --- > > > > diff --git a/hw/dma_rw.h b/hw/dma_rw.h > > new file mode 100644 > > index 0000000..d63fd17 > > --- /dev/null > > +++ b/hw/dma_rw.h > > @@ -0,0 +1,122 @@ > > +#ifndef DMA_RW_H > > +#define DMA_RW_H > > + > > +#include "qemu-common.h" > > + > > +/* We currently only have pci mmus, but using > > + a generic type makes it possible to use this > > + e.g. from the generic ide code without callbacks. */ > > +typedef uint64_t dma_addr_t; > > + > > +typedef struct DMAMmu DMAMmu; > > +typedef struct DMADevice DMADevice; > > + > > +typedef int DMATranslateFunc(DMAMmu *mmu, > > + DMADevice *dev, > > + dma_addr_t addr, > > + dma_addr_t *paddr, > > + dma_addr_t *len, > > + int is_write); > > + > > +typedef int DMAInvalidateMapFunc(DMADevice *); > > +struct DMAMmu { > > + /* invalidate, etc. */ > > + DmaTranslateFunc *translate; > > +}; > > + > > +struct DMADevice { > > + DMAMmu *mmu; > > + DMAInvalidateMapFunc *invalidate; > > +}; > > + > > +void dma_device_init(DMADevice *, DMAMmu *, DMAInvalidateMapFunc *); > > + > > +static inline void dma_memory_rw(DMADevice *dev, > > + dma_addr_t addr, > > + void *buf, > > + uint32_t len, > > + int is_write) > > +{ > > + uint32_t plen; > > + /* Fast-path non-iommu. > > + * More importantly, makes it obvious what this function does. */ > > + if (!dev->mmu) { > > + cpu_physical_memory_rw(paddr, buf, plen, is_write); > > + return; > > + } > > + while (len) { > > + err = dev->mmu->translate(iommu, dev, addr, &paddr, &plen, is_write); > > + if (err) { > > + return; > > + } > > + > > + /* The translation might be valid for larger regions. */ > > + if (plen > len) { > > + plen = len; > > + } > > + > > + cpu_physical_memory_rw(paddr, buf, plen, is_write); > > + > > + len -= plen; > > + addr += plen; > > + buf += plen; > > + } > > +} > > + > > +void *dma_memory_map(DMADevice *dev, > > + dma_addr_t addr, > > + uint32_t *len, > > + int is_write); > > +void dma_memory_unmap(DMADevice *dev, > > + void *buffer, > > + uint32_t len, > > + int is_write, > > + uint32_t access_len); > > + > > + > > ++#define DEFINE_DMA_LD(suffix, size) \ > > ++uint##size##_t dma_ld##suffix(DMADevice *dev, dma_addr_t addr) \ > > ++{ \ > > ++ int err; \ > > ++ target_phys_addr_t paddr, plen; \ > > ++ if (!dev->mmu) { \ > > ++ return ld##suffix##_phys(addr, val); \ > > ++ } \ > > ++ \ > > ++ err = dev->mmu->translate(dev->bus->iommu, dev, \ > > ++ addr, &paddr, &plen, IOMMU_PERM_READ); \ > > ++ if (err || (plen < size / 8)) \ > > ++ return 0; \ > > ++ \ > > ++ return ld##suffix##_phys(paddr); \ > > ++} > > ++ > > ++#define DEFINE_DMA_ST(suffix, size) \ > > ++void dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val) \ > > ++{ \ > > ++ int err; \ > > ++ target_phys_addr_t paddr, plen; \ > > ++ \ > > ++ if (!dev->mmu) { \ > > ++ st##suffix##_phys(addr, val); \ > > ++ return; \ > > ++ } \ > > ++ err = dev->mmu->translate(dev->bus->iommu, dev, \ > > ++ addr, &paddr, &plen, IOMMU_PERM_WRITE); \ > > ++ if (err || (plen < size / 8)) \ > > ++ return; \ > > ++ \ > > ++ st##suffix##_phys(paddr, val); \ > > ++} > > + > > +DEFINE_DMA_LD(ub, 8) > > +DEFINE_DMA_LD(uw, 16) > > +DEFINE_DMA_LD(l, 32) > > +DEFINE_DMA_LD(q, 64) > > + > > +DEFINE_DMA_ST(b, 8) > > +DEFINE_DMA_ST(w, 16) > > +DEFINE_DMA_ST(l, 32) > > +DEFINE_DMA_ST(q, 64) > > + > > +#endif > > diff --git a/hw/pci.h b/hw/pci.h > > index 1c6075e..9737f0e 100644 > > --- a/hw/pci.h > > +++ b/hw/pci.h > > @@ -5,6 +5,7 @@ > > #include "qobject.h" > > > > #include "qdev.h" > > +#include "dma_rw.h" > > > > /* PCI includes legacy ISA access. */ > > #include "isa.h" > > @@ -119,6 +120,10 @@ enum { > > > > struct PCIDevice { > > DeviceState qdev; > > + > > + /* For devices that do DMA. */ > > + DMADevice dma; > > + > > /* PCI config space */ > > uint8_t *config; > > From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from [140.186.70.92] (port=34218 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1OwApP-0000fj-Ph for qemu-devel@nongnu.org; Thu, 16 Sep 2010 05:32:29 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69) (envelope-from ) id 1OwAkU-0004Ja-7d for qemu-devel@nongnu.org; Thu, 16 Sep 2010 05:27:24 -0400 Received: from mx1.redhat.com ([209.132.183.28]:25757) by eggs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1OwAkT-0004JQ-Ti for qemu-devel@nongnu.org; Thu, 16 Sep 2010 05:27:22 -0400 Date: Thu, 16 Sep 2010 11:20:43 +0200 From: "Michael S. Tsirkin" Message-ID: <20100916092043.GB20864@redhat.com> References: <1283007298-10942-1-git-send-email-eduard.munteanu@linux360.ro> <20100913200120.GA20918@redhat.com> <20100916070616.GA7811@localhost> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100916070616.GA7811@localhost> Subject: [Qemu-devel] Re: [PATCH RFC] dma_rw.h (was Re: [PATCH 0/7] AMD IOMMU emulation patchset v4) List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eduard - Gabriel Munteanu Cc: kvm@vger.kernel.org, joro@8bytes.org, qemu-devel@nongnu.org, blauwirbel@gmail.com, yamahata@valinux.co.jp, paul@codesourcery.com, avi@redhat.com On Thu, Sep 16, 2010 at 10:06:16AM +0300, Eduard - Gabriel Munteanu wrote: > On Mon, Sep 13, 2010 at 10:01:20PM +0200, Michael S. Tsirkin wrote: > > So I think the following will give the idea of what an API > > might look like that will let us avoid the scary hacks in > > e.g. the ide layer and other generic layers that need to do DMA, > > without either binding us to pci, adding more complexity with > > callbacks, or losing type safety with casts and void*. > > > > Basically we have DMADevice that we can use container_of on > > to get a PCIDevice from, and DMAMmu that will get instanciated > > in a specific MMU. > > > > This is not complete code - just a header - I might complete > > this later if/when there's interest or hopefully someone interested > > in iommu emulation will. > > Hi, > > I personally like this approach better. It also seems to make poisoning > cpu_physical_memory_*() easier if we convert every device to this API. > We could then ban cpu_physical_memory_*(), perhaps by requiring a > #define and #ifdef-ing those declarations. > > > Notes: > > the IOMMU_PERM_RW code seem unused, so I replaced > > this with plain is_write. Is it ever useful? > > The original idea made provisions for stuff like full R/W memory maps. > In that case cpu_physical_memory_map() would call the translation / > checking function with perms == IOMMU_PERM_RW. That's not there yet so > it can be removed at the moment, especially since it only affects these > helpers. > > Also, I'm not sure if there are other sorts of accesses besides reads > and writes we want to check or translate. > > > It seems that invalidate callback should be able to > > get away with just a device, so I switched to that > > from a void pointer for type safety. > > Seems enough for the users I saw. > > I think this makes matters too complicated. Normally, a single DMADevice > should be embedded within a Device, No, DMADevice is a device that does DMA. So e.g. a PCI device would embed one. Remember, traslations are per device, right? DMAMmu is part of the iommu object. > so doing this makes it really > hard to invalidate a specific map when there are more of them. It forces > device code to act as a bus, provide fake 'DMADevice's for each map and > dispatch translation to the real DMATranslateFunc. I see no other way. > > If you really want more type-safety (although I think this is a case of > a true opaque identifying something only device code understands), I > have another proposal: have a DMAMap embedded in the opaque. Example > from dma-helpers.c: > > typedef struct { > DMADevice *owner; > [...] > } DMAMap; > > typedef struct { > [...] > DMAMap map; > [...] > } DMAAIOCB; > > /* The callback. */ > static void dma_bdrv_cancel(DMAMap *map) > { > DMAAIOCB *dbs = container_of(map, DMAAIOCB, map); > > [...] > } > > The upside is we only need to pass the DMAMap. That can also contain > details of the actual map in case the device wants to release only the > relevant range and remap the rest. Fine. Or maybe DMAAIOCB (just make some letters lower case: DMAIocb?). Everyone will use it anyway, right? > > I saw devices do stl_le_phys and such, these > > might need to be wrapped as well. > > stl_le_phys() is defined and used only by hw/eepro100.c. That's already > dealt with by converting the device. > I see. Need to get around to adding some prefix to it to make this clear. > Thanks, > Eduard > > > Signed-off-by: Michael S. Tsirkin > > > > --- > > > > diff --git a/hw/dma_rw.h b/hw/dma_rw.h > > new file mode 100644 > > index 0000000..d63fd17 > > --- /dev/null > > +++ b/hw/dma_rw.h > > @@ -0,0 +1,122 @@ > > +#ifndef DMA_RW_H > > +#define DMA_RW_H > > + > > +#include "qemu-common.h" > > + > > +/* We currently only have pci mmus, but using > > + a generic type makes it possible to use this > > + e.g. from the generic ide code without callbacks. */ > > +typedef uint64_t dma_addr_t; > > + > > +typedef struct DMAMmu DMAMmu; > > +typedef struct DMADevice DMADevice; > > + > > +typedef int DMATranslateFunc(DMAMmu *mmu, > > + DMADevice *dev, > > + dma_addr_t addr, > > + dma_addr_t *paddr, > > + dma_addr_t *len, > > + int is_write); > > + > > +typedef int DMAInvalidateMapFunc(DMADevice *); > > +struct DMAMmu { > > + /* invalidate, etc. */ > > + DmaTranslateFunc *translate; > > +}; > > + > > +struct DMADevice { > > + DMAMmu *mmu; > > + DMAInvalidateMapFunc *invalidate; > > +}; > > + > > +void dma_device_init(DMADevice *, DMAMmu *, DMAInvalidateMapFunc *); > > + > > +static inline void dma_memory_rw(DMADevice *dev, > > + dma_addr_t addr, > > + void *buf, > > + uint32_t len, > > + int is_write) > > +{ > > + uint32_t plen; > > + /* Fast-path non-iommu. > > + * More importantly, makes it obvious what this function does. */ > > + if (!dev->mmu) { > > + cpu_physical_memory_rw(paddr, buf, plen, is_write); > > + return; > > + } > > + while (len) { > > + err = dev->mmu->translate(iommu, dev, addr, &paddr, &plen, is_write); > > + if (err) { > > + return; > > + } > > + > > + /* The translation might be valid for larger regions. */ > > + if (plen > len) { > > + plen = len; > > + } > > + > > + cpu_physical_memory_rw(paddr, buf, plen, is_write); > > + > > + len -= plen; > > + addr += plen; > > + buf += plen; > > + } > > +} > > + > > +void *dma_memory_map(DMADevice *dev, > > + dma_addr_t addr, > > + uint32_t *len, > > + int is_write); > > +void dma_memory_unmap(DMADevice *dev, > > + void *buffer, > > + uint32_t len, > > + int is_write, > > + uint32_t access_len); > > + > > + > > ++#define DEFINE_DMA_LD(suffix, size) \ > > ++uint##size##_t dma_ld##suffix(DMADevice *dev, dma_addr_t addr) \ > > ++{ \ > > ++ int err; \ > > ++ target_phys_addr_t paddr, plen; \ > > ++ if (!dev->mmu) { \ > > ++ return ld##suffix##_phys(addr, val); \ > > ++ } \ > > ++ \ > > ++ err = dev->mmu->translate(dev->bus->iommu, dev, \ > > ++ addr, &paddr, &plen, IOMMU_PERM_READ); \ > > ++ if (err || (plen < size / 8)) \ > > ++ return 0; \ > > ++ \ > > ++ return ld##suffix##_phys(paddr); \ > > ++} > > ++ > > ++#define DEFINE_DMA_ST(suffix, size) \ > > ++void dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val) \ > > ++{ \ > > ++ int err; \ > > ++ target_phys_addr_t paddr, plen; \ > > ++ \ > > ++ if (!dev->mmu) { \ > > ++ st##suffix##_phys(addr, val); \ > > ++ return; \ > > ++ } \ > > ++ err = dev->mmu->translate(dev->bus->iommu, dev, \ > > ++ addr, &paddr, &plen, IOMMU_PERM_WRITE); \ > > ++ if (err || (plen < size / 8)) \ > > ++ return; \ > > ++ \ > > ++ st##suffix##_phys(paddr, val); \ > > ++} > > + > > +DEFINE_DMA_LD(ub, 8) > > +DEFINE_DMA_LD(uw, 16) > > +DEFINE_DMA_LD(l, 32) > > +DEFINE_DMA_LD(q, 64) > > + > > +DEFINE_DMA_ST(b, 8) > > +DEFINE_DMA_ST(w, 16) > > +DEFINE_DMA_ST(l, 32) > > +DEFINE_DMA_ST(q, 64) > > + > > +#endif > > diff --git a/hw/pci.h b/hw/pci.h > > index 1c6075e..9737f0e 100644 > > --- a/hw/pci.h > > +++ b/hw/pci.h > > @@ -5,6 +5,7 @@ > > #include "qobject.h" > > > > #include "qdev.h" > > +#include "dma_rw.h" > > > > /* PCI includes legacy ISA access. */ > > #include "isa.h" > > @@ -119,6 +120,10 @@ enum { > > > > struct PCIDevice { > > DeviceState qdev; > > + > > + /* For devices that do DMA. */ > > + DMADevice dma; > > + > > /* PCI config space */ > > uint8_t *config; > >