From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Alistair Popple <alistair@popple.id.au>
Cc: linuxppc-dev@lists.ozlabs.org,
	David Gibson <david@gibson.dropbear.id.au>,
	kvm@vger.kernel.org,
	Alex Williamson <alex.williamson@redhat.com>,
	Oliver O'Halloran <oohall@gmail.com>
Subject: Re: [PATCH kernel RFC 0/4] powerpc/powenv/ioda: Allow huge DMA window at 4GB
Date: Mon, 2 Dec 2019 16:58:15 +1100
Message-ID: <45175dc2-8ed4-6e96-ff69-44980f3d1951@ozlabs.ru>
In-Reply-To: <22858805.RAHADn2P79@townsend>

On 02/12/2019 16:36, Alistair Popple wrote:
> On Monday, 2 December 2019 12:59:49 PM AEDT Alexey Kardashevskiy wrote:
>> Here is an attempt to support bigger DMA space for devices
>> supporting DMA masks less than 59 bits (GPUs come into mind
>> first). POWER9 PHBs have an option to map 2 windows at 0
>> and select a windows based on DMA address being below or above
>> 4GB.
>>
>> This adds the "iommu=iommu_bypass" kernel parameter and
>
> Would it be possible to just enable this by default if the platform supports
> it? Are there any downsides?

It changes the location of the second DMA window, which QEMU currently
assumes to be at 0x800.0000.0000.0000, and I do not see an easy way to
work around this. For example, we start QEMU without VFIO but with an
emulated XHCI which will ask for DDW. We (QEMU) have to pick a window
location, but then we have to stick to it, and if a user later hotplugs
a VFIO-PCI device, the physical IOMMU has to support the previously
selected DMA window address; otherwise the hotplug is going to fail.

The question is how to tell QEMU about this new offset, and what we do
about migration from P8 (which, let's say, did have a VFIO device which
we unplug before the migration) to P9 with a prospect of hotplugging a
VFIO device, but this time with this GTE4GB bit set.

> Adding it as an option seems like it would make
> things harder to support and reduces the amount of testing/use it would get.

Yeah, this is why this is an RFC...

>> supports VFIO+pseries machine - current this requires telling
>> upstream+unmodified QEMU about this via
>> -global spapr-pci-host-bridge.dma64_win_addr=0x100000000
>> or per-phb property. 4/4 advertises the new option but
>> there is no automation around it in QEMU (should it be?).
>>
>> For now it is either 1<<59 or 4GB mode; dynamic switching is
>> not supported (could be via sysfs).
>>
>> This is based on sha1
>> a6ed68d6468b Linus Torvalds "Merge tag 'drm-next-2019-11-27' of git://
> anongit.freedesktop.org/drm/drm".
>
> Are you sure?

Almost. It should have been HEAD^^^^^..HEAD instead of HEAD^^^^..HEAD :)

I've posted 00/4 to the thread now, sorry about that. Thanks,

> I am getting the following rejected hunk trying to apply the
> first patch in the series:
>
> --- arch/powerpc/platforms/powernv/pci-ioda.c
> +++ arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2349,15 +2349,10 @@ static void pnv_pci_ioda2_set_bypass(struct
> pnv_ioda_pe *pe, bool enable)
>  		pe->tce_bypass_enabled = enable;
>  }
>
> -static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
> -		int num, __u32 page_shift, __u64 window_size, __u32 levels,
> +static long pnv_pci_ioda2_create_table(int nid, int num, __u64 bus_offset,
> +		__u32 page_shift, __u64 window_size, __u32 levels,
>  		bool alloc_userspace_copy, struct iommu_table **ptbl)
>  {
> -	struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> -			table_group);
> -	int nid = pe->phb->hose->node;
> -	__u64 bus_offset = num ?
> -		pe->table_group.tce64_start : table_group->tce32_start;
>  	long ret;
>  	struct iommu_table *tbl;
>
> - Alistair
>
>> Please comment. Thanks.
>>
>>
>>
>> Alexey Kardashevskiy (4):
>>   powerpc/powernv/ioda: Rework for huge DMA window at 4GB
>>   powerpc/powernv/ioda: Allow smaller TCE table levels
>>   powerpc/powernv/phb4: Add 4GB IOMMU bypass mode
>>   vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB
>>
>>  arch/powerpc/include/asm/iommu.h              |   1 +
>>  arch/powerpc/include/asm/opal-api.h           |  11 +-
>>  arch/powerpc/include/asm/opal.h               |   2 +
>>  arch/powerpc/platforms/powernv/pci.h          |   1 +
>>  include/uapi/linux/vfio.h                     |   2 +
>>  arch/powerpc/platforms/powernv/opal-call.c    |   2 +
>>  arch/powerpc/platforms/powernv/pci-ioda-tce.c |   4 +-
>>  arch/powerpc/platforms/powernv/pci-ioda.c     | 219 ++++++++++++++----
>>  drivers/vfio/vfio_iommu_spapr_tce.c           |  10 +-
>>  9 files changed, 202 insertions(+), 50 deletions(-)
>>
>>
>
>
>
>

--
Alexey

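As a rough illustration of the constraint discussed in the message above
(this is not code from the series or from QEMU), here is a minimal C
sketch. It assumes that in 4GB mode a DMA address of exactly 4GB falls
into the upper window, and that a host supports exactly one of the two
window locations at a time ("either 1<<59 or 4GB mode"). The names
dma_window_for_addr(), vfio_hotplug_ok(), WIN64_START_1_59 and
WIN64_START_4G are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Two candidate start addresses for the second (64-bit) DMA window,
 * as mentioned in the thread: 1<<59 (0x800.0000.0000.0000) and 4GB.
 * The macro names are made up for this sketch.
 */
#define WIN64_START_1_59	(1ULL << 59)
#define WIN64_START_4G		(1ULL << 32)

/*
 * Hypothetical helper: in 4GB mode, window 0 serves DMA addresses
 * below 4GB and window 1 serves the rest. Treating exactly 4GB as
 * "above" is an assumption of this sketch.
 */
static inline int dma_window_for_addr(uint64_t dma_addr)
{
	return dma_addr < WIN64_START_4G ? 0 : 1;
}

/*
 * Hypothetical helper modelling the hotplug problem: once a window
 * start has been chosen for the guest, a VFIO device hotplugged later
 * only works if the host IOMMU uses the same start address.
 */
static inline bool vfio_hotplug_ok(uint64_t guest_win64_start,
				   uint64_t host_win64_start)
{
	/* This sketch only knows the two modes named in the thread. */
	assert(host_win64_start == WIN64_START_1_59 ||
	       host_win64_start == WIN64_START_4G);
	return guest_win64_start == host_win64_start;
}
```

Under these assumptions, a guest that was started with the window at
1<<59 and later receives a VFIO device from a host in 4GB mode gets
vfio_hotplug_ok(WIN64_START_1_59, WIN64_START_4G) == false, which is
the failure case described in the reply.
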
Thread overview: 12+ messages
2019-12-02  1:59 Alexey Kardashevskiy
2019-12-02  1:59 ` [PATCH kernel RFC 1/4] powerpc/powernv/ioda: Rework for " Alexey Kardashevskiy
2019-12-02  1:59 ` [PATCH kernel RFC 2/4] powerpc/powernv/ioda: Allow smaller TCE table levels Alexey Kardashevskiy
2019-12-02  1:59 ` [PATCH kernel RFC 3/4] powerpc/powernv/phb4: Add 4GB IOMMU bypass mode Alexey Kardashevskiy
2019-12-02  1:59 ` [PATCH kernel RFC 4/4] vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB Alexey Kardashevskiy
2019-12-02  5:36 ` [PATCH kernel RFC 0/4] powerpc/powenv/ioda: Allow huge DMA window " Alistair Popple
2019-12-02  5:58 ` Alexey Kardashevskiy [this message]
2019-12-02  5:51 ` [PATCH kernel RFC 00/4] powerpc/powernv/ioda: Move TCE bypass base to PE Alexey Kardashevskiy
2020-01-10  4:18 ` [PATCH kernel RFC 0/4] powerpc/powenv/ioda: Allow huge DMA window at 4GB Alexey Kardashevskiy
2020-01-23  0:53 ` Alexey Kardashevskiy
2020-01-23  1:17 ` David Gibson
2020-01-23  8:42 ` Alexey Kardashevskiy
KVM Archive on lore.kernel.org