Re: [PATCH kernel RFC 0/4] powerpc/powenv/ioda: Allow huge DMA window at 4GB

From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Alistair Popple <alistair@popple.id.au>
Cc: Oliver O'Halloran <oohall@gmail.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH kernel RFC 0/4] powerpc/powenv/ioda: Allow huge DMA window at 4GB
Date: Mon, 2 Dec 2019 16:58:15 +1100
Message-ID: <45175dc2-8ed4-6e96-ff69-44980f3d1951@ozlabs.ru>
In-Reply-To: <22858805.RAHADn2P79@townsend>

On 02/12/2019 16:36, Alistair Popple wrote:
> On Monday, 2 December 2019 12:59:49 PM AEDT Alexey Kardashevskiy wrote:
>> Here is an attempt to support bigger DMA space for devices
>> supporting DMA masks less than 59 bits (GPUs come to mind
>> first). POWER9 PHBs have an option to map 2 windows at 0
>> and select a window based on DMA address being below or above
>> 4GB.
>>
>> This adds the "iommu=iommu_bypass" kernel parameter and
> 
> Would it be possible to just enable this by default if the platform supports 
> it? Are there any downsides?

It changes the location of the second DMA window, which QEMU currently
assumes to be at 0x800.0000.0000.0000 (1<<59), and I do not see an easy
way to work around this.
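
To make the two layouts concrete, here is a minimal sketch (not code
from this series; the helper names are made up for illustration, and
window selection is reduced to a single compare, which is a
simplification of what the PHB does):

```c
#include <stdbool.h>
#include <stdint.h>

/* Start of the second (huge) DMA window in each mode discussed here. */
#define WIN1_START_DEFAULT (1ULL << 59) /* 0x800.0000.0000.0000 */
#define WIN1_START_4GB     (1ULL << 32) /* 0x1.0000.0000 */

/* Hypothetical helper: highest bus address an N-bit DMA mask allows. */
static uint64_t dma_mask(unsigned int bits)
{
	return bits >= 64 ? UINT64_MAX : (1ULL << bits) - 1;
}

/* True if a device with this mask can address the start of the window. */
static bool window_reachable(unsigned int mask_bits, uint64_t win_start)
{
	return win_start <= dma_mask(mask_bits);
}

/*
 * Simplified model (an assumption, not the hardware logic): addresses
 * below win1_start hit window 0, everything else hits window 1.
 */
static int select_window(uint64_t dma_addr, uint64_t win1_start)
{
	return dma_addr >= win1_start ? 1 : 0;
}
```

In this model a device limited to a 40-bit mask can reach the window
at 4GB but not the one at 1<<59.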

For example, we start QEMU without VFIO but with an emulated XHCI which
will ask for DDW. We (QEMU) have to pick a window location and then
stick to it, so if a user later hotplugs a VFIO-PCI device, the
physical IOMMU has to support the previously selected DMA window
address; otherwise the hotplug is going to fail.
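
Roughly, the hotplug-time constraint looks like the sketch below. This
is illustration only: struct host_iommu_caps and hotplug_allowed() are
hypothetical names, not existing QEMU or kernel interfaces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of what the physical IOMMU can offer. */
struct host_iommu_caps {
	const uint64_t *win_starts; /* start addresses it supports for the huge window */
	size_t n;                   /* number of entries in win_starts */
};

/*
 * The guest already has a huge window at guest_win_start, so the host
 * must be able to place a window at exactly that address.
 */
static bool hotplug_allowed(uint64_t guest_win_start,
			    const struct host_iommu_caps *caps)
{
	for (size_t i = 0; i < caps->n; i++)
		if (caps->win_starts[i] == guest_win_start)
			return true;
	return false;
}
```

If the guest picked 1<<59 and the host only offers 4GB (or the other
way around), the check fails, which is the case described above.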

The question is how to tell QEMU about this new offset, and what to do
about migration from P8 (which, let's say, did have a VFIO device which
we unplugged before the migration) to P9 with the prospect of
hotplugging a VFIO device, but this time with this GTE4GB bit set.

> Adding it as an option seems like it would make 
> things harder to support and reduces the amount of testing/use it would get.

Yeah, this is why this is an RFC...

>> supports VFIO+pseries machine - currently this requires telling
>> upstream+unmodified QEMU about this via
>> -global spapr-pci-host-bridge.dma64_win_addr=0x100000000
>> or per-phb property. 4/4 advertises the new option but
>> there is no automation around it in QEMU (should it be?).
>>
>> For now it is either 1<<59 or 4GB mode; dynamic switching is
>> not supported (could be via sysfs).
>>
>> This is based on sha1
>> a6ed68d6468b Linus Torvalds "Merge tag 'drm-next-2019-11-27' of
>> git://anongit.freedesktop.org/drm/drm".
> 
> Are you sure?

Almost. It should have been HEAD^^^^^..HEAD instead of HEAD^^^^..HEAD :)

I've posted 00/4 to the thread now, sorry about that. Thanks,

> I am getting the following rejected hunk trying to apply the 
> first patch in the series:
> 
> --- arch/powerpc/platforms/powernv/pci-ioda.c
> +++ arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2349,15 +2349,10 @@ static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
>                 pe->tce_bypass_enabled = enable;
>  }
>  
> -static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
> -               int num, __u32 page_shift, __u64 window_size, __u32 levels,
> +static long pnv_pci_ioda2_create_table(int nid, int num, __u64 bus_offset,
> +               __u32 page_shift, __u64 window_size, __u32 levels,
>                 bool alloc_userspace_copy, struct iommu_table **ptbl)
>  {
> -       struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> -                       table_group);
> -       int nid = pe->phb->hose->node;
> -       __u64 bus_offset = num ?
> -               pe->table_group.tce64_start : table_group->tce32_start;
>         long ret;
>         struct iommu_table *tbl;
> 
> - Alistair
>  
>> Please comment. Thanks.
>>
>>
>>
>> Alexey Kardashevskiy (4):
>>   powerpc/powernv/ioda: Rework for huge DMA window at 4GB
>>   powerpc/powernv/ioda: Allow smaller TCE table levels
>>   powerpc/powernv/phb4: Add 4GB IOMMU bypass mode
>>   vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB
>>
>>  arch/powerpc/include/asm/iommu.h              |   1 +
>>  arch/powerpc/include/asm/opal-api.h           |  11 +-
>>  arch/powerpc/include/asm/opal.h               |   2 +
>>  arch/powerpc/platforms/powernv/pci.h          |   1 +
>>  include/uapi/linux/vfio.h                     |   2 +
>>  arch/powerpc/platforms/powernv/opal-call.c    |   2 +
>>  arch/powerpc/platforms/powernv/pci-ioda-tce.c |   4 +-
>>  arch/powerpc/platforms/powernv/pci-ioda.c     | 219 ++++++++++++++----
>>  drivers/vfio/vfio_iommu_spapr_tce.c           |  10 +-
>>  9 files changed, 202 insertions(+), 50 deletions(-)
>>
>>
> 
> 
> 
> 

-- 
Alexey