KVM Archive on lore.kernel.org
From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Alistair Popple <alistair@popple.id.au>
Cc: linuxppc-dev@lists.ozlabs.org,
	David Gibson <david@gibson.dropbear.id.au>,
	kvm@vger.kernel.org, Alex Williamson <alex.williamson@redhat.com>,
	Oliver O'Halloran <oohall@gmail.com>
Subject: Re: [PATCH kernel RFC 0/4] powerpc/powenv/ioda: Allow huge DMA window at 4GB
Date: Mon, 2 Dec 2019 16:58:15 +1100
Message-ID: <45175dc2-8ed4-6e96-ff69-44980f3d1951@ozlabs.ru>
In-Reply-To: <22858805.RAHADn2P79@townsend>



On 02/12/2019 16:36, Alistair Popple wrote:
> On Monday, 2 December 2019 12:59:49 PM AEDT Alexey Kardashevskiy wrote:
>> Here is an attempt to support bigger DMA space for devices
>> supporting DMA masks less than 59 bits (GPUs come to mind
>> first). POWER9 PHBs have an option to map 2 windows at 0
>> and select a window based on the DMA address being below or above
>> 4GB.
>>
>> This adds the "iommu=iommu_bypass" kernel parameter and
> 
> Would it be possible to just enable this by default if the platform supports 
> it? Are there any downsides?

It changes the location of the second DMA window, which QEMU currently
assumes to be at 0x800.0000.0000.0000, and I do not see an easy way to
work around this.

For example, we start QEMU without VFIO but with an emulated XHCI which
asks for DDW. We (QEMU) have to pick a window location and then stick
to it. If a user later hotplugs a VFIO-PCI device, the physical IOMMU has
to support the previously selected DMA window address; otherwise the
hotplug is going to fail.

The question is how to tell QEMU about this new offset, and what we do
about migration from P8 (which, let's say, had a VFIO device that we
unplugged before the migration) to P9 with the prospect of hotplugging a
VFIO device, but this time with this GTE4GB bit set.


> Adding it as an option seems like it would make 
> things harder to support and reduces the amount of testing/use it would get.

Yeah, this is why this is an RFC...


>> supports VFIO+pseries machine - currently this requires telling
>> upstream+unmodified QEMU about this via
>> -global spapr-pci-host-bridge.dma64_win_addr=0x100000000
>> or per-phb property. 4/4 advertises the new option but
>> there is no automation around it in QEMU (should it be?).
>>
>> For now it is either 1<<59 or 4GB mode; dynamic switching is
>> not supported (could be via sysfs).
>>
>> This is based on sha1
>> a6ed68d6468b Linus Torvalds "Merge tag 'drm-next-2019-11-27' of git://anongit.freedesktop.org/drm/drm".
> 
> Are you sure?

Almost. It should have been HEAD^^^^^..HEAD instead of HEAD^^^^..HEAD :)

I've posted 00/4 to the thread now, sorry about that. Thanks,


> I am getting the following rejected hunk trying to apply the 
> first patch in the series:
> 
> --- arch/powerpc/platforms/powernv/pci-ioda.c
> +++ arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2349,15 +2349,10 @@ static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
>                 pe->tce_bypass_enabled = enable;
>  }
>  
> -static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
> -               int num, __u32 page_shift, __u64 window_size, __u32 levels,
> +static long pnv_pci_ioda2_create_table(int nid, int num, __u64 bus_offset,
> +               __u32 page_shift, __u64 window_size, __u32 levels,
>                 bool alloc_userspace_copy, struct iommu_table **ptbl)
>  {
> -       struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> -                       table_group);
> -       int nid = pe->phb->hose->node;
> -       __u64 bus_offset = num ?
> -               pe->table_group.tce64_start : table_group->tce32_start;
>         long ret;
>         struct iommu_table *tbl;
> 
> - Alistair
>  
>> Please comment. Thanks.
>>
>>
>>
>> Alexey Kardashevskiy (4):
>>   powerpc/powernv/ioda: Rework for huge DMA window at 4GB
>>   powerpc/powernv/ioda: Allow smaller TCE table levels
>>   powerpc/powernv/phb4: Add 4GB IOMMU bypass mode
>>   vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB
>>
>>  arch/powerpc/include/asm/iommu.h              |   1 +
>>  arch/powerpc/include/asm/opal-api.h           |  11 +-
>>  arch/powerpc/include/asm/opal.h               |   2 +
>>  arch/powerpc/platforms/powernv/pci.h          |   1 +
>>  include/uapi/linux/vfio.h                     |   2 +
>>  arch/powerpc/platforms/powernv/opal-call.c    |   2 +
>>  arch/powerpc/platforms/powernv/pci-ioda-tce.c |   4 +-
>>  arch/powerpc/platforms/powernv/pci-ioda.c     | 219 ++++++++++++++----
>>  drivers/vfio/vfio_iommu_spapr_tce.c           |  10 +-
>>  9 files changed, 202 insertions(+), 50 deletions(-)
>>
>>
> 
> 
> 
> 

-- 
Alexey

