From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: linuxppc-dev@lists.ozlabs.org, Paul Mackerras <paulus@samba.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH kernel 8/9] KVM: PPC: Add in-kernel handling for VFIO
Date: Fri, 11 Mar 2016 13:15:20 +1100
Message-ID: <56E22A38.1050204@ozlabs.ru>
In-Reply-To: <20160310051828.GU22546@voom.fritz.box>

On 03/10/2016 04:18 PM, David Gibson wrote:
> On Wed, Mar 09, 2016 at 07:46:47PM +1100, Alexey Kardashevskiy wrote:
>> On 03/08/2016 10:08 PM, David Gibson wrote:
>>> On Mon, Mar 07, 2016 at 02:41:16PM +1100, Alexey Kardashevskiy wrote:
>>>> This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
>>>> and H_STUFF_TCE requests targeting an IOMMU TCE table used for VFIO
>>>> without passing them to user space, which saves time on switching
>>>> to user space and back.
>>>>
>>>> Both real and virtual modes are supported. The kernel tries to
>>>> handle a TCE request in real mode; if that fails, it passes the request
>>>> to virtual mode to complete the operation. If the virtual mode
>>>> handler fails, the request is passed to user space; this is not expected
>>>> to ever happen though.
>>>
>>> Well... not expected to happen with a qemu which uses this.  Presumably
>>> it will fall back to userspace routinely if you have an old qemu that
>>> doesn't add the liobn mappings.
>>
>>
>> Ah. Ok, thanks, I'll add this to the commit log.
>
> Ok.
>
>>>> The first user of this is VFIO on POWER. Trampolines to the VFIO external
>>>> user API functions are required for this patch.
>>>
>>> I'm not sure what you mean by "trampoline" here.
>>
>> For example, look at kvm_vfio_group_get_external_user. It calls
>> symbol_get(vfio_group_get_external_user) and then calls a function via the
>> returned pointer.
>>
>> Is there a better word for this?
>
> Uh.. probably although I don't immediately know what.  "Trampoline"
> usually refers to code on the stack used for bouncing places, which
> isn't what this resembles.

"Dynamic wrapper"?

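To make the pattern concrete, here is a minimal userspace sketch of such a
wrapper, not the kernel code itself. It assumes a hypothetical symbol table
and lookup helper (model_symtab, model_symbol_get) standing in for
symbol_get(); provider_double and wrapper_call are likewise made up for
illustration. The wrapper resolves the target at call time and fails
cleanly when the provider is not loaded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Signature of the external function being wrapped (hypothetical). */
typedef int (*ext_fn_t)(int);

/* Function that lives in the "provider module" in this model. */
static int provider_double(int x)
{
	return 2 * x;
}

/* Hypothetical stand-in for the kernel's exported symbol table. */
struct model_symbol {
	const char *name;
	ext_fn_t fn;
	int loaded;	/* non-zero when the providing module is present */
};

static struct model_symbol model_symtab[] = {
	{ "provider_double", provider_double, 1 },
	{ "provider_absent", NULL, 0 },
};

/* Models symbol_get(): returns the function, or NULL if unavailable. */
static ext_fn_t model_symbol_get(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(model_symtab) / sizeof(model_symtab[0]); i++) {
		if (model_symtab[i].loaded &&
		    strcmp(model_symtab[i].name, name) == 0)
			return model_symtab[i].fn;
	}
	return NULL;
}

/*
 * The "dynamic wrapper": look the symbol up at call time and call it via
 * the returned pointer. Returns 0 on success, -1 if the symbol is missing.
 */
static int wrapper_call(const char *name, int arg, int *out)
{
	ext_fn_t fn = model_symbol_get(name);

	assert(out != NULL);
	if (!fn)
		return -1;
	*out = fn(arg);
	return 0;
}
```

Calling wrapper_call("provider_double", 21, &out) succeeds and stores the
provider's result in out, while wrapper_call("provider_absent", 1, &out)
returns -1 without touching any code from the missing provider.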


>>>> This uses a VFIO KVM device to associate a logical bus number (LIOBN)
>>>> with a VFIO IOMMU group fd and enable in-kernel handling of map/unmap
>>>> requests.
>>>
>>> Group fd?  Or container fd?  The group fd wouldn't make a lot of
>>> sense.
>>
>>
>> Group. KVM has no idea about containers.
>
> That's not going to fly.  Having a liobn registered against just one
> group in a container makes no sense at all.  Conceptually, if not
> physically, the container shares a single set of TCE tables.  If
> handling that means teaching KVM the concept of containers, then so be
> it.
>
> Btw, I'm not sure yet if extending the existing vfio kvm device to
> make the vfio<->kvm linkages makes sense.  I think the reason some x86
> machines need that is quite different from how we're using it for
> Power.  I haven't got a clear enough picture yet to be sure either
> way.
>
> The other option that would seem likely to me would be a "bind VFIO
> container" ioctl() on the fd associated with a kernel accelerated TCE table.


Oh, I just noticed this response. I need to digest it. Looks like this is
going to take another 2 years to upstream...


>>>> To make use of the feature, the user space has to create a guest view
>>>> of the TCE table via KVM_CAP_SPAPR_TCE/KVM_CAP_SPAPR_TCE_64 and
>>>> then associate a LIOBN with this table via VFIO KVM device,
>>>> a KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN property (which is added in
>>>> the next patch).
>>>>
>>>> Tests show that this patch increases transmission speed from 220MB/s
>>>> to 750..1020MB/s on 10Gb network (Chelsio CXGB3 10Gb ethernet card).
>>>
>>> Is that with or without DDW (i.e. with or without a 64-bit DMA window)?
>>
>>
>> Without DDW, I should have mentioned this. The patch is from the times when
>> there was no DDW :(
>
> Ok.
>
>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> ---
>>>>   arch/powerpc/kvm/book3s_64_vio.c    | 184 +++++++++++++++++++++++++++++++++++
>>>>   arch/powerpc/kvm/book3s_64_vio_hv.c | 186 ++++++++++++++++++++++++++++++++++++
>>>>   2 files changed, 370 insertions(+)
>>>>
>>>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>>>> index 7965fc7..9417d12 100644
>>>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>>>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>>>> @@ -33,6 +33,7 @@
>>>>   #include <asm/kvm_ppc.h>
>>>>   #include <asm/kvm_book3s.h>
>>>>   #include <asm/mmu-hash64.h>
>>>> +#include <asm/mmu_context.h>
>>>>   #include <asm/hvcall.h>
>>>>   #include <asm/synch.h>
>>>>   #include <asm/ppc-opcode.h>
>>>> @@ -317,11 +318,161 @@ fail:
>>>>   	return ret;
>>>>   }
>>>>
>>>> +static long kvmppc_tce_iommu_mapped_dec(struct iommu_table *tbl,
>>>> +		unsigned long entry)
>>>> +{
>>>> +	struct mm_iommu_table_group_mem_t *mem = NULL;
>>>> +	const unsigned long pgsize = 1ULL << tbl->it_page_shift;
>>>> +	unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
>>>> +
>>>> +	if (!pua)
>>>> +		return H_HARDWARE;
>>>> +
>>>> +	mem = mm_iommu_lookup(*pua, pgsize);
>>>> +	if (!mem)
>>>> +		return H_HARDWARE;
>>>> +
>>>> +	mm_iommu_mapped_dec(mem);
>>>> +
>>>> +	*pua = 0;
>>>> +
>>>> +	return H_SUCCESS;
>>>> +}
>>>> +
>>>> +static long kvmppc_tce_iommu_unmap(struct iommu_table *tbl,
>>>> +		unsigned long entry)
>>>> +{
>>>> +	enum dma_data_direction dir = DMA_NONE;
>>>> +	unsigned long hpa = 0;
>>>> +
>>>> +	if (iommu_tce_xchg(tbl, entry, &hpa, &dir))
>>>> +		return H_HARDWARE;
>>>> +
>>>> +	if (dir == DMA_NONE)
>>>> +		return H_SUCCESS;
>>>> +
>>>> +	return kvmppc_tce_iommu_mapped_dec(tbl, entry);
>>>> +}
>>>> +
>>>> +long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>>>> +		unsigned long entry, unsigned long gpa,
>>>> +		enum dma_data_direction dir)
>>>> +{
>>>> +	long ret;
>>>> +	unsigned long hpa, ua, *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
>>>> +	struct mm_iommu_table_group_mem_t *mem;
>>>> +
>>>> +	if (!pua)
>>>> +		return H_HARDWARE;
>>>
>>> H_HARDWARE?  Or H_PARAMETER?  This essentially means the guest has
>>> supplied a bad physical address, doesn't it?
>>
>> Well, maybe. I'll change it. If it is not H_TOO_HARD, it does not make any
>> difference after all :)
>>
>>
>>
>>>> +	if (kvmppc_gpa_to_ua(kvm, gpa, &ua, NULL))
>>>> +		return H_HARDWARE;
>>>> +
>>>> +	mem = mm_iommu_lookup(ua, 1ULL << tbl->it_page_shift);
>>>> +	if (!mem)
>>>> +		return H_HARDWARE;
>>>> +
>>>> +	if (mm_iommu_ua_to_hpa(mem, ua, &hpa))
>>>> +		return H_HARDWARE;
>>>> +
>>>> +	if (mm_iommu_mapped_inc(mem))
>>>> +		return H_HARDWARE;
>>>> +
>>>> +	ret = iommu_tce_xchg(tbl, entry, &hpa, &dir);
>>>> +	if (ret) {
>>>> +		mm_iommu_mapped_dec(mem);
>>>> +		return H_TOO_HARD;
>>>> +	}
>>>> +
>>>> +	if (dir != DMA_NONE)
>>>> +		kvmppc_tce_iommu_mapped_dec(tbl, entry);
>>>> +
>>>> +	*pua = ua;
>>>
>>> IIUC this means you have a copy of the UA for every group attached to
>>> the TCE table, but they'll all be the same. Any way to avoid that
>>> duplication?
>>
>> It is for every container, not a group. On P8, I allow multiple groups to go
>> to the same container, that means that a container has one or two
>> iommu_table, and each iommu_table has this "ua" list but since tables are
>> different (window size, page size, content), these "ua" arrays are also
>> different.
>
> Erm.. but h_put_tce iterates h_put_tce_iommu through all the groups
> attached to the stt, and each one seems to update pua.
>
> Or is that what the if (kg->tbl == tbltmp) continue; is supposed to
> avoid?  In which case what ensures that the stt->groups list is
> ordered by tbl pointer?


Nothing. In the normal case (POWER8 IODA2) all groups on the same LIOBN
have the same iommu_table, so only the first group's table gets updated;
the others are skipped, which is ok as they use the same table.

In a bad case (POWER7 IODA1, multiple containers per LIOBN) the same @ua
can be updated more than once. Well, not a huge loss.
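
A minimal userspace model of the two cases, assuming the loop skips a group
only when its table equals the one handled in the previous iteration (as
the quoted "if (kg->tbl == tbltmp) continue;" suggests). The struct and
function names below are hypothetical and only illustrate the behaviour;
they are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an iommu_table; counts writes to one entry. */
struct model_table {
	int updates;
};

/* Hypothetical stand-in for a group attached to the guest TCE table. */
struct model_group {
	struct model_table *tbl;
};

/*
 * Walk the groups and update each group's table, skipping a group only
 * when its table is the same as the previous group's. Returns the total
 * number of updates performed.
 */
static int model_put_tce(struct model_group *groups, size_t n)
{
	struct model_table *tbltmp = NULL;
	int total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(groups[i].tbl != NULL);
		if (groups[i].tbl == tbltmp)
			continue;	/* same table as the previous group */
		tbltmp = groups[i].tbl;
		tbltmp->updates++;
		total++;
	}
	return total;
}

/* Normal case: three groups share one table, so it is updated once. */
static int model_shared_table(void)
{
	struct model_table t = { 0 };
	struct model_group g[3] = { { &t }, { &t }, { &t } };

	return model_put_tce(g, 3);
}

/* Bad case: tables ordered a, b, a, so table a is updated twice. */
static int model_interleaved_tables(void)
{
	struct model_table a = { 0 }, b = { 0 };
	struct model_group g[3] = { { &a }, { &b }, { &a } };

	model_put_tce(g, 3);
	return a.updates;
}
```

With an unordered list the duplicate update is redundant rather than
incorrect in this model, since the same value is written again.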


-- 
Alexey
