On Tue, May 07, 2013 at 03:51:31PM +1000, Alexey Kardashevskiy wrote: > On 05/07/2013 03:29 PM, David Gibson wrote: > > On Mon, May 06, 2013 at 05:25:56PM +1000, Alexey Kardashevskiy wrote: > >> This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT > >> and H_STUFF_TCE requests without passing them to QEMU, which should > >> save time on switching to QEMU and back. > >> > >> Both real and virtual modes are supported - whenever the kernel > >> fails to handle TCE request, it passes it to the virtual mode. > >> If it the virtual mode handlers fail, then the request is passed > >> to the user mode, for example, to QEMU. > >> > >> This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate > >> a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables > >> in-kernel handling of IOMMU map/unmap. > >> > >> This adds a special case for huge pages (16MB). The reference > >> counting cannot be easily done for such pages in real mode (when > >> MMU is off) so we added a list of huge pages. It is populated in > >> virtual mode and get_page is called just once per a huge page. > >> Real mode handlers check if the requested page is huge and in the list, > >> then no reference counting is done, otherwise an exit to virtual mode > >> happens. The list is released at KVM exit. At the moment the fastest > >> card available for tests uses up to 9 huge pages so walking through this > >> list is not very expensive. However this can change and we may want > >> to optimize this. > >> > >> This also adds the virt_only parameter to the KVM module > >> for debug and performance check purposes. > >> > >> Tests show that this patch increases transmission speed from 220MB/s > >> to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card). > >> > >> Cc: David Gibson > >> Signed-off-by: Alexey Kardashevskiy > >> Signed-off-by: Paul Mackerras > >> --- > >> Documentation/virtual/kvm/api.txt | 28 ++++ > >> arch/powerpc/include/asm/kvm_host.h | 2 + > >> arch/powerpc/include/asm/kvm_ppc.h | 2 + > >> arch/powerpc/include/uapi/asm/kvm.h | 7 + > >> arch/powerpc/kvm/book3s_64_vio.c | 242 ++++++++++++++++++++++++++++++++++- > >> arch/powerpc/kvm/book3s_64_vio_hv.c | 192 +++++++++++++++++++++++++++ > >> arch/powerpc/kvm/powerpc.c | 12 ++ > >> include/uapi/linux/kvm.h | 2 + > >> 8 files changed, 485 insertions(+), 2 deletions(-) > >> > >> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt > >> index f621cd6..2039767 100644 > >> --- a/Documentation/virtual/kvm/api.txt > >> +++ b/Documentation/virtual/kvm/api.txt > >> @@ -2127,6 +2127,34 @@ written, then `n_invalid' invalid entries, invalidating any previously > >> valid entries found. > >> > >> > >> +4.79 KVM_CREATE_SPAPR_TCE_IOMMU > >> + > >> +Capability: KVM_CAP_SPAPR_TCE_IOMMU > >> +Architectures: powerpc > >> +Type: vm ioctl > >> +Parameters: struct kvm_create_spapr_tce_iommu (in) > >> +Returns: 0 on success, -1 on error > >> + > >> +This creates a link between IOMMU group and a hardware TCE (translation > >> +control entry) table. This link lets the host kernel know what IOMMU > >> +group (i.e. TCE table) to use for the LIOBN number passed with > >> +H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls. > >> + > >> +/* for KVM_CAP_SPAPR_TCE_IOMMU */ > >> +struct kvm_create_spapr_tce_iommu { > >> + __u64 liobn; > >> + __u32 iommu_id; > > > > Wouldn't it be more in keeping > > > pardon? Sorry, I was going to suggest a change, but then realised it wasn't actually any better than what you have now. > >> + __u32 flags; > >> +}; > >> + > >> +No flag is supported at the moment. > >> + > >> +When the guest issues TCE call on a liobn for which a TCE table has been > >> +registered, the kernel will handle it in real mode, updating the hardware > >> +TCE table. TCE table calls for other liobns will cause a vm exit and must > >> +be handled by userspace. > >> + > >> + > >> 5. The kvm_run structure > >> ------------------------ > >> > >> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h > >> index 36ceb0d..2b70cbc 100644 > >> --- a/arch/powerpc/include/asm/kvm_host.h > >> +++ b/arch/powerpc/include/asm/kvm_host.h > >> @@ -178,6 +178,8 @@ struct kvmppc_spapr_tce_table { > >> struct kvm *kvm; > >> u64 liobn; > >> u32 window_size; > >> + bool virtmode_only; > > > > I see this is now initialized from the global parameter, but I think > > it would be better to just check the global (debug) parameter > > directly, rather than duplicating it here. > > > The global parameter is in kvm.ko and the struct above is in the real mode > part which cannot go to the module. Ah, ok. I'm half inclined to just drop the virtmode_only thing entirely. > >> + struct iommu_group *grp; /* used for IOMMU groups */ > >> struct page *pages[0]; > >> }; > >> > >> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h > >> index d501246..bdfa140 100644 > >> --- a/arch/powerpc/include/asm/kvm_ppc.h > >> +++ b/arch/powerpc/include/asm/kvm_ppc.h > >> @@ -139,6 +139,8 @@ extern void kvmppc_xics_free(struct kvm *kvm); > >> > >> extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, > >> struct kvm_create_spapr_tce *args); > >> +extern long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm, > >> + struct kvm_create_spapr_tce_iommu *args); > >> extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table( > >> struct kvm_vcpu *vcpu, unsigned long liobn); > >> extern long kvmppc_emulated_h_put_tce(struct kvmppc_spapr_tce_table *stt, > >> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h > >> index 681b314..b67d44b 100644 > >> --- a/arch/powerpc/include/uapi/asm/kvm.h > >> +++ b/arch/powerpc/include/uapi/asm/kvm.h > >> @@ -291,6 +291,13 @@ struct kvm_create_spapr_tce { > >> __u32 window_size; > >> }; > >> > >> +/* for KVM_CAP_SPAPR_TCE_IOMMU */ > >> +struct kvm_create_spapr_tce_iommu { > >> + __u64 liobn; > >> + __u32 iommu_id; > >> + __u32 flags; > >> +}; > >> + > >> /* for KVM_ALLOCATE_RMA */ > >> struct kvm_allocate_rma { > >> __u64 rma_size; > >> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c > >> index 643ac1e..98cf949 100644 > >> --- a/arch/powerpc/kvm/book3s_64_vio.c > >> +++ b/arch/powerpc/kvm/book3s_64_vio.c > >> @@ -27,6 +27,9 @@ > >> #include > >> #include > >> #include > >> +#include > >> +#include > >> +#include > >> > >> #include > >> #include > >> @@ -38,10 +41,19 @@ > >> #include > >> #include > >> #include > >> +#include > >> + > >> +#define DRIVER_VERSION "0.1" > >> +#define DRIVER_AUTHOR "Paul Mackerras, IBM Corp. " > >> +#define DRIVER_DESC "POWERPC KVM driver" > > > > Really? > > > What is wrong here? Well, it seems entirely unrelated to the rest of the changes, and not obviously accurate. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson