KVM Archive on lore.kernel.org
 help / color / Atom feed
From: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
To: Auger Eric <eric.auger@redhat.com>,
	"eric.auger.pro@gmail.com" <eric.auger.pro@gmail.com>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"kvmarm@lists.cs.columbia.edu" <kvmarm@lists.cs.columbia.edu>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"jacob.jun.pan@linux.intel.com" <jacob.jun.pan@linux.intel.com>,
	"yi.l.liu@intel.com" <yi.l.liu@intel.com>,
	"jean-philippe.brucker@arm.com" <jean-philippe.brucker@arm.com>,
	"will.deacon@arm.com" <will.deacon@arm.com>,
	"robin.murphy@arm.com" <robin.murphy@arm.com>
Cc: "kevin.tian@intel.com" <kevin.tian@intel.com>,
	"vincent.stehle@arm.com" <vincent.stehle@arm.com>,
	"ashok.raj@intel.com" <ashok.raj@intel.com>,
	"marc.zyngier@arm.com" <marc.zyngier@arm.com>,
	"tina.zhang@intel.com" <tina.zhang@intel.com>,
	Linuxarm <linuxarm@huawei.com>, "xuwei (O)" <xuwei5@huawei.com>
Subject: RE: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
Date: Tue, 12 Nov 2019 14:21:08 +0000
Message-ID: <19af0dda619f4947b20f2087eaa43251@huawei.com> (raw)
In-Reply-To: <b0a9f107-2e89-1418-d6f4-3e6f5ac0b330@redhat.com>



> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: 12 November 2019 13:22
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> eric.auger.pro@gmail.com; iommu@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org; kvm@vger.kernel.org;
> kvmarm@lists.cs.columbia.edu; joro@8bytes.org;
> alex.williamson@redhat.com; jacob.jun.pan@linux.intel.com;
> yi.l.liu@intel.com; jean-philippe.brucker@arm.com; will.deacon@arm.com;
> robin.murphy@arm.com
> Cc: kevin.tian@intel.com; vincent.stehle@arm.com; ashok.raj@intel.com;
> marc.zyngier@arm.com; tina.zhang@intel.com; Linuxarm
> <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>
> Subject: Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> 
> Hi Shameer,
> 
> On 11/12/19 2:06 PM, Shameerali Kolothum Thodi wrote:
> > Hi Eric,
> >
> >> -----Original Message-----
> >> From: Auger Eric [mailto:eric.auger@redhat.com]
> >> Sent: 12 November 2019 11:29
> >> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> >> eric.auger.pro@gmail.com; iommu@lists.linux-foundation.org;
> >> linux-kernel@vger.kernel.org; kvm@vger.kernel.org;
> >> kvmarm@lists.cs.columbia.edu; joro@8bytes.org;
> >> alex.williamson@redhat.com; jacob.jun.pan@linux.intel.com;
> >> yi.l.liu@intel.com; jean-philippe.brucker@arm.com; will.deacon@arm.com;
> >> robin.murphy@arm.com
> >> Cc: kevin.tian@intel.com; vincent.stehle@arm.com; ashok.raj@intel.com;
> >> marc.zyngier@arm.com; tina.zhang@intel.com; Linuxarm
> >> <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>
> >> Subject: Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> >>
> >> Hi Shameer,
> >> On 11/12/19 12:08 PM, Shameerali Kolothum Thodi wrote:
> >>> Hi Eric,
> >>>
> >>>> -----Original Message-----
> >>>> From: kvmarm-bounces@lists.cs.columbia.edu
> >>>> [mailto:kvmarm-bounces@lists.cs.columbia.edu] On Behalf Of Eric Auger
> >>>> Sent: 11 July 2019 14:56
> >>>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
> >>>> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> >>>> kvm@vger.kernel.org; kvmarm@lists.cs.columbia.edu; joro@8bytes.org;
> >>>> alex.williamson@redhat.com; jacob.jun.pan@linux.intel.com;
> >>>> yi.l.liu@intel.com; jean-philippe.brucker@arm.com;
> will.deacon@arm.com;
> >>>> robin.murphy@arm.com
> >>>> Cc: kevin.tian@intel.com; vincent.stehle@arm.com;
> ashok.raj@intel.com;
> >>>> marc.zyngier@arm.com; tina.zhang@intel.com
> >>>> Subject: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> >>>>
> >>>> This series brings the VFIO part of HW nested paging support
> >>>> in the SMMUv3.
> >>>>
> >>>> The series depends on:
> >>>> [PATCH v9 00/14] SMMUv3 Nested Stage Setup (IOMMU part)
> >>>> (https://www.spinics.net/lists/kernel/msg3187714.html)
> >>>>
> >>>> 3 new IOCTLs are introduced that allow the userspace to
> >>>> 1) pass the guest stage 1 configuration
> >>>> 2) pass stage 1 MSI bindings
> >>>> 3) invalidate stage 1 related caches
> >>>>
> >>>> They map onto the related new IOMMU API functions.
> >>>>
> >>>> We introduce the capability to register specific interrupt
> >>>> indexes (see [1]). A new DMA_FAULT interrupt index allows to register
> >>>> an eventfd to be signaled whenever a stage 1 related fault
> >>>> is detected at physical level. Also a specific region allows
> >>>> to expose the fault records to the user space.
> >>>
> >>> I am trying to get this running on one of our platform that has smmuv3 dual
> >>> stage support. I am seeing some issues with this when an ixgbe vf dev is
> >>> made pass-through and is behind a vSMMUv3 in Guest.
> >>>
> >>> Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
> >>> Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5
> >>>
> >>> And this is my Qemu cmd line,
> >>>
> >>> ./qemu-system-aarch64
> >>> -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host \
> >>> -kernel Image \
> >>> -drive if=none,file=ubuntu,id=fs \
> >>> -device virtio-blk-device,drive=fs \
> >>> -device vfio-pci,host=0000:01:10.1 \
> >>> -bios QEMU_EFI.fd \
> >>> -net none \
> >>> -m 4G \
> >>> -nographic -D -d -enable-kvm \
> >>> -append "console=ttyAMA0 root=/dev/vda rw acpi=force"
> >>>
> >>> The basic ping from Guest works fine,
> >>> root@ubuntu:~# ping 10.202.225.185
> >>> PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data.
> >>> 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms
> >>> 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms
> >>> ...
> >>>
> >>> But if I increase ping packet size,
> >>>
> >>> root@ubuntu:~# ping -s 1024 10.202.225.185
> >>> PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data.
> >>> 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms
> >>> 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms
> >>> From 10.202.225.169 icmp_seq=66 Destination Host Unreachable
> >>> From 10.202.225.169 icmp_seq=67 Destination Host Unreachable
> >>> From 10.202.225.169 icmp_seq=68 Destination Host Unreachable
> >>> From 10.202.225.169 icmp_seq=69 Destination Host Unreachable
> >>>
> >>> And from Host kernel I get,
> >>> [  819.970742] ixgbe 0000:01:00.1 enp1s0f1: 3 Spoofed packets detected
> >>> [  824.002707] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
> >>> [  828.034683] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
> >>> [  830.050673] ixgbe 0000:01:00.1 enp1s0f1: 4 Spoofed packets detected
> >>> [  832.066659] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
> >>> [  834.082640] ixgbe 0000:01:00.1 enp1s0f1: 3 Spoofed packets detected
> >>>
> >>> Also noted that iperf cannot work as it fails to establish the connection
> with
> >> iperf
> >>> server.
> >>>
> >>> Please find attached the trace logs(vfio*, smmuv3*) from Qemu for your
> >> reference.
> >>> I haven't debugged this further yet and thought of checking with you if this
> is
> >>> something you have seen already or not. Or maybe I am missing something
> >> here?
> >>
> >> Please can you try to edit and modify hw/vfio/common.c, function
> >> vfio_iommu_unmap_notify
> >>
> >>
> >> /*
> >>     if (size <= 0x10000) {
> >>         ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
> >>         ustruct.info.granularity = IOMMU_INV_GRANU_ADDR;
> >>         ustruct.info.addr_info.flags =
> IOMMU_INV_ADDR_FLAGS_ARCHID;
> >>         if (iotlb->leaf) {
> >>             ustruct.info.addr_info.flags |=
> >> IOMMU_INV_ADDR_FLAGS_LEAF;
> >>         }
> >>         ustruct.info.addr_info.archid = iotlb->arch_id;
> >>         ustruct.info.addr_info.addr = start;
> >>         ustruct.info.addr_info.granule_size = size;
> >>         ustruct.info.addr_info.nb_granules = 1;
> >>         trace_vfio_iommu_addr_inv_iotlb(iotlb->arch_id, start, size, 1,
> >>                                         iotlb->leaf);
> >>     } else {
> >> */
> >>         ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
> >>         ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
> >>         ustruct.info.pasid_info.archid = iotlb->arch_id;
> >>         ustruct.info.pasid_info.flags =
> IOMMU_INV_PASID_FLAGS_ARCHID;
> >>         trace_vfio_iommu_asid_inv_iotlb(iotlb->arch_id);
> >> //    }
> >>
> >> This modification leads to invalidate the whole asid each time we get a
> >> guest TLBI instead of invalidating the single IOVA (TLBI). On my end, I
> >> saw this was the cause of such kind of issues. Please let me know if it
> >> fixes your perf issues
> >
> > Yes, this seems to fix the issue.
> >
> > root@ubuntu:~# iperf -c 10.202.225.185
> > ------------------------------------------------------------
> > Client connecting to 10.202.225.185, TCP port 5001
> > TCP window size: 85.0 KByte (default)
> > ------------------------------------------------------------
> > [  3] local 10.202.225.169 port 47996 connected with 10.202.225.185 port
> 5001
> > [ ID] Interval       Transfer     Bandwidth
> > [  3]  0.0-10.0 sec  2.27 GBytes  1.95 Gbits/sec
> > root@ubuntu:~#
> >
> > But the performance seems to be very poor as this is a 10Gbps interface(Of
> course
> > invalidating the whole asid may not be very helpful). It is interesting that why
> the
> > single iova invalidation is not working.
> >
> >  and then we may discuss further about the test
> >> configuration.
> >
> > Sure. Please let me know.
> 
> I reported that issue earlier on the ML. I have not been able to find
> any integration issue in the kernel/qemu code but maybe I am too blind
> now as I wrote it ;-) When I get a guest stage1 TLBI I cascade it down
> to the physical IOMMU. I also pass the LEAF flag.

Ok.

> As you are an expert of the SMMUv3 PMU, if your implementation has any
> and you have cycles to look at this, it would be helpful to run it and
> see if something weird gets highlighted.

:). Sure. I will give it a try and report back if anything suspicious.

Thanks,
Shameer

 
> Thanks
> 
> Eric
> >
> > Cheers,
> > Shameer
> >
> >> Thanks
> >>
> >> Eric
> >>
> >>
> >>
> >>>
> >>> Please let me know.
> >>>
> >>> Thanks,
> >>> Shameer
> >>>
> >>>> Best Regards
> >>>>
> >>>> Eric
> >>>>
> >>>> This series can be found at:
> >>>> https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
> >>>>
> >>>> It series includes Tina's patch steming from
> >>>> [1] "[RFC PATCH v2 1/3] vfio: Use capability chains to handle device
> >>>> specific irq" plus patches originally contributed by Yi.
> >>>>
> >>>> History:
> >>>>
> >>>> v8 -> v9:
> >>>> - introduce specific irq framework
> >>>> - single fault region
> >>>> - iommu_unregister_device_fault_handler failure case not handled
> >>>>   yet.
> >>>>
> >>>> v7 -> v8:
> >>>> - rebase on top of v5.2-rc1 and especially
> >>>>   8be39a1a04c1  iommu/arm-smmu-v3: Add a master->domain pointer
> >>>> - dynamic alloc of s1_cfg/s2_cfg
> >>>> - __arm_smmu_tlb_inv_asid/s1_range_nosync
> >>>> - check there is no HW MSI regions
> >>>> - asid invalidation using pasid extended struct (change in the uapi)
> >>>> - add s1_live/s2_live checks
> >>>> - move check about support of nested stages in domain finalise
> >>>> - fixes in error reporting according to the discussion with Robin
> >>>> - reordered the patches to have first iommu/smmuv3 patches and then
> >>>>   VFIO patches
> >>>>
> >>>> v6 -> v7:
> >>>> - removed device handle from bind/unbind_guest_msi
> >>>> - added "iommu/smmuv3: Nested mode single MSI doorbell per domain
> >>>>   enforcement"
> >>>> - added few uapi comments as suggested by Jean, Jacop and Alex
> >>>>
> >>>> v5 -> v6:
> >>>> - Fix compilation issue when CONFIG_IOMMU_API is unset
> >>>>
> >>>> v4 -> v5:
> >>>> - fix bug reported by Vincent: fault handler unregistration now happens in
> >>>>   vfio_pci_release
> >>>> - IOMMU_FAULT_PERM_* moved outside of struct definition + small
> >>>>   uapi changes suggested by Kean-Philippe (except fetch_addr)
> >>>> - iommu: introduce device fault report API: removed the PRI part.
> >>>> - see individual logs for more details
> >>>> - reset the ste abort flag on detach
> >>>>
> >>>> v3 -> v4:
> >>>> - took into account Alex, jean-Philippe and Robin's comments on v3
> >>>> - rework of the smmuv3 driver integration
> >>>> - add tear down ops for msi binding and PASID table binding
> >>>> - fix S1 fault propagation
> >>>> - put fault reporting patches at the beginning of the series following
> >>>>   Jean-Philippe's request
> >>>> - update of the cache invalidate and fault API uapis
> >>>> - VFIO fault reporting rework with 2 separate regions and one mmappable
> >>>>   segment for the fault queue
> >>>> - moved to PATCH
> >>>>
> >>>> v2 -> v3:
> >>>> - When registering the S1 MSI binding we now store the device handle.
> This
> >>>>   addresses Robin's comment about discimination of devices beonging
> to
> >>>>   different S1 groups and using different physical MSI doorbells.
> >>>> - Change the fault reporting API: use VFIO_PCI_DMA_FAULT_IRQ_INDEX
> to
> >>>>   set the eventfd and expose the faults through an mmappable fault
> region
> >>>>
> >>>> v1 -> v2:
> >>>> - Added the fault reporting capability
> >>>> - asid properly passed on invalidation (fix assignment of multiple
> >>>>   devices)
> >>>> - see individual change logs for more info
> >>>>
> >>>>
> >>>> Eric Auger (8):
> >>>>   vfio: VFIO_IOMMU_SET_MSI_BINDING
> >>>>   vfio/pci: Add VFIO_REGION_TYPE_NESTED region type
> >>>>   vfio/pci: Register an iommu fault handler
> >>>>   vfio/pci: Allow to mmap the fault queue
> >>>>   vfio: Add new IRQ for DMA fault reporting
> >>>>   vfio/pci: Add framework for custom interrupt indices
> >>>>   vfio/pci: Register and allow DMA FAULT IRQ signaling
> >>>>   vfio: Document nested stage control
> >>>>
> >>>> Liu, Yi L (2):
> >>>>   vfio: VFIO_IOMMU_SET_PASID_TABLE
> >>>>   vfio: VFIO_IOMMU_CACHE_INVALIDATE
> >>>>
> >>>> Tina Zhang (1):
> >>>>   vfio: Use capability chains to handle device specific irq
> >>>>
> >>>>  Documentation/vfio.txt              |  77 ++++++++
> >>>>  drivers/vfio/pci/vfio_pci.c         | 283
> >> ++++++++++++++++++++++++++--
> >>>>  drivers/vfio/pci/vfio_pci_intrs.c   |  62 ++++++
> >>>>  drivers/vfio/pci/vfio_pci_private.h |  24 +++
> >>>>  drivers/vfio/pci/vfio_pci_rdwr.c    |  45 +++++
> >>>>  drivers/vfio/vfio_iommu_type1.c     | 166 ++++++++++++++++
> >>>>  include/uapi/linux/vfio.h           | 109 ++++++++++-
> >>>>  7 files changed, 747 insertions(+), 19 deletions(-)
> >>>>
> >>>> --
> >>>> 2.20.1
> >>>>
> >>>> _______________________________________________
> >>>> kvmarm mailing list
> >>>> kvmarm@lists.cs.columbia.edu
> >>>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> >


  reply index

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-11 13:56 Eric Auger
2019-07-11 13:56 ` [PATCH v9 01/11] vfio: VFIO_IOMMU_SET_PASID_TABLE Eric Auger
2019-07-11 13:56 ` [PATCH v9 02/11] vfio: VFIO_IOMMU_CACHE_INVALIDATE Eric Auger
2019-07-11 13:56 ` [PATCH v9 03/11] vfio: VFIO_IOMMU_SET_MSI_BINDING Eric Auger
2019-07-11 13:56 ` [PATCH v9 04/11] vfio/pci: Add VFIO_REGION_TYPE_NESTED region type Eric Auger
2019-07-11 13:56 ` [PATCH v9 05/11] vfio/pci: Register an iommu fault handler Eric Auger
2019-07-11 13:56 ` [PATCH v9 06/11] vfio/pci: Allow to mmap the fault queue Eric Auger
2019-07-11 13:56 ` [PATCH v9 07/11] vfio: Use capability chains to handle device specific irq Eric Auger
2019-07-11 13:56 ` [PATCH v9 08/11] vfio: Add new IRQ for DMA fault reporting Eric Auger
2019-07-11 13:56 ` [PATCH v9 09/11] vfio/pci: Add framework for custom interrupt indices Eric Auger
2019-07-11 13:56 ` [PATCH v9 10/11] vfio/pci: Register and allow DMA FAULT IRQ signaling Eric Auger
2019-07-11 13:56 ` [PATCH v9 11/11] vfio: Document nested stage control Eric Auger
     [not found] ` <f5b4b97b197d4bab8f3703eba2e966c4@huawei.com>
2019-11-12 11:28   ` [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part) Auger Eric
2019-11-12 13:06     ` Shameerali Kolothum Thodi
2019-11-12 13:21       ` Auger Eric
2019-11-12 14:21         ` Shameerali Kolothum Thodi [this message]
2019-11-12 17:56         ` Shameerali Kolothum Thodi
2019-11-12 20:34           ` Auger Eric
2019-11-13 16:24             ` Shameerali Kolothum Thodi
2019-11-20  8:15 ` Tomasz Nowicki
2019-11-20 10:18   ` Auger Eric
2020-03-03 12:57     ` zhangfei
2020-03-03 13:14       ` Auger Eric

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=19af0dda619f4947b20f2087eaa43251@huawei.com \
    --to=shameerali.kolothum.thodi@huawei.com \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=eric.auger.pro@gmail.com \
    --cc=eric.auger@redhat.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jacob.jun.pan@linux.intel.com \
    --cc=jean-philippe.brucker@arm.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=marc.zyngier@arm.com \
    --cc=robin.murphy@arm.com \
    --cc=tina.zhang@intel.com \
    --cc=vincent.stehle@arm.com \
    --cc=will.deacon@arm.com \
    --cc=xuwei5@huawei.com \
    --cc=yi.l.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git