All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: qemu-devel@nongnu.org
Cc: tianyu.lan@intel.com, kevin.tian@intel.com, mst@redhat.com,
	Marcel Apfelbaum <marcel@redhat.com>,
	jan.kiszka@siemens.com, jasowang@redhat.com, peterx@redhat.com,
	David Gibson <david@gibson.dropbear.id.au>,
	alex.williamson@redhat.com, bd.aviv@gmail.com
Subject: [Qemu-devel] [PATCH v8 0/9] VT-d: vfio enablement and misc enhances
Date: Thu,  6 Apr 2017 15:08:35 +0800	[thread overview]
Message-ID: <1491462524-1617-1-git-send-email-peterx@redhat.com> (raw)

This is v8 of vt-d vfio enablement series.

v8
- remove patches 1-9 since merged already
- add David's r-b for all the patches
- add Aviv's s-o-b in the last patch
- rename iommu to iommu_dmar [Jason]
- rename last patch subject to "remote IOTLB" [Jason]
- pick up jason's two patches to fix vhost breakage
- let vhost leverage the new IOMMU notifier interface

v7:
- for the two traces patches: Change subjects. Remove vtd_err() and
  vtd_err_nonzero_rsvd() tracers, instead using standalone trace for
  each of the places. Don't remove any DPRINTF() if there is no
  replacement. [Jason]
- add r-b and a-b for Alex/David/Jason.
- in patch "intel_iommu: renaming gpa to iova where proper", convert
  one more place where I missed [Jason]
- fix the place where I should use "~0ULL" not "~0" [Jason]
- squash patch 16 into 18 [Jason]

v6:
- do unmap in all cases when replay [Jason]
- do global replay even if context entry is invalidated [Jason]
- when iommu reset, send unmap to all registered notifiers [Jason]
- use rcu read lock to protect the whole vfio_iommu_map_notify()
  [Alex, Paolo]

v5:
- fix patch 4 subject too long, and error spelling [Eric]
- add ack-by for alex in patch 1 [Alex]
- squashing patch 19/20 into patch 18 [Jason]
- fix comments in vtd_page_walk() [Jason]
- remove all error_report() [Jason]
- add comment for patch 18, mention about that enabled vhost without
  ATS as well [Jason]
- remove skipped debug thing during page walk [Jason]
- remove duplicated page walk trace [Jason]
- some tunings in vtd_address_space_unmap(), to provide correct iova
  and addr_mask. For this, I tuned this patch as well a bit:
  "memory: add section range info for IOMMU notifier"
  to loosen the range check

v4:
- convert all error_report()s into traces (in the two patches that did
  that)
- rebased to Jason's DMAR series (master + one more patch:
  "[PATCH V4 net-next] vhost_net: device IOTLB support")
- let vhost use the new api iommu_notifier_init() so it won't break
  vhost dmar [Jason]
- touch commit message of the patch:
  "intel_iommu: provide its own replay() callback"
  old replay is not a dead loop, but it will just consume lots of time
  [Jason]
- add comment for patch:
  "intel_iommu: do replay when context invalidate"
  telling why replay won't be a problem even without CM=1 [Jason]
- remove a useless comment line [Jason]
- remove dmar_enabled parameter for vtd_switch_address_space() and
  vtd_switch_address_space_all() [Mst, Jason]
- merged the vfio patches in, to support unmap of big ranges at the
  beginning ("[PATCH RFC 0/3] vfio: allow to notify unmap for very big
  region")
- using caching_mode instead of cache_mode_enabled, and "caching-mode"
  instead of "cache-mode" [Kevin]
- when receive context entry invalidation, we unmap the entire region
  first, then replay [Alex]
- fix commit message for patch:
  "intel_iommu: simplify irq region translation" [Kevin]
- handle domain/global invalidation, and notify where proper [Jason,
  Kevin]

v3:
- fix style error reported by patchew
- fix comment in domain switch patch: use "IOMMU address space" rather
  than "IOMMU region" [Kevin]
- add ack-by for Paolo in patch:
  "memory: add section range info for IOMMU notifier"
  (this is seperately collected besides this thread)
- remove 3 patches which are merged already (from Jason)
- rebase to master b6c0897

v2:
- change comment for "end" parameter in vtd_page_walk() [Tianyu]
- change comment for "a iova" to "an iova" [Yi]
- fix fault printed val for GPA address in vtd_page_walk_level (debug
  only)
- rebased to master (rather than Aviv's v6 series) and merged Aviv's
  series v6: picked patch 1 (as patch 1 in this series), dropped patch
  2, re-wrote patch 3 (as patch 17 of this series).
- picked up two more bugfix patches from Jason's DMAR series
- picked up the following patch as well:
  "[PATCH v3] intel_iommu: allow dynamic switch of IOMMU region"

This RFC series is a re-work for Aviv B.D.'s vfio enablement series
with vt-d:

  https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg01452.html

Aviv has done a great job there, and what we still lack there are
mostly the following:

(1) VFIO got duplicated IOTLB notifications due to splitted VT-d IOMMU
    memory region.

(2) VT-d still haven't provide a correct replay() mechanism (e.g.,
    when IOMMU domain switches, things will broke).

This series should have solved the above two issues.

Online repo:

  https://github.com/xzpeter/qemu/tree/vtd-vfio-enablement-v8

I would be glad to hear about any review comments for above patches.

=========
Test Done
=========

Build test passed for x86_64/arm/ppc64.

Simply tested with x86_64, assigning two PCI devices to a single VM,
boot the VM using:

bin=x86_64-softmmu/qemu-system-x86_64
$bin -M q35,accel=kvm,kernel-irqchip=split -m 1G \
     -device intel-iommu,intremap=on,eim=off,caching-mode=on \
     -netdev user,id=net0,hostfwd=tcp::5555-:22 \
     -device virtio-net-pci,netdev=net0 \
     -device vfio-pci,host=03:00.0 \
     -device vfio-pci,host=02:00.0 \
     -trace events=".trace.vfio" \
     /var/lib/libvirt/images/vm1.qcow2

pxdev:bin [vtd-vfio-enablement]# cat .trace.vfio
vtd_page_walk*
vtd_replay*
vtd_inv_desc*

Then, in the guest, run the following tool:

  https://github.com/xzpeter/clibs/blob/master/gpl/userspace/vfio-bind-group/vfio-bind-group.c

With parameter:

  ./vfio-bind-group 00:03.0 00:04.0

Check host side trace log, I can see pages are replayed and mapped in
00:04.0 device address space, like:

...
vtd_replay_ce_valid replay valid context device 00:04.00 hi 0x401 lo 0x38fe1001
vtd_page_walk Page walk for ce (0x401, 0x38fe1001) iova range 0x0 - 0x8000000000
vtd_page_walk_level Page walk (base=0x38fe1000, level=3) iova range 0x0 - 0x8000000000
vtd_page_walk_level Page walk (base=0x35d31000, level=2) iova range 0x0 - 0x40000000
vtd_page_walk_level Page walk (base=0x34979000, level=1) iova range 0x0 - 0x200000
vtd_page_walk_one Page walk detected map level 0x1 iova 0x0 -> gpa 0x22dc3000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x1000 -> gpa 0x22e25000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x2000 -> gpa 0x22e12000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x3000 -> gpa 0x22e2d000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x4000 -> gpa 0x12a49000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x5000 -> gpa 0x129bb000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x6000 -> gpa 0x128db000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x7000 -> gpa 0x12a80000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x8000 -> gpa 0x12a7e000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x9000 -> gpa 0x12b22000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0xa000 -> gpa 0x12b41000 mask 0xfff perm 3
...

=========
Todo List
=========

- error reporting for the assigned devices (as Tianyu has mentioned)

- per-domain address-space: A better solution in the future may be -
  we maintain one address space per IOMMU domain in the guest (so
  multiple devices can share a same address space if they are sharing
  the same IOMMU domains in the guest), rather than one address space
  per device (which is current implementation of vt-d). However that's
  a step further than this series, and let's see whether we can first
  provide a workable version of device assignment with vt-d
  protection.

- don't need to notify IOTLB (psi/gsi/global) invalidations to devices
  that with ATS enabled

- investigate when guest map page while mask contains existing mapped
  pages (e.g. map 12k-16k first, then map 0-12k)

- coalesce unmap during page walk (currently, we send it once per
  page)

- when do PSI for unmap, whether we can send one notify directly
  instead of walking over the page table?

- more to come...

Thanks,

Jason Wang (1):
  intel_iommu: use the correct memory region for device IOTLB
    notification

Peter Xu (8):
  memory: add section range info for IOMMU notifier
  memory: provide IOMMU_NOTIFIER_FOREACH macro
  memory: provide iommu_replay_all()
  memory: introduce memory_region_notify_one()
  memory: add MemoryRegionIOMMUOps.replay() callback
  intel_iommu: provide its own replay() callback
  intel_iommu: allow dynamic switch of IOMMU region
  intel_iommu: enable remote IOTLB

 hw/i386/intel_iommu.c          | 442 +++++++++++++++++++++++++++++++++++++++--
 hw/i386/intel_iommu_internal.h |   1 +
 hw/i386/trace-events           |  10 +-
 hw/vfio/common.c               |  12 +-
 hw/virtio/vhost.c              |  10 +-
 include/exec/memory.h          |  49 ++++-
 include/hw/i386/intel_iommu.h  |  10 +
 memory.c                       |  52 ++++-
 8 files changed, 552 insertions(+), 34 deletions(-)

-- 
2.7.4

             reply	other threads:[~2017-04-06  7:09 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-06  7:08 Peter Xu [this message]
2017-04-06  7:08 ` [Qemu-devel] [PATCH v8 1/9] memory: add section range info for IOMMU notifier Peter Xu
2017-04-06 10:42   ` Auger Eric
2017-04-06 10:52     ` Peter Xu
2017-04-06 11:54   ` Michael S. Tsirkin
2017-04-06 15:10   ` Alex Williamson
2017-04-06  7:08 ` [Qemu-devel] [PATCH v8 2/9] memory: provide IOMMU_NOTIFIER_FOREACH macro Peter Xu
2017-04-06 10:45   ` Auger Eric
2017-04-06 11:12     ` Peter Xu
2017-04-06 11:30       ` Auger Eric
2017-04-06 11:54   ` Michael S. Tsirkin
2017-04-06  7:08 ` [Qemu-devel] [PATCH v8 3/9] memory: provide iommu_replay_all() Peter Xu
2017-04-06 10:52   ` Auger Eric
2017-04-06 11:46     ` Peter Xu
2017-04-07  4:17       ` Peter Xu
2017-04-06 11:55   ` Michael S. Tsirkin
2017-04-06  7:08 ` [Qemu-devel] [PATCH v8 4/9] memory: introduce memory_region_notify_one() Peter Xu
2017-04-06 10:54   ` Auger Eric
2017-04-06 11:55   ` Michael S. Tsirkin
2017-04-06  7:08 ` [Qemu-devel] [PATCH v8 5/9] memory: add MemoryRegionIOMMUOps.replay() callback Peter Xu
2017-04-06 10:58   ` Auger Eric
2017-04-06 11:57   ` Michael S. Tsirkin
2017-04-06  7:08 ` [Qemu-devel] [PATCH v8 6/9] intel_iommu: use the correct memory region for device IOTLB notification Peter Xu
2017-04-06 11:57   ` Michael S. Tsirkin
2017-04-06  7:08 ` [Qemu-devel] [PATCH v8 7/9] intel_iommu: provide its own replay() callback Peter Xu
2017-04-06 11:57   ` Michael S. Tsirkin
2017-04-06  7:08 ` [Qemu-devel] [PATCH v8 8/9] intel_iommu: allow dynamic switch of IOMMU region Peter Xu
2017-04-06 11:58   ` Michael S. Tsirkin
2017-04-06  7:08 ` [Qemu-devel] [PATCH v8 9/9] intel_iommu: enable remote IOTLB Peter Xu
2017-04-06 11:58   ` Michael S. Tsirkin
2017-04-06 11:53 ` [Qemu-devel] [PATCH v8 0/9] VT-d: vfio enablement and misc enhances Michael S. Tsirkin
2017-04-06 15:25   ` Peter Xu
2017-04-06 12:00 ` Michael S. Tsirkin
2017-04-06 15:27   ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1491462524-1617-1-git-send-email-peterx@redhat.com \
    --to=peterx@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=bd.aviv@gmail.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=jan.kiszka@siemens.com \
    --cc=jasowang@redhat.com \
    --cc=kevin.tian@intel.com \
    --cc=marcel@redhat.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=tianyu.lan@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.