From: Peter Xu <peterx@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: tianyu.lan@intel.com, kevin.tian@intel.com, mst@redhat.com,
	jan.kiszka@siemens.com, bd.aviv@gmail.com, qemu-devel@nongnu.org,
	alex.williamson@redhat.com
Subject: Re: [Qemu-devel] [PATCH RFC v4 18/20] intel_iommu: enable vfio devices
Date: Tue, 24 Jan 2017 12:42:48 +0800
Message-ID: <20170124044248.GL26526@pxdev.xzpeter.org>
In-Reply-To: <c3ec8e80-2d47-82c4-b38b-3cc487ea37c0@redhat.com>

On Mon, Jan 23, 2017 at 06:23:44PM +0800, Jason Wang wrote:
> 
> 
> On 2017-01-23 11:34, Peter Xu wrote:
> >On Mon, Jan 23, 2017 at 09:55:39AM +0800, Jason Wang wrote:
> >>
> >>On 2017-01-22 17:04, Peter Xu wrote:
> >>>On Sun, Jan 22, 2017 at 04:08:04PM +0800, Jason Wang wrote:
> >>>
> >>>[...]
> >>>
> >>>>>+static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> >>>>>+                                           uint16_t domain_id, hwaddr addr,
> >>>>>+                                           uint8_t am)
> >>>>>+{
> >>>>>+    IntelIOMMUNotifierNode *node;
> >>>>>+    VTDContextEntry ce;
> >>>>>+    int ret;
> >>>>>+
> >>>>>+    QLIST_FOREACH(node, &(s->notifiers_list), next) {
> >>>>>+        VTDAddressSpace *vtd_as = node->vtd_as;
> >>>>>+        ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> >>>>>+                                       vtd_as->devfn, &ce);
> >>>>>+        if (!ret && domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
> >>>>>+            vtd_page_walk(&ce, addr, addr + (1 << am) * VTD_PAGE_SIZE,
> >>>>>+                          vtd_page_invalidate_notify_hook,
> >>>>>+                          (void *)&vtd_as->iommu, true);
> >>>>Why not simply trigger the notifier here? (or is this vfio required?)
> >>>Because we may only want to notify part of the region - we are with
> >>>mask here, but not exact size.
> >>>
> >>>Consider this: guest (with caching mode) maps 12K memory (4K*3 pages),
> >>>the mask will be extended to 16K in the guest. In that case, we need
> >>>to explicitly go over the page entry to know that the 4th page should
> >>>not be notified.
> >>I see. Then it is required by vfio only. I think we can add a fast path for
> >>!CM in this case by triggering the notifier directly.
> >I noted this down (to be further investigated in my todo), but I don't
> >know whether this can work, since I think it is still legal for the
> >guest to merge more than one PSI into one. For example, I don't know
> >whether the below is legal:
> >
> >- guest invalidate page (0, 4k)
> >- guest map new page (4k, 8k)
> >- guest send single PSI of (0, 8k)
> >
> >In that case, it contains both map/unmap, and it looks like it doesn't
> >disobey the spec either?
> 
> Not sure I get your meaning. You mean just sending a single PSI instead of two?

Yes, and it looks like that still wouldn't violate the spec?

Actually, for now I think the best way forward with this series is to
first get it merged, so that advanced users can start to use it and
play with it. Then we can collect more feedback and solve the critical
issues that matter to customers and users.

For the above, I think the per-page walk is the safest approach for now.
I can investigate in the future (as I mentioned) whether we can make
it faster, following your suggestion. However, it would be nicer to do
that after we have some real use cases for this series, so that we can
make sure the enhancement won't break anything while boosting the
performance.
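
To illustrate why the per-page walk is the conservative choice, here is
a minimal toy sketch. It is not the vtd_page_walk() code from the
patch: the flat "present" array stands in for the real page tables, and
every toy_* name is made up for illustration. The walk visits each page
covered by the mask and reports whether that page is currently mapped,
so a range that mixes mapped and unmapped pages is handled per page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* Callback type: called once per page in the invalidated range. */
typedef void (*toy_notify_fn)(uint64_t iova, bool mapped, void *opaque);

/* Ready-made callback: counts the pages reported as not mapped. */
static void toy_count_unmapped(uint64_t iova, bool mapped, void *opaque)
{
    (void)iova;
    if (!mapped) {
        (*(size_t *)opaque)++;
    }
}

/*
 * Walk the (1 << am) pages starting at addr, clamped to the toy table,
 * and report each page's state. Returns the number of mapped pages.
 */
static size_t toy_psi_walk(const bool *present, size_t nr_pages,
                           uint64_t addr, unsigned int am,
                           toy_notify_fn fn, void *opaque)
{
    uint64_t first = addr / TOY_PAGE_SIZE;
    uint64_t end = nr_pages;
    size_t mapped = 0;

    if (first >= end) {
        return 0;
    }
    /* Shrink the range to the mask size; am >= 64 would overflow. */
    if (am < 64 && ((uint64_t)1 << am) < end - first) {
        end = first + ((uint64_t)1 << am);
    }
    for (uint64_t i = first; i < end; i++) {
        if (present[i]) {
            mapped++;
        }
        if (fn) {
            fn(i * TOY_PAGE_SIZE, present[i], opaque);
        }
    }
    return mapped;
}
```

With present = {1, 1, 1, 0} and am = 2 (the 12K-in-16K case), the walk
reports three mapped pages and one unmapped page. With present = {0, 1}
and am = 1 (the merged PSI case above), it reports one of each, which a
single direct notification for the whole range could not express.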

But of course I would like to listen to the maintainer's opinion on
this...

> 
> >
> >>Another possible issue: consider (with CM) a 16K contiguous iova range whose
> >>last page has already been mapped. In this case, if we want to map the first
> >>three pages, then when handling the IOTLB invalidation the mask would cover
> >>16K, so the last page will be mapped twice. Can this lead to some issue?
> >I don't know whether guest has special handling of this kind of
> >request.
> 
> This seems quite usual I think? E.g. iommu_flush_iotlb_psi() does:
> 
> static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
>                   struct dmar_domain *domain,
>                   unsigned long pfn, unsigned int pages,
>                   int ih, int map)
> {
>     unsigned int mask = ilog2(__roundup_pow_of_two(pages));
>     uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT;
>     u16 did = domain->iommu_did[iommu->seq_id];
> ...

Yes, rounding up should be the only thing we can do when the size is
unaligned.
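
To make the rounding concrete, here is a minimal sketch of the
arithmetic. It is not the kernel code: psi_mask_for_pages() and
psi_span_bytes() are hypothetical helpers that only mirror what
ilog2(__roundup_pow_of_two(pages)) computes in the snippet quoted
above, and a 4K page size is assumed.

```c
#include <stdint.h>

#define TOY_VTD_PAGE_SHIFT 12   /* assumes 4K pages */

/* Smallest mask 'am' such that (1 << am) pages cover 'pages' pages. */
static unsigned int psi_mask_for_pages(uint64_t pages)
{
    unsigned int am = 0;

    /* Stop at 63 so the shift below never overflows. */
    while (am < 63 && ((uint64_t)1 << am) < pages) {
        am++;
    }
    return am;
}

/* Number of bytes actually covered by an invalidation with mask 'am'. */
static uint64_t psi_span_bytes(unsigned int am)
{
    return ((uint64_t)1 << am) << TOY_VTD_PAGE_SHIFT;
}
```

For a 3-page (12K) mapping this gives am = 2, i.e. a 16K span, so the
invalidation covers one page more than the guest actually mapped.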

> 
> 
> >
> >Besides, imho to completely solve this problem, we still need that
> >per-domain tree. Considering that currently the tree is inside vfio, I
> >don't see this as a big issue either.
> 
> Another issue I found is: with this series, VFIO_IOMMU_MAP_DMA seems to become
> guest-triggerable. And since VFIO allocates its own structure to record each DMA
> mapping, this seems to open a window for an evil guest to exhaust host memory,
> which is even worse.

(I see Alex replied in another email, so will skip this one)

> 
> >  In that case, the last page mapping
> >request will fail (we might see one error line from QEMU stderr),
> >however that'll not affect too much since currently vfio allows that
> >failure to happen (ioctl fail, but that page is still mapped, which is
> >what we wanted).
> 
> Works but sub-optimal or maybe even buggy.

Again, to finally solve this, I think we need a tree. But I don't
think that's a good idea for this series, considering that we already
have one in the kernel. I don't see this issue as a critical blocker
(if you don't disagree), since it should work for our goals, which are
either nested device assignment or dpdk applications in general.

I think user feedback is really important for this series. So again,
I'd ask that we postpone some issues as todos, rather than solving
all of them in this series before the merge.

> 
> >
> >(But of course above error message can be used by an in-guest attacker
> >  as well just like general error_report() issues reported before,
> >  though again I will appreciate if we can have this series
> >  functionally work first :)
> >
> >And, I should be able to emulate this behavior in guest with a tiny C
> >program to make sure of it, possibly after this series if allowed.
> 
> Or through your vtd unittest :) ?

Yes, or even easier: just write a program in a guest running Linux
that sends the corresponding VFIO_IOMMU_MAP_DMA ioctl()s.
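
A rough sketch of the core of such a program, assuming the type1 VFIO
backend and a container fd that has already been opened and set up
(group attached, VFIO_SET_IOMMU done), which is not shown here.
toy_vfio_map() is a made-up helper name; the struct and ioctl come from
<linux/vfio.h>.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Ask VFIO to map [iova, iova + size) to the buffer at vaddr.
 * Returns 0 on success or a negative errno value on failure.
 */
static int toy_vfio_map(int container_fd, void *vaddr,
                        uint64_t iova, uint64_t size)
{
    struct vfio_iommu_type1_dma_map map;

    memset(&map, 0, sizeof(map));
    map.argsz = sizeof(map);
    map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    map.vaddr = (uintptr_t)vaddr;
    map.iova = iova;
    map.size = size;

    if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map) < 0) {
        return -errno;
    }
    return 0;
}
```

The test would then map the last page of a 16K range, map the first
three pages, and watch what the host side does with the rounded-up
invalidation. I'd expect the overlapping request to fail on the host
(I believe with EEXIST), but that is exactly what the test should
confirm.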

Thanks,

-- peterx
