All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: qemu-devel@nongnu.org, tianyu.lan@intel.com,
	kevin.tian@intel.com, mst@redhat.com, jan.kiszka@siemens.com,
	alex.williamson@redhat.com, bd.aviv@gmail.com
Subject: Re: [Qemu-devel] [PATCH RFC v3 14/14] intel_iommu: enable vfio devices
Date: Wed, 18 Jan 2017 16:11:37 +0800	[thread overview]
Message-ID: <20170118081137.GS30108@pxdev.xzpeter.org> (raw)
In-Reply-To: <99e185dc-e3c9-678f-7a69-fc006658d07f@redhat.com>

On Wed, Jan 18, 2017 at 11:10:53AM +0800, Jason Wang wrote:
> 
> 
> On 2017年01月17日 22:45, Peter Xu wrote:
> >On Mon, Jan 16, 2017 at 05:54:55PM +0800, Jason Wang wrote:
> >>
> >>On 2017年01月16日 17:18, Peter Xu wrote:
> >>>>>  static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
> >>>>>                                        hwaddr addr, uint8_t am)
> >>>>>  {
> >>>>>@@ -1222,6 +1251,7 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
> >>>>>      info.addr = addr;
> >>>>>      info.mask = ~((1 << am) - 1);
> >>>>>      g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
> >>>>>+    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
> >>>>Is the case of GLOBAL or DSI flush missed, or we don't care it at all?
> >>>IMHO we don't. For device assignment, since we are having CM=1 here,
> >>>we should have explicit page invalidations even if guest sends
> >>>global/domain invalidations.
> >>>
> >>>Thanks,
> >>>
> >>>-- peterx
> >>Is this spec required?
> >I think not. IMO the spec is very coarse grained on describing cache
> >mode...
> >
> >>Btw, it looks to me that both DSI and GLOBAL are
> >>indeed explicit flush.
> >Actually when cache mode is on, it is unclear to me on how we should
> >treat domain/global invalidations, at least from the spec (as
> >mentioned earlier). My understanding is that they are not "explicit
> >flushes", which IMHO should only mean page selective IOTLB
> >invalidations.
> 
> Probably not, at least from the view of performance. DSI and global should
> be more efficient in some cases.

I agree with you that DSI/GLOBAL flushes are more efficient in some
way. But IMHO that does not mean these invalidations are "explicit
invalidations", and I suspect whether cache mode has to coop with it.

But here I should add one more thing besides PSI - context entry
invalidation should be one of "the explicit invalidations" as well,
which we need to handle just like PSI when cache mode is on.

> 
> >
> >>Just have a quick go through on driver codes and find this something
> >>interesting in intel_iommu_flush_iotlb_psi():
> >>
> >>...
> >>     /*
> >>      * Fallback to domain selective flush if no PSI support or the size is
> >>      * too big.
> >>      * PSI requires page size to be 2 ^ x, and the base address is naturally
> >>      * aligned to the size
> >>      */
> >>     if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap))
> >>         iommu->flush.flush_iotlb(iommu, did, 0, 0,
> >>                         DMA_TLB_DSI_FLUSH);
> >>     else
> >>         iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
> >>                         DMA_TLB_PSI_FLUSH);
> >>...
> >I think this is interesting... and I doubt its correctness while with
> >cache mode enabled.
> >
> >If so (sending domain invalidation instead of a big range of page
> >invalidations), how should we capture which pages are unmapped in
> >emulated IOMMU?
> 
> We don't need to track individual pages here, since all pages for a specific
> domain were unmapped I believe?

IMHO this might not be the correct behavior.

If we receive one domain specific invalidation, I agree that we should
invalidate the IOTLB cache for all the devices inside the domain.
However, when cache mode is on, we should be depending on the PSIs to
unmap each page (unless we want to unmap the whole address space, in
this case it's very possible that the guest is just unmapping a range,
not the entire space). If we convert several PSIs into one big DSI,
IMHO we will leave those pages mapped/unmapped while we should
unmap/map them.

> 
> >
> >>It looks like DSI_FLUSH is possible even for CM on.
> >>
> >>And in flush_unmaps():
> >>
> >>...
> >>         /* In caching mode, global flushes turn emulation expensive */
> >>         if (!cap_caching_mode(iommu->cap))
> >>             iommu->flush.flush_iotlb(iommu, 0, 0, 0,
> >>                      DMA_TLB_GLOBAL_FLUSH);
> >>...
> >>
> >>If I understand the comments correctly, GLOBAL is ok for CM too (though the
> >>code did not do it for performance reason).
> >I think it should be okay to send global flush for CM, but not sure
> >whether we should notify anything when we receive it. Hmm, anyway, I
> >think I need some more reading to make sure I understand the whole
> >thing correctly. :)
> >
> >For example, when I see this commit:
> >
> >commit 78d5f0f500e6ba8f6cfd0673475ff4d941d705a2
> >Author: Nadav Amit <nadav.amit@gmail.com>
> >Date:   Thu Apr 8 23:00:41 2010 +0300
> >
> >     intel-iommu: Avoid global flushes with caching mode.
> >     While it may be efficient on real hardware, emulation of global
> >     invalidations is very expensive as all shadow entries must be examined.
> >     This patch changes the behaviour when caching mode is enabled (which is
> >     the case when IOMMU emulation takes place). In this case, page specific
> >     invalidation is used instead.
> >
> >Before I ask someone else besides qemu-devel, I am curious about
> >whether there is existing VT-d emulation code (outside QEMU, of
> >course) so that I can have a reference?
> 
> Yes, it has. The author of this patch - Nadav has done lots of research on
> emulated IOMMU. See following papers:
> 
> https://hal.inria.fr/inria-00493752/document
> http://www.cse.iitd.ac.in/~sbansal/csl862-virt/readings/vIOMMU.pdf

Thanks for these good materials. I will google the author for sure
next time. :)

-- peterx

  reply	other threads:[~2017-01-18  8:11 UTC|newest]

Thread overview: 93+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-13  3:06 [Qemu-devel] [PATCH RFC v3 00/14] VT-d: vfio enablement and misc enhances Peter Xu
2017-01-13  3:06 ` [Qemu-devel] [PATCH RFC v3 01/14] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest Peter Xu
2017-01-20  8:32   ` Tian, Kevin
2017-01-20  8:54     ` Peter Xu
2017-01-20  8:59       ` Tian, Kevin
2017-01-20  9:11         ` Peter Xu
2017-01-20  9:20           ` Tian, Kevin
2017-01-20  9:30             ` Peter Xu
2017-01-20 15:42   ` Eric Blake
2017-01-22  2:32     ` Peter Xu
2017-01-13  3:06 ` [Qemu-devel] [PATCH RFC v3 02/14] intel_iommu: simplify irq region translation Peter Xu
2017-01-20  8:22   ` Tian, Kevin
2017-01-20  9:05     ` Peter Xu
2017-01-20  9:15       ` Tian, Kevin
2017-01-20  9:27         ` Peter Xu
2017-01-20  9:52           ` Tian, Kevin
2017-01-20 10:04             ` Peter Xu
2017-01-22  4:42               ` Tian, Kevin
2017-01-22  4:50                 ` Peter Xu
2017-01-13  3:06 ` [Qemu-devel] [PATCH RFC v3 03/14] intel_iommu: renaming gpa to iova where proper Peter Xu
2017-01-20  8:27   ` Tian, Kevin
2017-01-20  9:23     ` Peter Xu
2017-01-20  9:41       ` Tian, Kevin
2017-01-13  3:06 ` [Qemu-devel] [PATCH RFC v3 04/14] intel_iommu: fix trace for inv desc handling Peter Xu
2017-01-13  7:46   ` Jason Wang
2017-01-13  9:13     ` Peter Xu
2017-01-13  9:33       ` Jason Wang
2017-01-13  3:06 ` [Qemu-devel] [PATCH RFC v3 05/14] intel_iommu: fix trace for addr translation Peter Xu
2017-01-13  3:06 ` [Qemu-devel] [PATCH RFC v3 06/14] intel_iommu: vtd_slpt_level_shift check level Peter Xu
2017-01-13  3:06 ` [Qemu-devel] [PATCH RFC v3 07/14] memory: add section range info for IOMMU notifier Peter Xu
2017-01-13  7:55   ` Jason Wang
2017-01-13  9:23     ` Peter Xu
2017-01-13  9:37       ` Jason Wang
2017-01-13 10:22         ` Peter Xu
2017-01-13  3:06 ` [Qemu-devel] [PATCH RFC v3 08/14] memory: provide iommu_replay_all() Peter Xu
2017-01-13  3:06 ` [Qemu-devel] [PATCH RFC v3 09/14] memory: introduce memory_region_notify_one() Peter Xu
2017-01-13  7:58   ` Jason Wang
2017-01-16  7:08     ` Peter Xu
2017-01-16  7:38       ` Jason Wang
2017-01-13  3:06 ` [Qemu-devel] [PATCH RFC v3 10/14] memory: add MemoryRegionIOMMUOps.replay() callback Peter Xu
2017-01-13  3:06 ` [Qemu-devel] [PATCH RFC v3 11/14] intel_iommu: provide its own replay() callback Peter Xu
2017-01-13  9:26   ` Jason Wang
2017-01-16  7:31     ` Peter Xu
2017-01-16  7:47       ` Jason Wang
2017-01-16  7:59         ` Peter Xu
2017-01-16  8:03           ` Jason Wang
2017-01-16  8:06             ` Peter Xu
2017-01-16  8:23               ` Jason Wang
2017-01-13  3:06 ` [Qemu-devel] [PATCH RFC v3 12/14] intel_iommu: do replay when context invalidate Peter Xu
2017-01-16  5:53   ` Jason Wang
2017-01-16  7:43     ` Peter Xu
2017-01-16  7:52       ` Jason Wang
2017-01-16  8:02         ` Peter Xu
2017-01-16  8:18         ` Peter Xu
2017-01-16  8:28           ` Jason Wang
2017-01-13  3:06 ` [Qemu-devel] [PATCH RFC v3 13/14] intel_iommu: allow dynamic switch of IOMMU region Peter Xu
2017-01-16  6:20   ` Jason Wang
2017-01-16  7:50     ` Peter Xu
2017-01-16  8:01       ` Jason Wang
2017-01-16  8:12         ` Peter Xu
2017-01-16  8:25           ` Jason Wang
2017-01-16  8:32             ` Peter Xu
2017-01-16 16:25               ` Michael S. Tsirkin
2017-01-17 14:53                 ` Peter Xu
2017-01-16 19:53   ` Alex Williamson
2017-01-17 14:00     ` Peter Xu
2017-01-17 15:46       ` Alex Williamson
2017-01-18  7:49         ` Peter Xu
2017-01-19  8:20           ` Peter Xu
2017-01-13  3:06 ` [Qemu-devel] [PATCH RFC v3 14/14] intel_iommu: enable vfio devices Peter Xu
2017-01-16  6:30   ` Jason Wang
2017-01-16  9:18     ` Peter Xu
2017-01-16  9:54       ` Jason Wang
2017-01-17 14:45         ` Peter Xu
2017-01-18  3:10           ` Jason Wang
2017-01-18  8:11             ` Peter Xu [this message]
2017-01-18  8:36               ` Jason Wang
2017-01-18  8:46                 ` Peter Xu
2017-01-18  9:38                   ` Tian, Kevin
2017-01-18 10:06                     ` Jason Wang
2017-01-19  3:32                       ` Peter Xu
2017-01-19  3:36                         ` Jason Wang
2017-01-19  3:16                     ` Peter Xu
2017-01-19  6:22                       ` Tian, Kevin
2017-01-19  9:38                         ` Peter Xu
2017-01-19  6:44                     ` Liu, Yi L
2017-01-19  7:02                       ` Jason Wang
2017-01-19  7:02                       ` Peter Xu
2017-01-16  9:20     ` Peter Xu
2017-01-13 15:58 ` [Qemu-devel] [PATCH RFC v3 00/14] VT-d: vfio enablement and misc enhances Michael S. Tsirkin
2017-01-14  2:59   ` Peter Xu
2017-01-17 15:07     ` Michael S. Tsirkin
2017-01-18  7:34       ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170118081137.GS30108@pxdev.xzpeter.org \
    --to=peterx@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=bd.aviv@gmail.com \
    --cc=jan.kiszka@siemens.com \
    --cc=jasowang@redhat.com \
    --cc=kevin.tian@intel.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=tianyu.lan@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.