From: Jason Wang <jasowang@redhat.com>
To: "Liu, Yi L" <yi.l.liu@intel.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	Peter Xu <peterx@redhat.com>
Cc: "Lan, Tianyu" <tianyu.lan@intel.com>,
	"Raj, Ashok" <ashok.raj@intel.com>,
	"mst@redhat.com" <mst@redhat.com>,
	"jan.kiszka@siemens.com" <jan.kiszka@siemens.com>,
	"bd.aviv@gmail.com" <bd.aviv@gmail.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>
Subject: Re: [Qemu-devel] [PATCH RFC v3 14/14] intel_iommu: enable vfio devices
Date: Thu, 19 Jan 2017 15:02:37 +0800	[thread overview]
Message-ID: <4103ea92-e713-9114-5be9-7e698fc14631@redhat.com> (raw)
In-Reply-To: <A2975661238FB949B60364EF0F2C257438FF9300@shsmsx102.ccr.corp.intel.com>



On 01/19/2017 14:44, Liu, Yi L wrote:
>> -----Original Message-----
>> From: Qemu-devel [mailto:qemu-devel-bounces+yi.l.liu=intel.com@nongnu.org]
>> On Behalf Of Tian, Kevin
>> Sent: Wednesday, January 18, 2017 5:39 PM
>> To: Peter Xu <peterx@redhat.com>; Jason Wang <jasowang@redhat.com>
>> Cc: Lan, Tianyu <tianyu.lan@intel.com>; Raj, Ashok <ashok.raj@intel.com>;
>> mst@redhat.com; jan.kiszka@siemens.com; bd.aviv@gmail.com; qemu-
>> devel@nongnu.org; alex.williamson@redhat.com
>> Subject: Re: [Qemu-devel] [PATCH RFC v3 14/14] intel_iommu: enable vfio
>> devices
>>
>>> From: Peter Xu [mailto:peterx@redhat.com]
>>> Sent: Wednesday, January 18, 2017 4:46 PM
>>>
>>> On Wed, Jan 18, 2017 at 04:36:05PM +0800, Jason Wang wrote:
>>>>
>>>>> On 01/18/2017 16:11, Peter Xu wrote:
>>>>> On Wed, Jan 18, 2017 at 11:10:53AM +0800, Jason Wang wrote:
>>>>>> On 01/17/2017 22:45, Peter Xu wrote:
>>>>>>> On Mon, Jan 16, 2017 at 05:54:55PM +0800, Jason Wang wrote:
>>>>>>>> On 01/16/2017 17:18, Peter Xu wrote:
>>>>>>>>>>>   static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
>>>>>>>>>>>                                         hwaddr addr, uint8_t am)
>>>>>>>>>>>   {
>>>>>>>>>>> @@ -1222,6 +1251,7 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
>>>>>>>>>>>       info.addr = addr;
>>>>>>>>>>>       info.mask = ~((1 << am) - 1);
>>>>>>>>>>>       g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
>>>>>>>>>>> +    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
>>>>>>>>>> Is the case of a GLOBAL or DSI flush missed, or do we not care about it at all?
>>>>>>>>> IMHO we don't. For device assignment, since we are having CM=1
>>>>>>>>> here, we should have explicit page invalidations even if guest
>>>>>>>>> sends global/domain invalidations.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> -- peterx
>>>>>>>> Is this required by the spec?
>>>>>>> I think not. IMO the spec is very coarse grained on describing
>>>>>>> cache mode...
>>>>>>>
>>>>>>>> Btw, it looks to me that both DSI and GLOBAL are indeed explicit
>>>>>>>> flushes.
>>>>>>> Actually when cache mode is on, it is unclear to me on how we
>>>>>>> should treat domain/global invalidations, at least from the spec
>>>>>>> (as mentioned earlier). My understanding is that they are not
>>>>>>> "explicit flushes", which IMHO should only mean page selective
>>>>>>> IOTLB invalidations.
>>>>>> Probably not, at least from a performance point of view. DSI and
>>>>>> global should be more efficient in some cases.
>>>>> I agree with you that DSI/GLOBAL flushes are more efficient in some
>>>>> way. But IMHO that does not mean these invalidations are "explicit
>>>>> invalidations", and I doubt whether cache mode has to cooperate with them.
>>>> Well, the spec does not forbid DSI/GLOBAL with CM and the driver
>>>> code has used them for almost ten years. I can hardly believe it's wrong.
>>> I think we have misunderstanding here. :)
>>>
>>> I never thought we should not send DSI/GLOBAL invalidations with cache
>>> mode. I just think we should not do anything special even if we have
>>> cache mode on when we receive these signals.
>>>
>>> In the spec, "explicit invalidation" is mentioned in the cache mode
>>> chapter:
>>>
>>>      The Caching Mode (CM) field in Capability Register indicates if
>>>      the hardware implementation caches not-present or erroneous
>>>      translation-structure entries. When the CM field is reported as
>>>      Set, any software updates to any remapping structures (including
>>>      updates to not-present entries or present entries whose
>>>      programming resulted in translation faults) requires explicit
>>>      invalidation of the caches.
>>>
>>> And I thought we were discussing "what is explicit invalidation"
>>> mentioned above.
>> Check 6.5.3.1 Implicit Invalidation on Page Requests
>>
>> 	In addition to the explicit invalidation through invalidation commands
>> 	(see Section 6.5.1 and Section 6.5.2) or through deferred invalidation
>> 	messages (see Section 6.5.4), identified above, Page Requests from
>> 	endpoint devices invalidate entries in the IOTLBs and paging-structure
>> 	caches.
>>
>> My impression is that the above indirectly defines the invalidation commands
>> (PSI/DSI/GLOBAL) as explicit invalidation, because they are explicitly issued by
>> the driver. Then section 6.5.3.1 further describes implicit invalidations caused
>> by other VT-d operations.
>>
>> I will check with VT-d spec owner to clarify.
>>
>>>>> But here I should add one more thing besides PSI - context entry
>>>>> invalidation should be one of "the explicit invalidations" as well,
>>>>> which we need to handle just like PSI when cache mode is on.
>>>>>
>>>>>>>> Just had a quick look through the driver code and found something
>>>>>>>> interesting in intel_iommu_flush_iotlb_psi():
>>>>>>>>
>>>>>>>> ...
>>>>>>>>      /*
>>>>>>>>       * Fallback to domain selective flush if no PSI support or the size is
>>>>>>>>       * too big.
>>>>>>>>       * PSI requires page size to be 2 ^ x, and the base address is
>>>>>>>>       * naturally aligned to the size
>>>>>>>>       */
>>>>>>>>      if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap))
>>>>>>>>          iommu->flush.flush_iotlb(iommu, did, 0, 0,
>>>>>>>>                          DMA_TLB_DSI_FLUSH);
>>>>>>>>      else
>>>>>>>>          iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
>>>>>>>>                          DMA_TLB_PSI_FLUSH);
>>>>>>>> ...
>>>>>>> I think this is interesting... and I doubt its correctness while
>>>>>>> with cache mode enabled.
>>>>>>>
>>>>>>> If so (sending domain invalidation instead of a big range of page
>>>>>>> invalidations), how should we capture which pages are unmapped in
>>>>>>> the emulated IOMMU?
>>>>>> We don't need to track individual pages here, since all pages for
>>>>>> a specific domain were unmapped, I believe?
>>>>> IMHO this might not be the correct behavior.
>>>>>
>>>>> If we receive one domain specific invalidation, I agree that we
>>>>> should invalidate the IOTLB cache for all the devices inside the domain.
>>>>> However, when cache mode is on, we should be depending on the PSIs
>>>>> to unmap each page (unless we want to unmap the whole address
>>>>> space, in this case it's very possible that the guest is just
>>>>> unmapping a range, not the entire space). If we convert several
>>>>> PSIs into one big DSI, IMHO we will leave those pages
>>>>> mapped/unmapped while we should unmap/map them.
>>>> Confused, do you have an example for this? (I fail to understand why
>>>> DSI can't work; at least an implementation can convert a DSI to several
>>>> PSIs internally).
>>> That's how I understand it. It might be wrong. Btw, could you
>>> elaborate a bit on how can we convert a DSI into several PSIs?
>>>
>>> Thanks,
>> If my understanding above is correct, there is nothing wrong with the above
>> IOMMU driver code - actually it makes sense on bare metal when CM is
>> disabled.
>>
>> But yes, DSI/GLOBAL is far less efficient than PSI when CM is enabled.
>> We rely on cache invalidations to indirectly capture remapping structure changes.
>> PSI provides accurate info, while DSI/GLOBAL doesn't. To emulate the correct
>> behavior of DSI/GLOBAL, we have to pretend that the whole address space
>> (iova=0, size=agaw) needs to be unmapped (for GLOBAL it further means
>> multiple address spaces).
>>
>> Though not efficient, that doesn't mean it's wrong, since the guest driver follows the spec.
>> We can ask for a Linux IOMMU driver change (CC Ashok) to not use the above
>> optimization when cache mode is enabled, but we still need to emulate correct
>> DSI/GLOBAL behavior to follow the spec, because:
>>
>> - even when the driver fix is in place, old versions still have this logic;
>>
>> - there are still scenarios where the guest IOMMU driver does want to invalidate the
>> whole address space, e.g. when changing a context entry. Asking the guest driver to
>> use PSI for such a purpose is another bad thing.
> Hi Kevin/Peter/Jason,
>
> I agree we should think about DSI/GLOBAL. Here, I guess there may be a chance to ignore
> the DSI/GLOBAL flush if the following assumption is correct.
>
> It seems that all DSI/GLOBAL flushes would always come after a context entry invalidation.

Well, it looks like at least for DSI, a flush could also happen when the
size is too big?

>
> With this assumption, I remember Peter added memory_replay in context invalidation.
> This memory_replay would walk the guest second-level page table and do the maps, so the
> second-level page table in the host should have the latest mapping info. The guest
> IOMMU driver would issue a DSI/GLOBAL flush after changing the context. Since the
> mapping info has been updated in the host, there is no need to handle this DSI/GLOBAL flush.
>
> So gentlemen, please help judge whether the assumption is correct. If it is, then Peter's patch
> may just work without special processing of DSI/GLOBAL flushes.
>   
> Regards,
> Yi L

Even if this may be the usual case, I think we'd better not make the
code depend on (usual) guest behavior.

Thanks
