QEMU-Devel Archive on lore.kernel.org
 help / color / Atom feed
From: Jason Wang <jasowang@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: "Peter Maydell" <peter.maydell@linaro.org>,
	"Yan Zhao" <yan.y.zhao@intel.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"libvir-list@redhat.com" <libvir-list@redhat.com>,
	"Juan Quintela" <quintela@redhat.com>,
	qemu-devel@nongnu.org, "Eugenio Pérez" <eperezma@redhat.com>,
	"Eric Auger" <eric.auger@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
Date: Thu, 2 Jul 2020 11:01:54 +0800
Message-ID: <ff9e7af0-18c4-57e8-fc94-904fdce1123a@redhat.com> (raw)
In-Reply-To: <69f6d6e7-a0b1-abae-894e-4e81b7e0cc90@redhat.com>


On 2020/7/1 下午4:09, Jason Wang wrote:
>
> On 2020/6/30 下午11:39, Peter Xu wrote:
>> On Tue, Jun 30, 2020 at 10:41:10AM +0800, Jason Wang wrote:
>>>>       /* According to ATS spec table 2.4:
>>>>        * S = 0, bits 15:12 = xxxx     range size: 4K
>>>>        * S = 1, bits 15:12 = xxx0     range size: 8K
>>>>        * S = 1, bits 15:12 = xx01     range size: 16K
>>>>        * S = 1, bits 15:12 = x011     range size: 32K
>>>>        * S = 1, bits 15:12 = 0111     range size: 64K
>>>>        * ...
>>>>        */
>>>
>>> Right, but the comment is probably misleading here, since it's for 
>>> the PCI-E
>>> transaction between IOMMU and device not for the device IOTLB 
>>> invalidation
>>> descriptor.
>>>
>>> For device IOTLB invalidation descriptor, spec allows a [0, ~0ULL]
>>> invalidation:
>>>
>>> "
>>>
>>> 6.5.2.5 Device-TLB Invalidate Descriptor
>>>
>>> ...
>>>
>>> Size (S): The size field indicates the number of consecutive pages 
>>> targeted
>>> by this invalidation
>>> request. If S field is zero, a single page at page address specified by
>>> Address [63:12] is requested
>>> to be invalidated. If S field is Set, the least significant bit in the
>>> Address field with value 0b
>>> indicates the invalidation address range. For example, if S field is 
>>> Set and
>>> Address[12] is Clear, it
>>> indicates an 8KB invalidation address range with base address in 
>>> Address
>>> [63:13]. If S field and
>>> Address[12] is Set and bit 13 is Clear, it indicates a 16KB 
>>> invalidation
>>> address range with base
>>> address in Address [63:14], etc.
>>>
>>> "
>>>
>>> So if we receive an address whose [63] is 0 and the rest is all 1, 
>>> it's then
>>> a [0, ~0ULL] invalidation.
>> Yes.  I think invalidating the whole range is always fine.  It's 
>> still not
>> arbitrary, right?  E.g., we can't even invalidate (0x1000, 0x3000) with
>> device-iotlb because of the address mask, not to say sub-pages.
>
>
> Yes.
>
>
>>
>>>
>>>>>>> How about just convert to use a range [start, end] for any 
>>>>>>> notifier and move
>>>>>>> the checks (e.g the assert) into the actual notifier implemented 
>>>>>>> (vhost or
>>>>>>> vfio)?
>>>>>> IOMMUTLBEntry itself is the abstraction layer of TLB entry.  
>>>>>> Hardware TLB entry
>>>>>> is definitely not arbitrary range either (because AFAICT the 
>>>>>> hardware should
>>>>>> only cache PFN rather than address, so at least PAGE_SIZE aligned).
>>>>>> Introducing this flag will already make this trickier just to 
>>>>>> avoid introducing
>>>>>> another similar struct to IOMMUTLBEntry, but I really don't want 
>>>>>> to make it a
>>>>>> default option...  Not to mention I probably have no reason to 
>>>>>> urge the rest
>>>>>> iommu notifier users (tcg, vfio) to change their existing good 
>>>>>> code to suite
>>>>>> any of the backend who can cooperate with arbitrary address 
>>>>>> ranges...
>>>>> Ok, so it looks like we need a dedicated notifiers to device IOTLB.
>>>> Or we can also make a new flag for device iotlb just like current 
>>>> UNMAP? Then
>>>> we replace the vhost type from UNMAP to DEVICE_IOTLB.  But IMHO 
>>>> using the
>>>> ARBITRARY_LENGTH flag would work in a similar way. DEVICE_IOTLB 
>>>> flag could
>>>> also allow virtio/vhost to only receive one invalidation (now IIUC 
>>>> it'll
>>>> receive both iotlb and device-iotlb for unmapping a page when 
>>>> ats=on), but then
>>>> ats=on will be a must and it could break some old (misconfiged) 
>>>> qemu because
>>>> afaict previously virtio/vhost could even work with vIOMMU 
>>>> (accidentally) even
>>>> without ats=on.
>>>
>>> That's a bug and I don't think we need to workaround 
>>> mis-configurated qemu
>>> :)
>> IMHO it depends on the strictness we want on the qemu cmdline API. :)
>>
>> We should at least check libvirt to make sure it's using ats=on 
>> always, then I
>> agree maybe we can avoid considering the rest...
>>
>> Thanks,
>
>
> Cc libvirt list, but I think we should fix libvirt if they don't 
> provide "ats=on".
>
> Thanks


Libvirt looks fine, according to the domain  XML documentation[1]:

  QEMU's virtio devices have some attributes related to the virtio 
transport under the driver element: The iommu attribute enables the use 
of emulated IOMMU by the device. The attribute ats controls the Address 
Translation Service support for PCIe devices. This is needed to make use 
of IOTLB support (see IOMMU device). Possible values are on or off. 
Since 3.5.0

So I think we agree that a new notifier is needed?

Thanks

[1] https://libvirt.org/formatdomain.html


>
>
>



  reply index

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-26  6:41 [RFC v2 0/1] " Eugenio Pérez
2020-06-26  6:41 ` [RFC v2 1/1] " Eugenio Pérez
2020-06-26 21:29   ` Peter Xu
2020-06-27  7:26     ` Yan Zhao
2020-06-27 12:57       ` Peter Xu
2020-06-28  1:36         ` Yan Zhao
2020-06-28  7:03     ` Jason Wang
2020-06-28 14:47       ` Peter Xu
2020-06-29  5:51         ` Jason Wang
2020-06-29 13:34           ` Peter Xu
2020-06-30  2:41             ` Jason Wang
2020-06-30  8:29               ` Jason Wang
2020-06-30  9:21                 ` Michael S. Tsirkin
2020-06-30  9:23                   ` Jason Wang
2020-06-30 15:20                     ` Peter Xu
2020-07-01  8:11                       ` Jason Wang
2020-07-01 12:16                         ` Peter Xu
2020-07-01 12:30                           ` Jason Wang
2020-07-01 12:41                             ` Peter Xu
2020-07-02  3:00                               ` Jason Wang
2020-06-30 15:39               ` Peter Xu
2020-07-01  8:09                 ` Jason Wang
2020-07-02  3:01                   ` Jason Wang [this message]
2020-07-02 15:45                     ` Peter Xu
2020-07-03  7:24                       ` Jason Wang
2020-07-03 13:03                         ` Peter Xu
2020-07-07  8:03                           ` Jason Wang
2020-07-07 19:54                             ` Peter Xu
2020-07-08  5:42                               ` Jason Wang
2020-07-08 14:16                                 ` Peter Xu
2020-07-09  5:58                                   ` Jason Wang
2020-07-09 14:10                                     ` Peter Xu
2020-07-10  6:34                                       ` Jason Wang
2020-07-10 13:30                                         ` Peter Xu
2020-07-13  4:04                                           ` Jason Wang
2020-07-16  1:00                                             ` Peter Xu
2020-07-16  2:54                                               ` Jason Wang
2020-07-17 14:18                                                 ` Peter Xu
2020-07-20  4:02                                                   ` Jason Wang
2020-07-20 13:03                                                     ` Peter Xu
2020-07-21  6:20                                                       ` Jason Wang
2020-07-21 15:10                                                         ` Peter Xu
2020-08-03 16:00                         ` Eugenio Pérez
2020-08-04 20:30                           ` Peter Xu
2020-08-05  5:45                             ` Jason Wang
2020-06-29 15:05 ` [RFC v2 0/1] " Paolo Bonzini
2020-07-03  7:39   ` Eugenio Perez Martin
2020-07-03 10:10     ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ff9e7af0-18c4-57e8-fc94-904fdce1123a@redhat.com \
    --to=jasowang@redhat.com \
    --cc=eperezma@redhat.com \
    --cc=eric.auger@redhat.com \
    --cc=libvir-list@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

QEMU-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/qemu-devel/0 qemu-devel/git/0.git
	git clone --mirror https://lore.kernel.org/qemu-devel/1 qemu-devel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 qemu-devel qemu-devel/ https://lore.kernel.org/qemu-devel \
		qemu-devel@nongnu.org
	public-inbox-index qemu-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.nongnu.qemu-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git