IOMMU Archive on lore.kernel.org
From: Lu Baolu <baolu.lu@linux.intel.com>
To: Peter Xu <peterx@redhat.com>, "Tian, Kevin" <kevin.tian@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	"Raj, Ashok" <ashok.raj@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"Kumar, Sanjay K" <sanjay.k.kumar@intel.com>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Sun, Yi Y" <yi.y.sun@intel.com>,
	David Woodhouse <dwmw2@infradead.org>
Subject: Re: [RFC PATCH 0/4] Use 1st-level for DMA remapping in guest
Date: Thu, 26 Sep 2019 09:37:05 +0800
Message-ID: <b14a39f3-6aba-8fd2-757a-c244dcbe7b6b@linux.intel.com> (raw)
In-Reply-To: <20190925085204.GR28074@xz-x1>

Hi Peter,

On 9/25/19 4:52 PM, Peter Xu wrote:
> On Wed, Sep 25, 2019 at 08:02:23AM +0000, Tian, Kevin wrote:
>>> From: Peter Xu [mailto:peterx@redhat.com]
>>> Sent: Wednesday, September 25, 2019 3:45 PM
>>>
>>> On Wed, Sep 25, 2019 at 07:21:51AM +0000, Tian, Kevin wrote:
>>>>> From: Peter Xu [mailto:peterx@redhat.com]
>>>>> Sent: Wednesday, September 25, 2019 2:57 PM
>>>>>
>>>>> On Wed, Sep 25, 2019 at 10:48:32AM +0800, Lu Baolu wrote:
>>>>>> Hi Kevin,
>>>>>>
>>>>>> On 9/24/19 3:00 PM, Tian, Kevin wrote:
>>>>>>>>>>        '-----------'
>>>>>>>>>>        '-----------'
>>>>>>>>>>
>>>>>>>>>> This patch series only aims to achieve the first goal, a.k.a using
>>>>>>> first goal? then what are other goals? I didn't spot such information.
>>>>>>>
>>>>>>
>>>>>> The overall goal is to use IOMMU nested mode to avoid shadow page
>>>>>> tables and VMEXITs when mapping a gIOVA. This includes the 4 steps
>>>>>> below (maybe not accurate, but you could get the point):
>>>>>>
>>>>>> 1) gIOVA mappings over 1st-level page table;
>>>>>> 2) binding vIOMMU 1st level page table to the pIOMMU;
>>>>>> 3) using pIOMMU second level for GPA->HPA translation;
>>>>>> 4) enable nested (a.k.a. dual stage) translation in host.
>>>>>>
>>>>>> This patch set aims to achieve 1).
>>>>>
>>>>> Would it make sense to use 1st level even for bare-metal to replace
>>>>> the 2nd level?
>>>>>
>>>>> What I'm thinking is the DPDK apps - they have MMU page table already
>>>>> there for the huge pages, then if they can use 1st level as the
>>>>> default device page table then it even does not need to map, because
>>>>> it can simply bind the process root page table pointer to the 1st
>>>>> level page root pointer of the device contexts that it uses.
>>>>>
>>>>
>>>> Then you need to bear with possible page faults from using the CPU page
>>>> table, while most devices don't support page faults today.
>>>
>>> Right, I was just thinking aloud.  After all neither do we have IOMMU
>>> hardware to support 1st level (or am I wrong?)...  It's just that when
>>
>> You are right. Current VT-d supports only 2nd level.
>>
>>> the 1st level is ready it should sound doable because IIUC PRI should
>>> be always with the 1st level support no matter on IOMMU side or the
>>> device side?
>>
>> No. PRI is not tied to 1st or 2nd level. Actually from device p.o.v, it's
>> just a protocol to trigger page fault, but the device doesn't care whether
>> the page fault is on 1st or 2nd level in the IOMMU side. The only
>> relevant part is that a PRI request can have PASID tagged or cleared.
>> When it's tagged with PASID, the IOMMU will locate the translation
>> table under the given PASID (either 1st or 2nd level is fine, according
>> to PASID entry setting). When no PASID is included, the IOMMU locates
>> the translation from default entry (e.g. PASID#0 or any PASID contained
>> in RID2PASID in VT-d).
>>
>> Your understanding happened to be correct in the deprecated ECS mode. At
>> that time, there was only one 2nd level per context entry, which didn't
>> support page faults, and only one 1st level per PASID entry, which did
>> support page faults. So PRI could be indirectly connected to 1st level,
>> but this changed with the new scalable mode.
>>
>> Another note is that the PRI capability only indicates that a device is
>> capable of handling page faults, but not that a device can tolerate
>> page fault for any of its DMA access. If the latter is false, using CPU
>> page table for DPDK usage is still risky (and specific to device behavior)
>>
>>>
>>> I'm actually not sure about whether my understanding here is
>>> correct... I thought the pasid binding previously was only for some
>>> vendor kernel drivers but not a general thing to userspace.  I feel
>>> like that should be doable in the future once we've got some new
>>> syscall interface ready to deliver 1st level page table (e.g., via
>>> vfio?) then applications like DPDK seems to be able to use that too
>>> even directly via bare metal.
>>>
>>
>> Using 1st level for userspace is different from supporting DMA page
>> fault in userspace. The former is purely about which structure to
>> keep the mapping. I think we may do the same thing for both bare
>> metal and guest (using 2nd level only for GPA when nested is enabled
>> on the IOMMU). But reusing CPU page table for userspace is more
>> tricky. :-)
> 
> Yes, I must have mixed up the 1st level page table and PRI a bit, and
> after all my initial question is irrelevant to this series anyway,
> so it's already a bit off topic (sorry for that).

Never mind. Good discussion. :-)

Actually, I plan to use 1st level on bare metal as well. I'm just
looking for more motivation and use cases.
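
For context, the nested-mode plan quoted above (gIOVA mappings over the
1st level, GPA->HPA over the 2nd level) amounts to composing two
translations. Below is a rough, illustrative sketch of that composition
only. The flat single-level tables and every name in it (stage_table,
stage_translate, nested_translate, NR_PAGES, INVALID_PFN) are
hypothetical simplifications, not VT-d structures or driver code, and a
real nested walk also translates the 1st-level table pointers through
the 2nd level.

```c
#include <stdint.h>

#define PAGE_SHIFT   12
#define NR_PAGES     8            /* hypothetical tiny address space */
#define INVALID_PFN  UINT64_MAX   /* marks an unmapped page */
#define XLAT_FAULT   UINT64_MAX   /* returned when translation fails */

/* Hypothetical flat table: input page number -> output page number. */
struct stage_table {
	uint64_t pfn[NR_PAGES];
};

/* Translate one address through a single stage. */
static uint64_t stage_translate(const struct stage_table *t, uint64_t addr)
{
	uint64_t idx = addr >> PAGE_SHIFT;
	uint64_t off = addr & ((1ULL << PAGE_SHIFT) - 1);

	if (idx >= NR_PAGES || t->pfn[idx] == INVALID_PFN)
		return XLAT_FAULT;
	return (t->pfn[idx] << PAGE_SHIFT) | off;
}

/*
 * Nested (dual stage) translation, simplified:
 *   1st level (guest owned):  gIOVA -> GPA
 *   2nd level (host owned):   GPA   -> HPA
 */
static uint64_t nested_translate(const struct stage_table *first,
				 const struct stage_table *second,
				 uint64_t giova)
{
	uint64_t gpa = stage_translate(first, giova);

	if (gpa == XLAT_FAULT)
		return XLAT_FAULT;
	return stage_translate(second, gpa);
}
```

In this model the guest can update the 1st-level table on its own, and
the host only maintains the 2nd level, which is the point of avoiding
the shadow page table.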

> 
> And, thanks for explaining these. :)
> 

Thanks for Kevin's explanation. :-)
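
For anyone following along, the table selection Kevin describes for a
page request can be sketched roughly as below. This is only an
illustrative model under stated assumptions: the struct layouts, the
NR_PASIDS bound and the function name lookup_pasid_entry are all
hypothetical and do not match the VT-d specification or the Linux
driver. The only behavior taken from the discussion is that a request
tagged with a PASID uses that PASID's entry, an untagged request falls
back to the default PASID (RID2PASID in VT-d), and the entry itself
decides whether 1st or 2nd level is used.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_PASIDS 16   /* hypothetical small PASID table */

enum xlat_level { XLAT_FIRST_LEVEL, XLAT_SECOND_LEVEL };

/* Hypothetical PASID entry: selects which level translates. */
struct pasid_entry {
	bool present;
	enum xlat_level level;
};

/* Hypothetical per-device context in scalable mode. */
struct dev_context {
	uint32_t rid2pasid;   /* default PASID for untagged requests */
	struct pasid_entry pasid_tbl[NR_PASIDS];
};

/* Hypothetical page request as seen by the IOMMU. */
struct page_request {
	bool has_pasid;
	uint32_t pasid;
};

/*
 * Pick the PASID entry for a page request. The device does not care
 * whether the fault is on 1st or 2nd level; that is decided by the
 * PASID entry found here. Returns NULL if no entry is present.
 */
static const struct pasid_entry *
lookup_pasid_entry(const struct dev_context *ctx,
		   const struct page_request *req)
{
	uint32_t pasid = req->has_pasid ? req->pasid : ctx->rid2pasid;

	if (pasid >= NR_PASIDS || !ctx->pasid_tbl[pasid].present)
		return NULL;
	return &ctx->pasid_tbl[pasid];
}
```

Note that nothing in the lookup ties PRI to the 1st level: the same
request path works whichever level the PASID entry selects.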

Best regards,
Baolu

