From: "Tian, Kevin" <kevin.tian@intel.com>
To: Jason Wang <jasowang@redhat.com>
Cc: 'Alex Williamson' <alex.williamson@redhat.com>,
	"Zhao, Yan Y" <yan.y.zhao@intel.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] vhost, iova, and dirty page tracking
Date: Wed, 18 Sep 2019 01:44:28 +0000
Message-ID: <AADFC41AFE54684AB9EE6CBC0274A5D19D57B1D1@SHSMSX104.ccr.corp.intel.com>
In-Reply-To: <8302a4ae-1914-3046-b3b5-b3234d7dda02@redhat.com>

> From: Jason Wang [mailto:jasowang@redhat.com]
> Sent: Tuesday, September 17, 2019 6:36 PM
> 
> On 2019/9/17 4:48 PM, Tian, Kevin wrote:
> >> From: Jason Wang [mailto:jasowang@redhat.com]
> >> Sent: Monday, September 16, 2019 4:33 PM
> >>
> >>
> >> On 2019/9/16 9:51 AM, Tian, Kevin wrote:
> >>> Hi, Jason
> >>>
> >>> We had a discussion about dirty page tracking in VFIO, when vIOMMU
> >>> is enabled:
> >>>
> >>> https://lists.nongnu.org/archive/html/qemu-devel/2019-09/msg02690.html
> >>>
> >>> It's actually a similar model as vhost - Qemu cannot interpose the
> >>> fast-path DMAs and thus relies on the kernel part to track and
> >>> report dirty page information. Currently Qemu tracks dirty pages at
> >>> GFN level, thus demanding a translation from IOVA to GPA. The open
> >>> question in our discussion is where this translation should happen.
> >>> Doing the translation in the kernel implies a device-iotlb flavor,
> >>> which is what vhost implements today. It requires potentially large
> >>> tracking structures in the host kernel, but leverages the existing
> >>> log_sync flow in Qemu. On the other hand, Qemu may perform log_sync
> >>> for every removal of an IOVA mapping and then do the translation
> >>> itself, thereby avoiding GPA awareness on the kernel side. It needs
> >>> some change to the current Qemu log_sync flow, and may bring more
> >>> overhead if IOVA is frequently unmapped.
> >>>
> >>> So we'd like to hear your opinions, especially about how you came
> >>> down to the current iotlb approach for vhost.
> >>
> >> We didn't consider this much when introducing vhost. And before
> >> IOTLB, vhost already knew GPA through its mem table (GPA->HVA). So
> >> it was natural and easier to track dirty pages at the GPA level,
> >> which required no changes to the existing ABI.
> > This is the same situation as VFIO.
> >
> >> For the VFIO case, the only advantage of using GPA is that the log
> >> can then be shared among all the devices that belong to the VM.
> >> Otherwise syncing through IOVA is cleaner.
> > I still worry about the potential performance impact with this approach.
> > In the current mdev live migration series, multiple system calls are
> > involved when retrieving the dirty bitmap information for a given
> > memory range.
> 
> 
> I haven't taken a deep look at that series. Technically the dirty
> bitmap could be shared between device and driver, and then there's no
> system call in the synchronization path.

That series requires Qemu to tell the kernel about the queried region
(start, number, and page_size), read the information about the dirty
bitmap (offset, size), and then read the dirty bitmap itself. Although
the bitmap can be mmap'ed and thus shared, the earlier reads/writes are
conducted through pread/pwrite system calls. This design is fine for
the current log_dirty implementation, where the dirty bitmap is synced
in every pre-copy round. But doing it for every IOVA unmap is
definitely overkill.

> 
> 
> > IOVA mappings might be changed frequently. Though one may argue that
> > frequent IOVA changes already imply bad performance, it's still not
> > good to introduce further non-negligible overhead in such a situation.
> 
> 
> Yes, it depends on the behavior of the vIOMMU driver, e.g. the
> frequency and granularity of the flushing.
> 
> 
> >
> > On the other hand, I realized that adding IOVA awareness in VFIO is
> > actually easy. Today VFIO already maintains a full list of IOVAs and
> > their associated HVAs in vfio_dma structures, according to VFIO_MAP
> > and VFIO_UNMAP. As long as we allow the latter two operations to
> > accept another parameter (GPA), the IOVA->GPA mapping can be
> > naturally cached in the existing vfio_dma objects.
> 
> 
> Note that the HVA to GPA mapping is not a 1:1 mapping. One HVA range
> could be mapped to several GPA ranges.

This is fine. Currently vfio_dma maintains the IOVA->HVA mapping.

btw, under what condition is HVA->GPA not a 1:1 mapping? I hadn't
realized that.

> 
> 
> >   Those objects are always kept up-to-date according to the MAP and
> > UNMAP ioctls. Qemu then uniformly retrieves the VFIO dirty bitmap
> > for the entire GPA range in every pre-copy round, regardless of
> > whether vIOMMU is enabled. There is no need for another IOTLB
> > implementation; the main ask is a v2 MAP/UNMAP interface.
> 
> 
> Or provide GPA to HVA mapping as vhost did. But one question: I
> believe the device can only do dirty page logging through IOVA. So how
> do you handle the case where an IOVA is removed?
> 

That's what Alex had in mind: a log_sync is required each time an IOVA
is unmapped.

Thanks
Kevin

