All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Liu, Yuan1" <yuan1.liu@intel.com>
To: Peter Xu <peterx@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>,
	"farosas@suse.de" <farosas@suse.de>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"hao.xiang@bytedance.com" <hao.xiang@bytedance.com>,
	"bryan.zhang@bytedance.com" <bryan.zhang@bytedance.com>,
	"Zou, Nanhai" <nanhai.zou@intel.com>
Subject: RE: [PATCH v5 5/7] migration/multifd: implement initialization of qpl compression
Date: Fri, 22 Mar 2024 02:06:32 +0000	[thread overview]
Message-ID: <PH7PR11MB594128C3B1C9ED66482DFEDEA3312@PH7PR11MB5941.namprd11.prod.outlook.com> (raw)
In-Reply-To: <ZfxSAgJECCOqtGRh@x1n>

> -----Original Message-----
> From: Peter Xu <peterx@redhat.com>
> Sent: Thursday, March 21, 2024 11:28 PM
> To: Liu, Yuan1 <yuan1.liu@intel.com>
> Cc: Daniel P. Berrangé <berrange@redhat.com>; farosas@suse.de; qemu-
> devel@nongnu.org; hao.xiang@bytedance.com; bryan.zhang@bytedance.com; Zou,
> Nanhai <nanhai.zou@intel.com>
> Subject: Re: [PATCH v5 5/7] migration/multifd: implement initialization of
> qpl compression
> 
> On Thu, Mar 21, 2024 at 01:37:36AM +0000, Liu, Yuan1 wrote:
> > > -----Original Message-----
> > > From: Peter Xu <peterx@redhat.com>
> > > Sent: Thursday, March 21, 2024 4:32 AM
> > > To: Liu, Yuan1 <yuan1.liu@intel.com>
> > > Cc: Daniel P. Berrangé <berrange@redhat.com>; farosas@suse.de; qemu-
> > > devel@nongnu.org; hao.xiang@bytedance.com; bryan.zhang@bytedance.com;
> Zou,
> > > Nanhai <nanhai.zou@intel.com>
> > > Subject: Re: [PATCH v5 5/7] migration/multifd: implement
> initialization of
> > > qpl compression
> > >
> > > On Wed, Mar 20, 2024 at 04:23:01PM +0000, Liu, Yuan1 wrote:
> > > > let me explain here, during the decompression operation of IAA, the
> > > > decompressed data can be directly output to the virtual address of
> the
> > > > guest memory by IAA hardware.  It can avoid copying the decompressed
> > > data
> > > > to guest memory by CPU.
> > >
> > > I see.
> > >
> > > > Without -mem-prealloc, all the guest memory is not populated, and
> IAA
> > > > hardware needs to trigger I/O page fault first and then output the
> > > > decompressed data to the guest memory region.  Besides that, CPU
> page
> > > > faults will also trigger IOTLB flush operation when IAA devices use
> SVM.
> > >
> > > Oh so the IAA hardware already can use CPU pgtables?  Nice..
> > >
> > > Why IOTLB flush is needed?  AFAIU we're only installing new pages, the
> > > request can either come from a CPU access or a DMA.  In all cases
> there
> > > should have no tearing down of an old page.  Isn't an iotlb flush only
> > > needed if a tear down happens?
> >
> > As far as I know, IAA hardware uses SVM technology to use the CPU's page
> table
> > for address translation (IOMMU scalable mode directly accesses the CPU
> page table).
> > Therefore, when the CPU page table changes, the device's Invalidation
> operation needs
> > to be triggered to update the IOMMU and the device's cache.
> >
> > My current kernel version is mainline 6.2. The issue I see is as
> follows:
> > --Handle_mm_fault
> >  |
> >   -- wp_page_copy
> 
> This is the CoW path.  Not usual at all..
> 
> I assume this issue should only present on destination.  Then the guest
> pages should be the destination of such DMAs to happen, which means these
> should be write faults, and as we see here it is, otherwise it won't
> trigger a CoW.
> 
> However it's not clear to me why a pre-installed zero page existed.  It
> means someone read the guest pages first.
> 
> It might be interesting to know _why_ someone reads the guest pages, even
> if we know they're all zeros.  If we can avoid such reads then it'll be a
> hole rather than a prefaulted read on zero page, then invalidations are
> not
> needed, and I expect that should fix the iotlb storm issue.

The received pages will be read for zero pages check first. Although
these pages are zero pages, and IAA hardware will not access them, the
COW happens and causes following IOTLB flush operation. As far as I know, 
IOMMU quickly detects whether the address range has been used by the device,
and does not invalidate the address that is not used by the device, this has 
not yet been resolved in Linux kernel 6.2. I will check the latest status for
this.
void multifd_recv_zero_page_process(MultiFDRecvParams *p)
{
    for (int i = 0; i < p->zero_num; i++) {
        void *page = p->host + p->zero[i];
        if (!buffer_is_zero(page, p->page_size)) {
            memset(page, 0, p->page_size);
        }
    }
}


> It'll still be good we can fix this first to not make qpl special from
> this
> regard, so that the hope is migration submodule shouldn't rely on any
> pre-config (-mem-prealloc) on guest memory behaviors to work properly.

Even if the IOTLB problem can be avoided, the I/O page fault problem (normal
pages are loaded by the IAA device and solving normal page faults through IOMMU,
the performance is not good)

It can let the decompressed data of the IAA device be output to a pre-populated
memory instead of directly outputting to the guest address, but then each multifd
thread needs two memory copies, one copy from the network to the IAA input 
memory(pre-populated), and another copy from the IAA output memory(pre-populated)
to the guest address, which may become a performance bottleneck at the destination
during the live migration process.

So I think it is still necessary to use the -mem-prealloc option

> >     -- mmu_notifier_invalidate_range
> >       |
> >       -- intel_invalidate_rage
> >         |
> >         -- qi_flush_piotlb
> >         -- qi_flush_dev_iotlb_pasid
> 
> --
> Peter Xu


  reply	other threads:[~2024-03-22  2:07 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-19 16:45 [PATCH v5 0/7] Live Migration With IAA Yuan Liu
2024-03-19 16:45 ` [PATCH v5 1/7] docs/migration: add qpl compression feature Yuan Liu
2024-03-26 17:58   ` Peter Xu
2024-03-27  2:14     ` Liu, Yuan1
2024-03-19 16:45 ` [PATCH v5 2/7] migration/multifd: put IOV initialization into compression method Yuan Liu
2024-03-20 15:18   ` Fabiano Rosas
2024-03-20 15:32     ` Liu, Yuan1
2024-03-19 16:45 ` [PATCH v5 3/7] configure: add --enable-qpl build option Yuan Liu
2024-03-20  8:55   ` Thomas Huth
2024-03-20  8:56     ` Thomas Huth
2024-03-20 14:34       ` Liu, Yuan1
2024-03-20 10:31   ` Daniel P. Berrangé
2024-03-20 14:42     ` Liu, Yuan1
2024-03-19 16:45 ` [PATCH v5 4/7] migration/multifd: add qpl compression method Yuan Liu
2024-03-27 19:49   ` Peter Xu
2024-03-28  3:03     ` Liu, Yuan1
2024-03-19 16:45 ` [PATCH v5 5/7] migration/multifd: implement initialization of qpl compression Yuan Liu
2024-03-20 10:42   ` Daniel P. Berrangé
2024-03-20 15:02     ` Liu, Yuan1
2024-03-20 15:20       ` Daniel P. Berrangé
2024-03-20 16:04         ` Liu, Yuan1
2024-03-20 15:34       ` Peter Xu
2024-03-20 16:23         ` Liu, Yuan1
2024-03-20 20:31           ` Peter Xu
2024-03-21  1:37             ` Liu, Yuan1
2024-03-21 15:28               ` Peter Xu
2024-03-22  2:06                 ` Liu, Yuan1 [this message]
2024-03-22 14:47                   ` Liu, Yuan1
2024-03-22 16:40                     ` Peter Xu
2024-03-27 19:25                       ` Peter Xu
2024-03-28  2:32                         ` Liu, Yuan1
2024-03-28 15:16                           ` Peter Xu
2024-03-29  2:04                             ` Liu, Yuan1
2024-03-19 16:45 ` [PATCH v5 6/7] migration/multifd: implement qpl compression and decompression Yuan Liu
2024-03-19 16:45 ` [PATCH v5 7/7] tests/migration-test: add qpl compression test Yuan Liu
2024-03-20 10:45   ` Daniel P. Berrangé
2024-03-20 15:30     ` Liu, Yuan1
2024-03-20 15:39       ` Daniel P. Berrangé
2024-03-20 16:26         ` Liu, Yuan1
2024-03-26 20:30 ` [PATCH v5 0/7] Live Migration With IAA Peter Xu
2024-03-27  3:20   ` Liu, Yuan1
2024-03-27 19:46     ` Peter Xu
2024-03-28  3:02       ` Liu, Yuan1
2024-03-28 15:22         ` Peter Xu
2024-03-29  3:33           ` Liu, Yuan1

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=PH7PR11MB594128C3B1C9ED66482DFEDEA3312@PH7PR11MB5941.namprd11.prod.outlook.com \
    --to=yuan1.liu@intel.com \
    --cc=berrange@redhat.com \
    --cc=bryan.zhang@bytedance.com \
    --cc=farosas@suse.de \
    --cc=hao.xiang@bytedance.com \
    --cc=nanhai.zou@intel.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.