From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Lan, Tianyu" <tianyu.lan@intel.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel@nongnu.org, emil.s.tantilov@intel.com,
	kvm@vger.kernel.org, ard.biesheuvel@linaro.org, aik@ozlabs.ru,
	donald.c.skidmore@intel.com, quintela@redhat.com,
	eddie.dong@intel.com, nrupal.jani@intel.com, agraf@suse.de,
	blauwirbel@gmail.com, cornelia.huck@de.ibm.com,
	alex.williamson@redhat.com, kraxel@redhat.com,
	anthony@codemonkey.ws, amit.shah@redhat.com, pbonzini@redhat.com,
	mark.d.rustad@intel.com, lcapitulino@redhat.com,
	gerlitz.or@gmail.com
Subject: Re: [Qemu-devel] live migration vs device assignment (motivation)
Date: Thu, 10 Dec 2015 10:18:40 +0000
Message-ID: <20151210101840.GA2570@work-vm>
In-Reply-To: <56685631.50700@intel.com>

* Lan, Tianyu (tianyu.lan@intel.com) wrote:
> On 12/8/2015 12:50 AM, Michael S. Tsirkin wrote:
> >I thought about what this is doing at the high level, and I do see some
> >value in what you are trying to do, but I also think we need to clarify
> >the motivation a bit more.  What you are saying is not really what the
> >patches are doing.
> >
> >And with that clearer understanding of the motivation in mind (assuming
> >it actually captures a real need), I would also like to suggest some
> >changes.
> 
> Motivation:
> Most current solutions for migration with passthrough devices are based on
> PCI hotplug, but it has side effects and can't work for all devices.
> 
> For NIC device:
> PCI hotplug solution can work around Network device migration
> via switching VF and PF.
> 
> But switching network interface will introduce service down time.
> 
> I tested the service down time via putting VF and PV interface
> into a bonded interface and ping the bonded interface during plug
> and unplug VF.
> 1) About 100ms when add VF
> 2) About 30ms when del VF
> 
> It also requires the guest to do switch configuration. These are hard to
> manage and deploy for our customers. To maintain PV performance during
> migration, the host side also needs to assign a VF to the PV device. This
> affects scalability.
> 
> These factors block SR-IOV NIC passthrough usage in cloud services and
> OPNFV, which require high network performance and stability.

Right, I'll agree that it's hard to migrate a VM which uses an SR-IOV
device; and while I think it should be possible to bond a virtio device
to a VF for networking and then hotplug the SR-IOV device, I agree it's
hard to manage.

> For other kinds of devices, it's hard to make this work.
> We are also adding migration support for QAT (QuickAssist Technology) devices.
> 
> QAT device use case introduction:
> Server, networking, big data, and storage applications use QuickAssist
> Technology to offload servers from handling compute-intensive operations,
> such as:
> 1) Symmetric cryptography functions including cipher operations and
> authentication operations
> 2) Public key functions including RSA, Diffie-Hellman, and elliptic curve
> cryptography
> 3) Compression and decompression functions including DEFLATE and LZS
> 
> PCI hotplug will not work for such devices during migration, and these
> operations will fail when the device is unplugged.

I don't understand that QAT argument; if the device is purely an offload
engine for performance, then why can't you fall back to doing the
same operations in the VM or in QEMU if the card is unavailable?
The tricky bit is dealing with outstanding operations.
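
To make the fallback idea concrete, here is a minimal sketch in Python. It
is only an illustration under assumptions: `FakeAccelerator`,
`CompressionService` and `DeviceUnavailable` are hypothetical names, not
part of any QAT or QEMU API, and zlib stands in for whatever the card
would compute. The point is the handling of outstanding operations: the
caller keeps each input until the device reports completion, so anything
still in flight when the device disappears can be redone in software.

```python
import zlib
from collections import OrderedDict


class DeviceUnavailable(Exception):
    """Raised by the (hypothetical) hardware path once the device is gone."""


class FakeAccelerator:
    """Stand-in for an offload card; work completes only when polled."""

    def __init__(self):
        self.present = True
        self.queue = []

    def submit(self, req_id, data):
        if not self.present:
            raise DeviceUnavailable()
        self.queue.append((req_id, data))

    def poll(self):
        if not self.present:
            raise DeviceUnavailable()
        done = [(rid, zlib.compress(d)) for rid, d in self.queue]
        self.queue.clear()
        return done


class CompressionService:
    """Uses the accelerator while it is present, software otherwise."""

    def __init__(self, device):
        self.device = device
        self.in_flight = OrderedDict()  # req_id -> original input
        self.results = {}               # req_id -> compressed output
        self.next_id = 0

    def submit(self, data):
        req_id = self.next_id
        self.next_id += 1
        try:
            self.device.submit(req_id, data)
            # Keep the input so the request can be replayed if needed.
            self.in_flight[req_id] = data
        except DeviceUnavailable:
            self.results[req_id] = zlib.compress(data)
        return req_id

    def poll(self):
        try:
            for req_id, out in self.device.poll():
                self.results[req_id] = out
                del self.in_flight[req_id]
        except DeviceUnavailable:
            self.drain_to_software()

    def drain_to_software(self):
        # Redo, in submission order, every request the device never completed.
        while self.in_flight:
            req_id, data = self.in_flight.popitem(last=False)
            self.results[req_id] = zlib.compress(data)
```

The cost of this design is memory: every input has to be retained until
its completion is seen, and it only works for operations that are safe to
repeat.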

> So we are trying to implement a new solution which really migrates
> device state to the target machine and won't affect users during
> migration, with low service downtime.

Right, that's a good aim - the only question is how to do it.

It looks like this is always going to need some device-specific code;
the question I see is whether that's in:
    1) qemu
    2) the host kernel
    3) the guest kernel driver

The objections to this series seem to be that it needs changes to (3);
I can see the worry that the guest kernel driver might not get a chance
to run during the right time in migration and it's painful having to
change every guest driver (although your change is small).

My question is: at what stage of the migration process do you expect to
tell the guest kernel driver to do this?

    If you do it at the start of the migration, and quiesce the device,
    the migration might take a long time (say 30 minutes) - are you
    intending the device to be quiesced for this long? And where are
    you going to send the traffic?
    If you are, then do you need to do it via this PCI trick, or could
    you just do it via something higher level to quiesce the device?

    Or are you intending to do it just near the end of the migration?
    But then how do we know how long it will take the guest driver to
    respond?
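
The trade-off between those two options can be put as rough arithmetic.
The sketch below is a simplified model, not a measurement of anything:
`quiesce_cost_ms` and its parameters are hypothetical names, the 300 ms
and 50 ms figures in the usage lines are made-up placeholders, and only
the 30 minute pre-copy time comes from the example above.

```python
def quiesce_cost_ms(when, precopy_ms, stop_copy_ms, driver_response_ms):
    """Return (device_outage_ms, wait_before_stop_ms) for one choice.

    when == "start": the driver is told to quiesce before pre-copy begins.
    when == "end":   the driver is told just before the final stop-and-copy.
    The outage for "end" is an upper bound: it assumes the device is
    unusable for the whole time the driver takes to respond.
    """
    if when == "start":
        # The device sits idle for the entire pre-copy phase plus the
        # final copy; nothing extra delays the stop phase.
        return precopy_ms + stop_copy_ms, 0
    if when == "end":
        # The device stays usable during pre-copy, but the final phase
        # cannot begin until the guest driver has acknowledged.
        return driver_response_ms + stop_copy_ms, driver_response_ms
    raise ValueError(f"unknown notification point: {when!r}")


# A 30 minute pre-copy with placeholder stop-copy and response times.
at_start = quiesce_cost_ms("start", 30 * 60 * 1000, 300, 50)
at_end = quiesce_cost_ms("end", 30 * 60 * 1000, 300, 50)
```

In this model the "start" option costs the full migration time in device
outage, while the "end" option is cheap only if `driver_response_ms` is
small and bounded, which is exactly the part we don't know.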

It would be great if we could avoid changing the guest; but at least your
guest driver changes don't actually seem to be that hardware specific.
Could your changes be moved to the generic PCI level so they could be
made to work for lots of drivers?

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
