netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexander Duyck <alexander.duyck@gmail.com>
To: Lan Tianyu <tianyu.lan@intel.com>,
	bhelgaas@google.com, carolyn.wyborny@intel.com,
	donald.c.skidmore@intel.com, eddie.dong@intel.com,
	nrupal.jani@intel.com, yang.z.zhang@intel.com, agraf@suse.de,
	kvm@vger.kernel.org, pbonzini@redhat.com, qemu-devel@nongnu.org,
	emil.s.tantilov@intel.com, intel-wired-lan@lists.osuosl.org,
	jeffrey.t.kirsher@intel.com, jesse.brandeburg@intel.com,
	john.ronciak@intel.com, linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org, matthew.vick@intel.com,
	mitch.a.williams@intel.com, netdev@vger.kernel.org,
	shannon.nelson@intel.com
Subject: Re: [RFC Patch 00/12] IXGBE: Add live migration support for SRIOV NIC
Date: Thu, 29 Oct 2015 09:17:06 -0700	[thread overview]
Message-ID: <56324682.5060507@gmail.com> (raw)
In-Reply-To: <5631D9C2.2040206@intel.com>

On 10/29/2015 01:33 AM, Lan Tianyu wrote:
> On 2015年10月29日 14:58, Alexander Duyck wrote:
>> Your code was having to do a bunch of shuffling in order to get things
>> set up so that you could bring the interface back up.  I would argue
>> that it may actually be faster at least on the bring-up to just drop the
>> old rings and start over since it greatly reduced the complexity and the
>> amount of device related data that has to be moved.
> If give up the old ring after migration and keep DMA running before
> stopping VCPU, it seems we don't need to track Tx/Rx descriptor ring and
> just make sure that all Rx buffers delivered to stack has been migrated.
>
> 1) Dummy write Rx buffer before checking Rx descriptor to ensure packet
> migrated first.

Don't dummy write the Rx descriptor.  You should only really need to 
dummy write the Rx buffer and you would do so after checking the 
descriptor, not before.  Otherwise you risk corrupting the Rx buffer 
because it is possible for you to read the Rx buffer, DMA occurs, and 
then you write back the Rx buffer and now you have corrupted the memory.

> 2) Make a copy of Rx descriptor and then use the copied data to check
> buffer status. Not use the original descriptor because it won't be
> migrated and migration may happen between two access of the Rx descriptor.

Do not just blindly copy the Rx descriptor ring.  That is a recipe for 
disaster.  The problem is DMA has to happen in a very specific order for 
things to function correctly.  The Rx buffer has to be written and then 
the Rx descriptor.  The problem is you will end up getting a read-ahead 
on the Rx descriptor ring regardless of which order you dirty things in.

The descriptor is only 16 bytes, you can fit 256 of them in a single 
page.  There is a good chance you probably wouldn't be able to migrate 
if you were under heavy network stress, however you could still have 
several buffers written in the time it takes for you to halt the VM and 
migrate the remaining pages.  Those buffers wouldn't be marked as dirty 
but odds are the page the descriptors are in would be.  As such you will 
end up with the descriptors but not the buffers.

The only way you could possibly migrate the descriptors rings cleanly 
would be to have enough knowledge about the layout of things to force 
the descriptor rings to be migrated first followed by all of the 
currently mapped Rx buffers.  In addition you would need to have some 
means of tracking all of the Rx buffers such as an emulated IOMMU as you 
would need to migrate all of them, not just part.  By doing it this way 
you would get the Rx descriptor rings in the earliest state possible and 
would be essentially emulating the Rx buffer writes occurring before the 
Rx descriptor writes.  You would likely have several Rx buffer writes 
that would be discarded in the process as there would be no descriptor 
for them but at least the state of the system would be consistent.

- Alex

  reply	other threads:[~2015-10-29 16:17 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-21 16:37 [RFC Patch 00/12] IXGBE: Add live migration support for SRIOV NIC Lan Tianyu
2015-10-21 16:37 ` [RFC Patch 01/12] PCI: Add virtfn_index for struct pci_device Lan Tianyu
2015-10-21 18:07   ` Alexander Duyck
2015-10-24 14:46     ` Lan, Tianyu
2015-10-21 16:37 ` [RFC Patch 02/12] IXGBE: Add new mail box event to restore VF status in the PF driver Lan Tianyu
2015-10-21 20:34   ` Alexander Duyck
2015-10-21 16:37 ` [RFC Patch 03/12] IXGBE: Add sysfs interface for Qemu to migrate " Lan Tianyu
2015-10-21 20:45   ` Alexander Duyck
2015-10-25  7:21     ` Lan, Tianyu
2015-10-21 16:37 ` [RFC Patch 04/12] IXGBE: Add ixgbe_ping_vf() to notify a specified VF via mailbox msg Lan Tianyu
2015-10-21 16:37 ` [RFC Patch 05/12] IXGBE: Add new sysfs interface of "notify_vf" Lan Tianyu
2015-10-21 20:52   ` Alexander Duyck
2015-10-22 12:51     ` Michael S. Tsirkin
2015-10-24 15:43     ` Lan, Tianyu
2015-10-25  6:03       ` Alexander Duyck
2015-10-25  6:45         ` Lan, Tianyu
2015-10-21 16:37 ` [RFC Patch 06/12] IXGBEVF: Add self emulation layer Lan Tianyu
2015-10-21 20:58   ` Alexander Duyck
2015-10-22 12:50     ` [Qemu-devel] " Michael S. Tsirkin
2015-10-22 15:50       ` Alexander Duyck
2015-10-21 16:37 ` [RFC Patch 07/12] IXGBEVF: Add new mail box event for migration Lan Tianyu
2015-10-21 16:37 ` [RFC Patch 08/12] IXGBEVF: Rework code of finding the end transmit desc of package Lan Tianyu
2015-10-21 21:14   ` Alexander Duyck
2015-10-24 16:12     ` Lan, Tianyu
2015-10-22 12:58   ` Michael S. Tsirkin
2015-10-24 16:08     ` Lan, Tianyu
2015-10-21 16:37 ` [RFC Patch 09/12] IXGBEVF: Add live migration support for VF driver Lan Tianyu
2015-10-21 21:48   ` Alexander Duyck
2015-10-22 12:46   ` Michael S. Tsirkin
2015-10-21 16:37 ` [RFC Patch 10/12] IXGBEVF: Add lock to protect tx/rx ring operation Lan Tianyu
2015-10-21 21:55   ` Alexander Duyck
2015-10-22 12:40   ` Michael S. Tsirkin
2015-10-21 16:37 ` [RFC Patch 11/12] IXGBEVF: Migrate VF statistic data Lan Tianyu
2015-10-22 12:36   ` Michael S. Tsirkin
2015-10-21 16:37 ` [RFC Patch 12/12] IXGBEVF: Track dma dirty pages Lan Tianyu
2015-10-22 12:30   ` Michael S. Tsirkin
2015-10-21 18:45 ` [RFC Patch 00/12] IXGBE: Add live migration support for SRIOV NIC Or Gerlitz
2015-10-21 19:20   ` Alex Williamson
2015-10-21 23:26     ` Alexander Duyck
2015-10-22 12:32     ` [Qemu-devel] " Michael S. Tsirkin
2015-10-22 13:01       ` Alex Williamson
2015-10-22 13:06         ` Michael S. Tsirkin
2015-10-22 15:58     ` Or Gerlitz
2015-10-22 16:17       ` Alex Williamson
2015-10-22 12:55 ` [Qemu-devel] " Michael S. Tsirkin
2015-10-23 18:36 ` Alexander Duyck
2015-10-23 19:05   ` Alex Williamson
2015-10-23 20:01     ` Alexander Duyck
2015-10-26  5:36   ` Lan Tianyu
2015-10-26 15:03     ` Alexander Duyck
2015-10-29  6:12       ` Lan Tianyu
2015-10-29  6:58         ` Alexander Duyck
2015-10-29  8:33           ` Lan Tianyu
2015-10-29 16:17             ` Alexander Duyck [this message]
2015-10-30  2:41               ` Lan Tianyu
2015-10-30 18:04                 ` Alexander Duyck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56324682.5060507@gmail.com \
    --to=alexander.duyck@gmail.com \
    --cc=agraf@suse.de \
    --cc=bhelgaas@google.com \
    --cc=carolyn.wyborny@intel.com \
    --cc=donald.c.skidmore@intel.com \
    --cc=eddie.dong@intel.com \
    --cc=emil.s.tantilov@intel.com \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=jeffrey.t.kirsher@intel.com \
    --cc=jesse.brandeburg@intel.com \
    --cc=john.ronciak@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=matthew.vick@intel.com \
    --cc=mitch.a.williams@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=nrupal.jani@intel.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=shannon.nelson@intel.com \
    --cc=tianyu.lan@intel.com \
    --cc=yang.z.zhang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).