From: "Michael S. Tsirkin" <mst@redhat.com>
To: Lan Tianyu <tianyu.lan@intel.com>
Cc: bhelgaas@google.com, carolyn.wyborny@intel.com,
donald.c.skidmore@intel.com, eddie.dong@intel.com,
nrupal.jani@intel.com, yang.z.zhang@intel.com, agraf@suse.de,
kvm@vger.kernel.org, pbonzini@redhat.com, qemu-devel@nongnu.org,
emil.s.tantilov@intel.com, intel-wired-lan@lists.osuosl.org,
jeffrey.t.kirsher@intel.com, jesse.brandeburg@intel.com,
john.ronciak@intel.com, linux-kernel@vger.kernel.org,
linux-pci@vger.kernel.org, matthew.vick@intel.com,
mitch.a.williams@intel.com, netdev@vger.kernel.org,
shannon.nelson@intel.com
Subject: Re: [RFC Patch 12/12] IXGBEVF: Track dma dirty pages
Date: Thu, 22 Oct 2015 15:30:46 +0300
Message-ID: <20151022150137-mutt-send-email-mst@redhat.com>
In-Reply-To: <1445445464-5056-13-git-send-email-tianyu.lan@intel.com>
On Thu, Oct 22, 2015 at 12:37:44AM +0800, Lan Tianyu wrote:
> Migration relies on tracking dirty page to migrate memory.
> Hardware can't automatically mark a page as dirty after DMA
> memory access. VF descriptor rings and data buffers are modified
> by hardware when receive and transmit data. To track such dirty memory
> manually, do dummy writes(read a byte and write it back) during receive
> and transmit data.
>
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> ---
> drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 14 +++++++++++---
> 1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> index d22160f..ce7bd7a 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> @@ -414,6 +414,9 @@ static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector *q_vector,
> if (!(eop_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
> break;
>
> + /* write back status to mark page dirty */
Which page? The descriptor ring? And what does marking it dirty
accomplish, given that we might migrate right before this happens?
It might be a good idea to just specify the addresses of the rings
to the hypervisor, and have it send the ring pages after the VM
and the VF are stopped.
> + eop_desc->wb.status = eop_desc->wb.status;
> +
The compiler is likely to optimize this self-assignment out.
You also probably need a wmb here ...
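One way to keep a dummy write from being dropped is to do both the load and the store through a volatile-qualified pointer. The following is a minimal userspace sketch, not code from the patch: the helper name touch_byte() is made up for illustration. In kernel code READ_ONCE()/WRITE_ONCE() would play this role, and the write barrier would still have to be added separately.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): rewrite one byte in place.
 * Both the load and the store go through a volatile-qualified lvalue,
 * so the compiler may not elide them as a no-op self-assignment.
 */
static inline void touch_byte(void *addr)
{
	volatile uint8_t *p = (volatile uint8_t *)addr;
	uint8_t v = *p;	/* forced load */

	*p = v;		/* forced store of the same value */
}
```

The buffer contents are unchanged afterwards; the only effect is that a real store is issued to the page.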
> /* clear next_to_watch to prevent false hangs */
> tx_buffer->next_to_watch = NULL;
> tx_buffer->desc_num = 0;
> @@ -946,15 +949,17 @@ static struct sk_buff *ixgbevf_fetch_rx_buffer(struct ixgbevf_ring *rx_ring,
> {
> struct ixgbevf_rx_buffer *rx_buffer;
> struct page *page;
> + u8 *page_addr;
>
> rx_buffer = &rx_ring->rx_buffer_info[rx_ring->next_to_clean];
> page = rx_buffer->page;
> prefetchw(page);
>
> - if (likely(!skb)) {
> - void *page_addr = page_address(page) +
> - rx_buffer->page_offset;
> + /* Mark page dirty */
Looks like there's a race condition here: the VM could
migrate at this point. The RX ring will indicate that a
packet has been received, but the page data would be stale.
One solution I see is explicitly testing for this
condition and discarding the packet.
For example, the hypervisor could increment some counter
in RAM during migration.
Then:
	x = read counter
	get packet from rx ring
	mark page dirty
	y = read counter
	if (x != y)
		discard packet
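The counter check above could look roughly like this. It is a sketch under stated assumptions, not driver code: struct mig_state, struct rx_slot and all function names are hypothetical, the counter is assumed to live in guest RAM and be bumped by the hypervisor, and the memory barriers a real driver would need are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: counter the hypervisor increments during migration. */
struct mig_state {
	volatile uint32_t counter;
};

/* Hypothetical stand-in for one RX ring entry and its data buffer. */
struct rx_slot {
	uint8_t *data;
	bool ready;	/* models the descriptor-done bit */
};

static uint32_t mig_read_begin(const struct mig_state *ms)
{
	return ms->counter;
}

static bool mig_changed(const struct mig_state *ms, uint32_t start)
{
	return ms->counter != start;
}

/* Returns the packet, or NULL if none is ready or migration raced with us. */
static uint8_t *rx_fetch_checked(const struct mig_state *ms,
				 struct rx_slot *slot)
{
	uint32_t x = mig_read_begin(ms);	/* x = read counter */
	volatile uint8_t *p;
	uint8_t *pkt;

	if (!slot->ready)
		return NULL;

	pkt = slot->data;			/* get packet from rx ring */
	slot->ready = false;

	p = pkt;
	*p = *p;				/* mark page dirty */

	if (mig_changed(ms, x))			/* y = read counter; x != y */
		return NULL;			/* discard packet */

	return pkt;
}
```

The caller treats a NULL return as "no usable packet", so a packet whose page may be stale on the destination is dropped rather than passed up the stack.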
> + page_addr = page_address(page) + rx_buffer->page_offset;
> + *page_addr = *page_addr;
The compiler is likely to optimize this self-assignment out.
You also probably need a wmb here ...
>
> + if (likely(!skb)) {
> /* prefetch first cache line of first page */
> prefetch(page_addr);
The prefetch makes no sense if you have already read from page_addr just above.
> #if L1_CACHE_BYTES < 128
> @@ -1032,6 +1037,9 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
> if (!ixgbevf_test_staterr(rx_desc, IXGBE_RXD_STAT_DD))
> break;
>
> + /* Write back status to mark page dirty */
> + rx_desc->wb.upper.status_error = rx_desc->wb.upper.status_error;
> +
Same question as for tx.
> /* This memory barrier is needed to keep us from reading
> * any other fields out of the rx_desc until we know the
> * RXD_STAT_DD bit is set
> --
> 1.8.4.rc0.1.g8f6a3e5.dirty
>