From mboxrd@z Thu Jan  1 00:00:00 1970
From: Alexander Duyck <alexander.duyck@gmail.com>
Subject: Re: [Qemu-devel] live migration vs device assignment (motivation)
Date: Tue, 29 Dec 2015 09:04:51 -0800
Message-ID: <CAKgT0UfGtMjuGSxL7ZLJB+avErHX84qFnt7AR9ydzVt2vkSzhg@mail.gmail.com>
References: <20151210101840.GA2570@work-vm>
	<566961C1.6030000@gmail.com>
	<20151210114114.GE2570@work-vm>
	<56698E68.5040207@intel.com>
	<CAKgT0UduOMvnVAUvRgnXkMPDwvOBh_5RimCgnb0zRr7aOyza4A@mail.gmail.com>
	<566D9320.8000209@intel.com>
	<CAKgT0Uc9g5aqKUKudD4Rj+1KfbGZn6VLzZxGv7UrRK+dy3wEVA@mail.gmail.com>
	<567CEA53.5030601@intel.com>
	<20151225140336-mutt-send-email-mst@redhat.com>
	<56817476.8080607@intel.com>
	<20151229184426-mutt-send-email-mst@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Cc: "Lan, Tianyu" <tianyu.lan@intel.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Yang Zhang <yang.zhang.wz@gmail.com>, qemu-devel@nongnu.org,
	"Tantilov, Emil S" <emil.s.tantilov@intel.com>,
	kvm@vger.kernel.org, Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	aik@ozlabs.ru, "Skidmore, Donald C" <donald.c.skidmore@intel.com>,
	quintela@redhat.com, "Dong, Eddie" <eddie.dong@intel.com>,
	"Jani, Nrupal" <nrupal.jani@intel.com>,
	Alexander Graf <agraf@suse.de>,
	Blue Swirl <blauwirbel@gmail.com>, cornelia.huck@de.ibm.com,
	Alex Williamson <alex.williamson@redhat.com>,
	kraxel@redhat.com, Anthony Liguori <anthony@codemonkey.ws>,
	amit.shah@redhat.com, Paolo Bonzini <pbonzini@redhat.com>,
	"Rustad, Mark D" <mark.d.rustad@intel.com>, lcapitulino@redhat.com,
	Or Gerlitz <gerlitz.or@gmail.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mail-ig0-f194.google.com ([209.85.213.194]:36600 "EHLO
	mail-ig0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752628AbbL2REw (ORCPT <rfc822;kvm@vger.kernel.org>);
	Tue, 29 Dec 2015 12:04:52 -0500
Received: by mail-ig0-f194.google.com with SMTP id o2so2775310iga.3
        for <kvm@vger.kernel.org>; Tue, 29 Dec 2015 09:04:52 -0800 (PST)
In-Reply-To: <20151229184426-mutt-send-email-mst@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On Tue, Dec 29, 2015 at 8:46 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Dec 29, 2015 at 01:42:14AM +0800, Lan, Tianyu wrote:
>>
>>
>> On 12/25/2015 8:11 PM, Michael S. Tsirkin wrote:
>> >As long as you keep up this vague talk about performance during
>> >migration, without even bothering with any measurements, this patchset
>> >will keep going nowhere.
>> >
>>
>> I measured network service downtime for "keep device alive"(RFC patch V1
>> presented) and "put down and up network interface"(RFC patch V2 presented)
>> during migration with some optimizations.
>>
>> The former is around 140ms and the latter is around 240ms.
>>
>> My patchset relies on the mailbox irq which doesn't work in the suspend state
>> and so I can't get downtime for suspend/resume cases. Will try to get the
>> result later.
>
>
> Interesting. So you are saying merely ifdown/ifup is 100ms?
> This does not sound reasonable.
> Is there a chance you are e.g. getting IP from dhcp?


Actually it wouldn't surprise me if that is due to the reset logic in
the driver.  For starters there is a 10 msec delay in
ixgbevf_reset_hw_vf which I believe is present to allow the PF time to
clear registers after the VF has requested a reset.  There is also a
10 to 20 msec sleep in ixgbevf_down which occurs after the Rx queues
are disabled.  On top of that, the function that disables the queues
does so serially, polling each queue until the hardware acknowledges
that it is actually disabled.  The driver uses the same serial
enable-then-poll logic when re-enabling the queues, which likely
doesn't help things.
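
As a rough illustration of how these delays stack up, here is a
back-of-the-envelope model.  Only the 10 msec reset delay and the 10 to
20 msec sleep come from the description above; the queue count, the
per-queue acknowledge time, and the "batched" polling variant are all
hypothetical values picked for the example, not measurements or
existing driver behavior.

```python
"""Back-of-the-envelope model of ifdown/ifup latency in a VF driver.

Only the 10 ms reset delay and the 10-20 ms sleep come from the message
above; the queue count and per-queue acknowledge time are hypothetical.
"""

RESET_DELAY_MS = 10.0         # delay in ixgbevf_reset_hw_vf, per the message
DOWN_SLEEP_MS = (10.0, 20.0)  # sleep range in ixgbevf_down, per the message


def serial_poll_ms(num_queues, ack_ms):
    """Each queue is written, then polled until acknowledged, one at a time."""
    return num_queues * ack_ms


def batched_poll_ms(num_queues, ack_ms):
    """Hypothetical alternative: write all queues first, then poll them all,
    so the hardware acknowledge times overlap instead of adding up."""
    return ack_ms if num_queues > 0 else 0.0


def down_up_ms(num_queues, ack_ms, poll_model):
    """Return the (min, max) modelled delay in ms for one down/up cycle."""
    # Queues are polled once on disable and once again on re-enable.
    polling = 2 * poll_model(num_queues, ack_ms)
    low = RESET_DELAY_MS + DOWN_SLEEP_MS[0] + polling
    high = RESET_DELAY_MS + DOWN_SLEEP_MS[1] + polling
    return low, high


if __name__ == "__main__":
    # 4 queues and a 2 ms acknowledge time are assumed values.
    for name, model in (("serial", serial_poll_ms),
                        ("batched", batched_poll_ms)):
        low, high = down_up_ms(num_queues=4, ack_ms=2.0, poll_model=model)
        print(f"{name}: {low:.0f}-{high:.0f} ms")
```

The point is only that the fixed sleeps set a floor of 20 to 30 msec
per down/up cycle, and that serial polling adds a term that grows with
the number of queues.  Real numbers would have to come from timing the
driver itself.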

Really this driver is probably in need of a refactor to clean the
cruft out of the reset and initialization logic.  I suspect we have
far more delays than we really need and that they are the source of
much of the slowdown.

- Alex
