From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexander Duyck
Subject: Re: [Qemu-devel] live migration vs device assignment (motivation)
Date: Tue, 29 Dec 2015 09:04:51 -0800
Message-ID:
References: <20151210101840.GA2570@work-vm> <566961C1.6030000@gmail.com> <20151210114114.GE2570@work-vm> <56698E68.5040207@intel.com> <566D9320.8000209@intel.com> <567CEA53.5030601@intel.com> <20151225140336-mutt-send-email-mst@redhat.com> <56817476.8080607@intel.com> <20151229184426-mutt-send-email-mst@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Cc: "Lan, Tianyu" , "Dr. David Alan Gilbert" , Yang Zhang , qemu-devel@nongnu.org, "Tantilov, Emil S" , kvm@vger.kernel.org, Ard Biesheuvel , aik@ozlabs.ru, "Skidmore, Donald C" , quintela@redhat.com, "Dong, Eddie" , "Jani, Nrupal" , Alexander Graf , Blue Swirl , cornelia.huck@de.ibm.com, Alex Williamson , kraxel@redhat.com, Anthony Liguori , amit.shah@redhat.com, Paolo Bonzini , "Rustad, Mark D" , lcapitulino@redhat.com, Or Gerlitz
To: "Michael S. Tsirkin"
Return-path:
Received: from mail-ig0-f194.google.com ([209.85.213.194]:36600 "EHLO mail-ig0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752628AbbL2REw (ORCPT ); Tue, 29 Dec 2015 12:04:52 -0500
Received: by mail-ig0-f194.google.com with SMTP id o2so2775310iga.3 for ; Tue, 29 Dec 2015 09:04:52 -0800 (PST)
In-Reply-To: <20151229184426-mutt-send-email-mst@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Tue, Dec 29, 2015 at 8:46 AM, Michael S. Tsirkin wrote:
> On Tue, Dec 29, 2015 at 01:42:14AM +0800, Lan, Tianyu wrote:
>>
>>
>> On 12/25/2015 8:11 PM, Michael S. Tsirkin wrote:
>> >As long as you keep up this vague talk about performance during
>> >migration, without even bothering with any measurements, this patchset
>> >will keep going nowhere.
>> >
>>
>> I measured network service downtime for "keep device alive" (RFC patch V1
>> presented) and "put down and up network interface" (RFC patch V2 presented)
>> during migration with some optimizations.
>>
>> The former is around 140ms and the latter is around 240ms.
>>
>> My patchset relies on the mailbox irq which doesn't work in the suspend state
>> and so can't get downtime for suspend/resume cases. Will try to get the
>> result later.
>
>
> Interesting. So you are saying merely ifdown/ifup is 100ms?
> This does not sound reasonable.
> Is there a chance you are e.g. getting IP from dhcp?

Actually it wouldn't surprise me if that is due to the reset logic in
the driver. For starters there is a 10 msec delay in the call
ixgbevf_reset_hw_vf which I believe is present to allow the PF time to
clear registers after the VF has requested a reset. There is also a
10 to 20 msec sleep in ixgbevf_down which occurs after the Rx queues
were disabled. That is in addition to the fact that the function that
disables the queues does so serially and polls each queue until the
hardware acknowledges that the queue is actually disabled. The
driver also uses the same serial enable-then-poll logic when
re-enabling the queues, which likely doesn't help things.

Really this driver is probably in need of a refactor to clean the
cruft out of the reset and initialization logic. I suspect we have
far more delays than we really need and that is the source of much of
the slowdown.

- Alex
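
To illustrate why the serial disable-then-poll pattern described in the
reply adds up, here is a minimal timing model. It is only a sketch and
not the ixgbevf code: SimQueue, disable_serial, disable_batched and the
per-queue latencies are all hypothetical, and the model assumes the
hardware would tolerate having every queue's disable request issued
before any of them is polled. The two fixed delays named in the reply
(10 msec in ixgbevf_reset_hw_vf plus 10 to 20 msec in ixgbevf_down)
would come on top of whatever the polling costs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SimQueue:
    """Hypothetical model of one Rx queue (not a real driver structure)."""
    ack_latency: int                    # ticks the simulated hardware needs to acknowledge
    requested_at: Optional[int] = None  # tick at which the disable was requested

    def is_disabled(self, now: int) -> bool:
        # The queue reports "disabled" once ack_latency ticks have passed
        # since the disable request was issued.
        return (self.requested_at is not None
                and now - self.requested_at >= self.ack_latency)


def disable_serial(latencies: List[int]) -> int:
    """Request disable on one queue, poll it until acknowledged, then move on."""
    queues = [SimQueue(lat) for lat in latencies]
    now = 0
    for q in queues:
        q.requested_at = now
        while not q.is_disabled(now):
            now += 1  # one poll interval elapses
    return now


def disable_batched(latencies: List[int]) -> int:
    """Assumed alternative: request disable on every queue first, then poll each."""
    queues = [SimQueue(lat) for lat in latencies]
    now = 0
    for q in queues:
        q.requested_at = now
    for q in queues:
        while not q.is_disabled(now):
            now += 1  # the waits overlap, so later queues are often already done
    return now


if __name__ == "__main__":
    latencies = [3, 5, 2, 4]  # made-up per-queue acknowledge times, in ticks
    print("serial ticks: ", disable_serial(latencies))
    print("batched ticks:", disable_batched(latencies))
```

In this model the serial loop takes the sum of the per-queue latencies
while the batched variant takes only the largest one. Whether the real
hardware permits that ordering is something the driver and datasheet
would have to confirm.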