From mboxrd@z Thu Jan 1 00:00:00 1970 From: Zhao Yan Subject: Re: [PATCH 0/5] QEMU VFIO live migration Date: Wed, 13 Mar 2019 21:12:22 -0400 Message-ID: <20190314011222.GA17006@joy-OptiPlex-7040> References: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com> <20190221134051.2c28893e@w520.home> <20190225022255.GP16456@joy-OptiPlex-7040> <20190307104421.534ea56f@w520.home> <20190312025747.GI21353@joy-OptiPlex-7040> <20190313011301.GA16072@joy-OptiPlex-7040> <20190313131454.09f886c1@w520.home> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: "cjia@nvidia.com" , "kvm@vger.kernel.org" , "aik@ozlabs.ru" , "Zhengxiao.zx@Alibaba-inc.com" , "shuangtai.tst@alibaba-inc.com" , "qemu-devel@nongnu.org" , "kwankhede@nvidia.com" , "eauger@redhat.com" , "Liu, Yi L" , "eskultet@redhat.com" , "Yang, Ziye" , "mlevitsk@redhat.com" , "pasic@linux.ibm.com" , "arei.gonglei@huawei.com" , "felipe@nutanix.com" , "Ken.Xue@amd.com" , "Tian, Kevin" , "dgilbert@redhat.com" , "intel-gvt-dev@lists.freedesktop.org" , "L To: Alex Williamson Return-path: Content-Disposition: inline In-Reply-To: <20190313131454.09f886c1@w520.home> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+gceq-qemu-devel2=m.gmane.org@nongnu.org Sender: "Qemu-devel" List-Id: kvm.vger.kernel.org On Thu, Mar 14, 2019 at 03:14:54AM +0800, Alex Williamson wrote: > On Tue, 12 Mar 2019 21:13:01 -0400 > Zhao Yan wrote: > > > hi Alex > > Any comments to the sequence below? > > > > Actaully we have some concerns and suggestions to userspace-opaque migration > > data. > > > > 1. if data is opaque to userspace, kernel interface must be tightly bound to > > migration. > > e.g. vendor driver has to know state (running + not logging) should not > > return any data, and state (running + logging) should return whole > > snapshot first and dirty later. it also has to know qemu migration will > > not call GET_BUFFER in state (running + not logging), otherwise, it has > > to adjust its behavior. > > This all just sounds like defining the protocol we expect with the > interface. For instance if we define a session as beginning when > logging is enabled and ending when the device is stopped and the > interface reports no more data is available, then we can state that any > partial accumulation of data is incomplete relative to migration. If > userspace wants to initiate a new migration stream, they can simply > toggle logging. How the vendor driver provides the data during the > session is not defined, but beginning the session with a snapshot > followed by repeated iterations of dirtied data is certainly a valid > approach. > > > 2. vendor driver cannot ensure userspace get all the data it intends to > > save in pre-copy phase. > > e.g. in stop-and-copy phase, vendor driver has to first check and send > > data in previous phase. > > First, I don't think the device has control of when QEMU switches from > pre-copy to stop-and-copy, the protocol needs to support that > transition at any point. However, it seems a simply data available > counter provides an indication of when it might be optimal to make such > a transition. If a vendor driver follows a scheme as above, the > available data counter would indicate a large value, the entire initial > snapshot of the device. As the migration continues and pages are > dirtied, the device would reach a steady state amount of data > available, depending on the guest activity. This could indicate to the > user to stop the device. The migration stream would not be considered > completed until the available data counter reaches zero while the > device is in the stopped|logging state. > > > 3. if all the sequence is tightly bound to live migration, can we remove the > > logging state? what about adding two states migrate-in and migrate-out? > > so there are four states: running, stopped, migrate-in, migrate-out. > > migrate-out is for source side when migration starts. together with > > state running and stopped, it can substitute state logging. > > migrate-in is for target side. > > In fact, Kirti's implementation specifies a data direction, but I think > we still need logging to indicate sessions. I'd also assume that > logging implies some overhead for the vendor driver. > ok. If you prefer logging, I'm ok with it. just found migrate-in and migrate-out are more universal againt hardware requirement changes. > > On Tue, Mar 12, 2019 at 10:57:47AM +0800, Zhao Yan wrote: > > > hi Alex > > > thanks for your reply. > > > > > > So, if we choose migration data to be userspace opaque, do you think below > > > sequence is the right behavior for vendor driver to follow: > > > > > > 1. initially LOGGING state is not set. If userspace calls GET_BUFFER to > > > vendor driver, vendor driver should reject and return 0. > > What would this state mean otherwise? If we're not logging then it > should not be expected that we can construct dirtied data from a > previous read of the state before logging was enabled (it would be > outside of the "session"). So at best this is an incomplete segment of > the initial snapshot of the device, but that presumes how the vendor > driver constructs the data. I wouldn't necessarily mandate the vendor > driver reject it, but I think we should consider it undefined and > vendor specific relative to the migration interface. > > > > 2. then LOGGING state is set, if userspace calls GET_BUFFER to vendor > > > driver, > > > a. vendor driver shoud first query a whole snapshot of device memory > > > (let's use this term to represent device's standalone memory for now), > > > b. vendor driver returns a chunk of data just queried to userspace, > > > while recording current pos in data. > > > c. vendor driver finds all data just queried is finished transmitting to > > > userspace, and queries only dirty data in device memory now. > > > d. vendor driver returns a chunk of data just quered (this time is dirty > > > data )to userspace while recording current pos in data > > > e. if all data is transmited to usespace and still GET_BUFFERs come from > > > userspace, vendor driver starts another round of dirty data query. > > This is a valid vendor driver approach, but it's outside the scope of > the interface definition. A vendor driver could also decide to not > provide any data until both stopped and logging are set and then > provide a fixed, final snapshot. The interface supports either > approach by defining the protocol to interact with it. > > > > 3. if LOGGING state is unset then, and userpace calls GET_BUFFER to vendor > > > driver, > > > a. if vendor driver finds there's previously untransmitted data, returns > > > them until all transmitted. > > > b. vendor driver then queries dirty data again and transmits them. > > > c. at last, vendor driver queris device config data (which has to be > > > queried at last and sent once) and transmits them. > > This seems broken, the vendor driver is presuming the user intentions. > If logging is unset, we return to bullet 1, reading data is undefined > and vendor specific. It's outside of the session. > > > > for the 1 bullet, if LOGGING state is firstly set and migration aborts > > > then, vendor driver has to be able to detect that condition. so seemingly, > > > vendor driver has to know more qemu's migration state, like migration > > > called and failed. Do you think that's acceptable? > > If migration aborts, logging is cleared and the device continues > operation. If a new migration is started, the session is initiated by > enabling logging. Sound reasonable? Thanks, > For the flow, I still have a question. There are 2 approaches below, which one do you prefer? Approach A, in precopy stage, the sequence is (1) .save_live_pending --> return whole snapshot size .save_live_iterate --> save whole snapshot (2) .save_live_pending --> get dirty data, return dirty data size .save_live_iterate --> save all dirty data (3) .save_live_pending --> get dirty data again, return dirty data size .save_live_iterate --> save all dirty data Approach B, in precopy stage, the sequence is (1) .save_live_pending --> return whole snapshot size .save_live_iterate --> save part of snapshot (2) .save_live_pending --> return rest part of whole snapshot size + current dirty data size .save_live_iterate --> save part of snapshot (3) repeat (2) until whole snapshot saved. (4) .save_live_pending --> get diryt data and return current dirty data size .save_live_iterate --> save part of dirty data (5) .save_live_pending --> return reset part of dirty data size + delta size of dirty data .save_live_iterate --> save part of dirty data (6) repeat (5) until precopy stops > Alex > _______________________________________________ > intel-gvt-dev mailing list > intel-gvt-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev