From mboxrd@z Thu Jan 1 00:00:00 1970 From: Igor Mammedov Subject: Re: [Qemu-devel] [PATCH v2 0/8] nvdimm: hotplug support Date: Tue, 11 Oct 2016 14:32:23 +0200 Message-ID: <20161011143223.666aba4a@nial.brq.redhat.com> References: <1470984850-66891-1-git-send-email-guangrong.xiao@linux.intel.com> <20161003154800.17e3ba30@nial.brq.redhat.com> <20161010145931.59c0c5bb@nial.brq.redhat.com> <560c2e37-ef5d-c0b0-0059-29d07cecd443@linux.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: ehabkost@redhat.com, kvm@vger.kernel.org, mst@redhat.com, gleb@kernel.org, mtosatti@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com, pbonzini@redhat.com, dan.j.williams@intel.com, rth@twiddle.net To: Xiao Guangrong Return-path: Received: from mx1.redhat.com ([209.132.183.28]:41098 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752735AbcJKMcy (ORCPT ); Tue, 11 Oct 2016 08:32:54 -0400 In-Reply-To: <560c2e37-ef5d-c0b0-0059-29d07cecd443@linux.intel.com> Sender: kvm-owner@vger.kernel.org List-ID: On Mon, 10 Oct 2016 21:57:55 +0800 Xiao Guangrong wrote: > On 10/10/2016 08:59 PM, Igor Mammedov wrote: > > On Sat, 8 Oct 2016 16:34:06 +0800 > > Xiao Guangrong wrote: > > > >> On 10/03/2016 09:48 PM, Igor Mammedov wrote: > >>> On Fri, 12 Aug 2016 14:54:02 +0800 > >>> Xiao Guangrong wrote: > >>> > >>> General design issue in this series is regenerating > >>> _FIT data every time inside of _FIT read loop. > >>> > >>> The issue here is that if FIT data doesn't fit in one page > >>> RFIT would be called several times resulting in calling > >>> nvdimm_dsm_func_read_fit() N-times, and in between > >>> these N exits generated FIT data could change > >>> (for example if another NVDIMM has been hotpluged in between) > >>> resulting in corrupted concatenated FIT buffer returned by > >>> _FIT method. > >>> So one need either to block follow up hotplug or make sure > >>> that FIT data regenerated/updated only once per _FIT > >>> invocation and stay immutable until _FIT is completed > >>> (maybe RCU protected buffer). > >>> > > > > > >>> Regenerating/updating inside qemu could also be shared > >>> with NFIT table creation path so that both cold/hotplug > >>> would reuse the same data which are updated only when > >>> a NVDIMM is added/removed. > > try to take into account this point so that FIT data > > aren't regenerated on every RFIT invocation. > > Okay. > > The reason that the FIT was always regenerated is avoiding saving > FIT during live-migration. Well, it seems that we are hard to > avoid it now. :) > > > > > > >> Explicitly sync up QEMU and OSPM is hard as OSPM is running > >> in a VM which is vulnerable and can not be expected always > >> triggering correct sequence. > >> > >> So how about introducing generation-number which is updated > >> for each hotplug event and a new DSM function reserved to > >> get this number by OSPM. Then the ACPI code is like this: > >> > >> restart: > >> old-number = DSM(Read_NUM); > >> FIT = DSM(FIT) > >> new-number = DSM(Read_NUM); > >> > >> if (old-number != new-number) > >> goto restart; > >> > >> return FIT > > It's a bit complicated. > > > > Hmm, how about this algorithm: > > struct fit_data { > u64 generation_number; > GArray *fit; > }; > > struct fit_data *fit; > u64 current_number; > > The update-side (hotplug happens): > new_fit = malloc(); > > old_fit = rcu_dereference(fit); > > new_fit->generation_number = old_fit->generation_number + 1; > new_fit->fit = ...generate-fit...; > > rcu_assign(fit, new_fit); > sync_rcu() > free(old_fit); > > > On the read-side (DSM issued by OSPM): > { > rcu_read_lock() > > fit = rcu_dereference(fit); > > /* if it is the first time issuing RFIT. */ > if (fit-read-position == 0) > current_number = fit->generation_number; > else if (current_number != fit->generation_number) > return FAIL_FIT_CHANGED; > > ... fill returned info with fit->fit...; > > rcu_read_unlock() > } > > Of course, @fit and @current_number should be persistent during live migration. you can drop RCU and @current_number, and @fit could be exactly recreated on target side from -device nvdim ... set of options, which must include all currently present (including hotplugged) devices from source side. It's sufficient to invalidate and restart DMA transfer in flight. i.e. pc_dimm_memory_plug () -> regenerate_fit() -> set local flag @fit-dirty -> start QEMU.fit_read() -> if offset == 0 ? clear @fit-dirty make sure fit_read() won't conflict with regenerate_fit() either plain mutex or RCU would do the job ... OSPM.rfit() ... if (@fit-dirty) abort/restart DMA from start In migration case, the target would have @fit-dirty set thanks to pc_dimm_memory_plug () -> regenerate_fit() and any DMA in flight would be restarted. That's a little bit inefficient as source_fit exactly matches target_fit and DMA could continue, but it would save us from having fit specific migration state to transfer and maintain.