From mboxrd@z Thu Jan  1 00:00:00 1970
From: Igor Mammedov <imammedo@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2 0/8] nvdimm: hotplug support
Date: Tue, 11 Oct 2016 14:32:23 +0200
Message-ID: <20161011143223.666aba4a@nial.brq.redhat.com>
References: <1470984850-66891-1-git-send-email-guangrong.xiao@linux.intel.com>
        <20161003154800.17e3ba30@nial.brq.redhat.com>
        <e025c740-50ad-8010-3387-4c2c257187dd@linux.intel.com>
        <20161010145931.59c0c5bb@nial.brq.redhat.com>
        <560c2e37-ef5d-c0b0-0059-29d07cecd443@linux.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: ehabkost@redhat.com, kvm@vger.kernel.org, mst@redhat.com,
        gleb@kernel.org, mtosatti@redhat.com, qemu-devel@nongnu.org,
        stefanha@redhat.com, pbonzini@redhat.com, dan.j.williams@intel.com,
        rth@twiddle.net
To: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:41098 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1752735AbcJKMcy (ORCPT <rfc822;kvm@vger.kernel.org>);
        Tue, 11 Oct 2016 08:32:54 -0400
In-Reply-To: <560c2e37-ef5d-c0b0-0059-29d07cecd443@linux.intel.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On Mon, 10 Oct 2016 21:57:55 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:

> On 10/10/2016 08:59 PM, Igor Mammedov wrote:
> > On Sat, 8 Oct 2016 16:34:06 +0800
> > Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >  
> >> On 10/03/2016 09:48 PM, Igor Mammedov wrote:  
> >>> On Fri, 12 Aug 2016 14:54:02 +0800
> >>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >>>
> >>> General design issue in this series is regenerating
> >>> _FIT data every time inside of _FIT read loop.
> >>>
> >>> The issue here is that if FIT data doesn't fit in one page
> >>> RFIT would be called several times resulting in calling
> >>> nvdimm_dsm_func_read_fit() N-times, and in between
> >>> these N exits generated FIT data could change
> >>> (for example if another NVDIMM has been hotpluged in between)
> >>> resulting in corrupted concatenated FIT buffer returned by
> >>> _FIT method.
> >>> So one need either to block follow up hotplug or make sure
> >>> that FIT data regenerated/updated only once per _FIT
> >>> invocation and stay immutable until _FIT is completed
> >>> (maybe RCU protected buffer).
> >>>  
> >
> >  
> >>> Regenerating/updating inside qemu could also be shared
> >>> with NFIT table creation path so that both cold/hotplug
> >>> would reuse the same data which are updated only when
> >>> a NVDIMM is added/removed.  
> > try to take into account this point so that FIT data
> > aren't regenerated on every RFIT invocation.  
> 
> Okay.
> 
> The reason that the FIT was always regenerated is avoiding saving
> FIT during live-migration. Well, it seems that we are hard to
> avoid it now. :)
> 
> >
> >  
> >> Explicitly sync up QEMU and OSPM is hard as OSPM is running
> >> in a VM which is vulnerable and can not be expected always
> >> triggering correct sequence.
> >>
> >> So how about introducing generation-number which is updated
> >> for each hotplug event and a new DSM function reserved to
> >> get this number by OSPM. Then the ACPI code is like this:
> >>
> >> restart:
> >>     old-number = DSM(Read_NUM);
> >>     FIT = DSM(FIT)
> >>     new-number = DSM(Read_NUM);
> >>
> >>     if (old-number != new-number)
> >>        goto restart;
> >>
> >>     return FIT  
> > It's a bit complicated.
> >  
> 
> Hmm, how about this algorithm:
> 
> struct fit_data {
> 	u64 generation_number;
> 	GArray *fit;
> };
> 
> struct fit_data *fit;
> u64 current_number;
> 
> The update-side (hotplug happens):
>    new_fit = malloc();
> 
>    old_fit = rcu_dereference(fit);
> 
>    new_fit->generation_number = old_fit->generation_number + 1;
>    new_fit->fit = ...generate-fit...;
> 
>    rcu_assign(fit, new_fit);
>    sync_rcu()
>    free(old_fit);
> 
> 
> On the read-side (DSM issued by OSPM):
> {
>     rcu_read_lock()
> 
>     fit = rcu_dereference(fit);
> 
>     /* if it is the first time issuing RFIT. */
>     if (fit-read-position == 0)
>       current_number = fit->generation_number;
>     else if (current_number != fit->generation_number)
>       return FAIL_FIT_CHANGED;
> 
>     ... fill returned info with fit->fit...;
> 
>     rcu_read_unlock()
> }
> 
> Of course, @fit and @current_number should be persistent during live migration.
you can drop RCU and @current_number,
and @fit could be exactly recreated on target side from
 -device nvdim ...
set of options, which must include all currently present
(including hotplugged) devices from source side.

It's sufficient to invalidate and restart DMA transfer in flight.

i.e.
pc_dimm_memory_plug ()
  -> regenerate_fit()
      -> set local flag @fit-dirty

  -> start QEMU.fit_read()
      -> if offset == 0 ? clear @fit-dirty
      make sure fit_read() won't conflict with regenerate_fit()
        either plain mutex or RCU would do the job

  ...
  OSPM.rfit()
     ...
     if (@fit-dirty)
         abort/restart DMA from start

In migration case, the target would have @fit-dirty set thanks to
pc_dimm_memory_plug () -> regenerate_fit() and any DMA in flight
would be restarted.
That's a little bit inefficient as source_fit exactly matches
target_fit and DMA could continue, but it would save us from
having fit specific migration state to transfer and maintain.