From: David Gibson <david@gibson.dropbear.id.au>
To: Ganesh <ganeshgr@linux.ibm.com>
Cc: Aravinda Prasad <arawinda.p@gmail.com>,
	aik@ozlabs.ru, qemu-devel@nongnu.org, groug@kaod.org,
	paulus@ozlabs.org, qemu-ppc@nongnu.org
Subject: Re: [PATCH v18 6/7] migration: Include migration support for machine check handling
Date: Wed, 8 Jan 2020 16:45:43 +1100
Message-ID: <20200108054543.GA8586@umbus.fritz.box>
In-Reply-To: <066a1db3-254a-5607-915e-0392fefd72e6@linux.ibm.com>

On Tue, Jan 07, 2020 at 04:58:14PM +0530, Ganesh wrote:
> 
> On 1/3/20 7:55 AM, David Gibson wrote:
> > On Thu, Jan 02, 2020 at 01:21:10PM +0530, Ganesh Goudar wrote:
> > > From: Aravinda Prasad <arawinda.p@gmail.com>
> > > 
> > > This patch includes migration support for machine check
> > > handling. In particular, this patch blocks VM migration
> > > requests until the machine check error handling is
> > > complete, as these errors are specific to the source
> > > hardware and are irrelevant on the target hardware.
> > > 
> > > [Do not set FWNMI cap in post_load, now it's done in .apply hook]
> > > Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
> > > Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
> > > ---
> > >   hw/ppc/spapr.c         | 41 +++++++++++++++++++++++++++++++++++++++++
> > >   hw/ppc/spapr_events.c  | 20 +++++++++++++++++++-
> > >   hw/ppc/spapr_rtas.c    |  4 ++++
> > >   include/hw/ppc/spapr.h |  1 +
> > >   4 files changed, 65 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 975d7da734..4acdc30100 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -46,6 +46,7 @@
> > >   #include "migration/qemu-file-types.h"
> > >   #include "migration/global_state.h"
> > >   #include "migration/register.h"
> > > +#include "migration/blocker.h"
> > >   #include "mmu-hash64.h"
> > >   #include "mmu-book3s-v3.h"
> > >   #include "cpu-models.h"
> > > @@ -1685,6 +1686,8 @@ static void spapr_machine_reset(MachineState *machine)
> > >       /* Signal all vCPUs waiting on this condition */
> > >       qemu_cond_broadcast(&spapr->mc_delivery_cond);
> > > +
> > > +    migrate_del_blocker(spapr->fwnmi_migration_blocker);
> > >   }
> > >   static void spapr_create_nvram(SpaprMachineState *spapr)
> > > @@ -1967,6 +1970,43 @@ static const VMStateDescription vmstate_spapr_dtb = {
> > >       },
> > >   };
> > > +static bool spapr_fwnmi_needed(void *opaque)
> > > +{
> > > +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
> > > +
> > > +    return spapr->fwnmi_calls_registered;
> > I think it would be better to base this directly on the cap, rather
> > than a variable set later.
> OK, I'll revert to the older way.
> > 
> > > +}
> > > +
> > > +static int spapr_fwnmi_pre_save(void *opaque)
> > > +{
> > > +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
> > > +
> > > +    /*
> > > +     * Check if machine check handling is in progress and print a
> > > +     * warning message.
> > > +     */
> > > +    if (spapr->mc_status != -1) {
> > > +        warn_report("A machine check is being handled during migration. The "
> > > +                "handler may run and log a hardware error on the destination");
> > > +    }
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static const VMStateDescription vmstate_spapr_machine_check = {
> > > +    .name = "spapr_machine_check",
> > > +    .version_id = 1,
> > > +    .minimum_version_id = 1,
> > > +    .needed = spapr_fwnmi_needed,
> > > +    .pre_save = spapr_fwnmi_pre_save,
> > > +    .fields = (VMStateField[]) {
> > > +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> > > +        VMSTATE_INT32(mc_status, SpaprMachineState),
> > > +        VMSTATE_BOOL(fwnmi_calls_registered, SpaprMachineState),
> > This doesn't make sense to migrate - it will always have its final
> > value by the time the guest is running in a migratable state.
> OK, I'll remove it.
> > 
> > > +        VMSTATE_END_OF_LIST()
> > > +    },
> > > +};
> > > +
> > >   static const VMStateDescription vmstate_spapr = {
> > >       .name = "spapr",
> > >       .version_id = 3,
> > > @@ -2001,6 +2041,7 @@ static const VMStateDescription vmstate_spapr = {
> > >           &vmstate_spapr_cap_large_decr,
> > >           &vmstate_spapr_cap_ccf_assist,
> > >           &vmstate_spapr_cap_fwnmi,
> > > +        &vmstate_spapr_machine_check,
> > >           NULL
> > >       }
> > >   };
> > > diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> > > index 54eaf28a9e..7092687fa0 100644
> > > --- a/hw/ppc/spapr_events.c
> > > +++ b/hw/ppc/spapr_events.c
> > > @@ -43,6 +43,7 @@
> > >   #include "qemu/main-loop.h"
> > >   #include "hw/ppc/spapr_ovec.h"
> > >   #include <libfdt.h>
> > > +#include "migration/blocker.h"
> > >   #define RTAS_LOG_VERSION_MASK                   0xff000000
> > >   #define   RTAS_LOG_VERSION_6                    0x06000000
> > > @@ -843,6 +844,8 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> > >   {
> > >       SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > >       CPUState *cs = CPU(cpu);
> > > +    int ret;
> > > +    Error *local_err = NULL;
> > >       if (spapr->guest_machine_check_addr == -1) {
> > >           /*
> > > @@ -872,8 +875,23 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> > >               return;
> > >           }
> > >       }
> > > -    spapr->mc_status = cpu->vcpu_id;
> > > +    error_setg(&spapr->fwnmi_migration_blocker,
> > > +               "Live migration not supported during machine check handling");
> > > +    ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
> > > +    if (ret == -EBUSY) {
> > > +        /*
> > > +         * We don't want to abort, so we let the migration continue.
> > > +         * In a rare case, the machine check handler will run on the target.
> > > +         * Though this is not preferable, it is better than aborting
> > > +         * the migration or killing the VM.
> > > +         */
> > > +        error_free(spapr->fwnmi_migration_blocker);
> > > +        spapr->fwnmi_migration_blocker = NULL;
> > > +        warn_report("Received a fwnmi while migration was in progress");
> > Didn't we change from initializing the blocker Error at init time
> > because there was a case where we could have two migration blockers
> > registered at once?  If that's so then we need entirely different
> > instances of the blocker Error.  Just dynamically allocating them
> > doesn't help us if there can still only be one at a time.
> 
> I agree, but this is how we were doing it before.
> 
> Are you suggesting a per-CPU blocker Error instance?

I was, but..

> I think initializing the blocker Error at init time and not freeing it is
> much simpler and cleaner. If we receive multiple fwnmi events on different
> CPUs at almost the same time, we will prepend the same migration blocker
> instance multiple times to the migration_blockers list, but IIUC we will not
> unblock migration until the migration_blockers list is empty. Please let me
> know if you are OK with initializing the blocker Error at init time.

.. I realized I was mistaken.  It wasn't premature unblocking I was
concerned about, but corrupting the actual list structure.  I thought
we were using an intrusive linked list like the QLIST_*() stuff for
this, which can't tolerate adding the same element twice.  In fact we're
using g_slist_*(), which has its own wrapper nodes around the pointers
given here, so we're ok after all.

So creating the blocker error at init time is the way to go after all.
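
A small standalone model of the distinction above, and of why a single init-time blocker is enough. It is a sketch, not QEMU code: every name in it (`ilist_prepend()`, `add_blocker()`, `del_blocker()`, `migration_blocked()`, `fwnmi_migration_blocker`) is a hypothetical stand-in, and it assumes the blocker list behaves like a GSList where adding prepends a fresh node and removing drops only the first node pointing at the given data (the documented behaviour of `g_slist_prepend()` and `g_slist_remove()`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical model only; none of these names are QEMU APIs.
 * Intrusive list: the link lives inside the element (QLIST_*() style).
 */
struct ielem {
    struct ielem *next;
};

static struct ielem *ilist_prepend(struct ielem *head, struct ielem *e)
{
    e->next = head;           /* overwrites the element's only link */
    return e;
}

/* Wrapper-node list: each insert allocates a new node (GSList style). */
struct wnode {
    struct wnode *next;
    const void *data;
};

static struct wnode *blockers;    /* models the migration_blockers list */

/* Like g_slist_prepend(): a fresh node per call, even for the same data. */
static void add_blocker(const void *reason)
{
    struct wnode *n = malloc(sizeof(*n));
    if (n == NULL) {
        abort();
    }
    n->data = reason;
    n->next = blockers;
    blockers = n;
}

/* Like g_slist_remove(): unlink and free only the first matching node. */
static void del_blocker(const void *reason)
{
    for (struct wnode **link = &blockers; *link != NULL; link = &(*link)->next) {
        if ((*link)->data == reason) {
            struct wnode *victim = *link;
            *link = victim->next;
            free(victim);
            return;
        }
    }
}

static bool migration_blocked(void)
{
    return blockers != NULL;
}

/* One blocker object, created once at init time and never freed. */
static const char fwnmi_migration_blocker[] =
    "Live migration not supported during machine check handling";
```

Prepending the same intrusive element twice leaves its `next` pointing at itself, so the list turns into a cycle; with wrapper nodes the same pointer simply occupies two nodes. Each MCE adds one reference to the single blocker and each `ibm,nmi-interlock` removes one, so migration stays blocked until the last handler has finished.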

> > > +    }
> > > +
> > > +    spapr->mc_status = cpu->vcpu_id;
> > >       spapr_mce_dispatch_elog(cpu, recovered);
> > >   }
> > > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> > > index 54b142f35b..3409f6b896 100644
> > > --- a/hw/ppc/spapr_rtas.c
> > > +++ b/hw/ppc/spapr_rtas.c
> > > @@ -50,6 +50,7 @@
> > >   #include "hw/ppc/fdt.h"
> > >   #include "target/ppc/mmu-hash64.h"
> > >   #include "target/ppc/mmu-book3s-v3.h"
> > > +#include "migration/blocker.h"
> > >   static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
> > >                                      uint32_t token, uint32_t nargs,
> > > @@ -448,6 +449,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> > >       spapr->mc_status = -1;
> > >       qemu_cond_signal(&spapr->mc_delivery_cond);
> > >       rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > > +    migrate_del_blocker(spapr->fwnmi_migration_blocker);
> > > +    error_free(spapr->fwnmi_migration_blocker);
> > > +    spapr->fwnmi_migration_blocker = NULL;
> > >   }
> > >   static struct rtas_call {
> > > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > > index a90e677cc3..ac246c8be3 100644
> > > --- a/include/hw/ppc/spapr.h
> > > +++ b/include/hw/ppc/spapr.h
> > > @@ -220,6 +220,7 @@ struct SpaprMachineState {
> > >       SpaprTpmProxy *tpm_proxy;
> > >       bool fwnmi_calls_registered;
> > > +    Error *fwnmi_migration_blocker;
> > >   };
> > >   #define H_SUCCESS         0
> 
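
One detail in the quoted spapr_fwnmi_pre_save(): the warn_report() message is split across two adjacent string literals, and C joins adjacent literals with nothing in between, so the separating space has to be written inside one of them. A minimal standalone illustration (the variable names are hypothetical):

```c
#include <assert.h>
#include <string.h>

/* The compiler concatenates adjacent literals verbatim: no space is added. */
static const char *const msg_without_space =
    "A machine check is being handled during migration. The"
    "handler may run and log hardware error on the destination";

/* Ending the first literal with a space gives the intended text. */
static const char *const msg_with_space =
    "A machine check is being handled during migration. The "
    "handler may run and log a hardware error on the destination";
```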

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

Thread overview: 22+ messages
2020-01-02  7:51 [PATCH v18 0/7] target-ppc/spapr: Add FWNMI support in QEMU for PowerKM guests Ganesh Goudar
2020-01-02  7:51 ` [PATCH v18 1/7] Wrapper function to wait on condition for the main loop mutex Ganesh Goudar
2020-01-05 15:17   ` Greg Kurz
2020-01-07 11:30     ` Ganesh
2020-01-02  7:51 ` [PATCH v18 2/7] ppc: spapr: Introduce FWNMI capability Ganesh Goudar
2020-01-06  9:07   ` Greg Kurz
2020-01-07 11:31     ` Ganesh
2020-01-02  7:51 ` [PATCH v18 3/7] target/ppc: Handle NMI guest exit Ganesh Goudar
2020-01-06  9:45   ` Greg Kurz
2020-01-07 11:32     ` Ganesh
2020-01-02  7:51 ` [PATCH v18 4/7] target/ppc: Build rtas error log upon an MCE Ganesh Goudar
2020-01-02  7:51 ` [PATCH v18 5/7] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls Ganesh Goudar
2020-01-03  2:19   ` [PATCH v18 5/7] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" " David Gibson
2020-01-07  6:27     ` Ganesh
2020-01-08  1:04       ` David Gibson
2020-01-08 18:49         ` Ganesh
2020-01-09  1:37           ` David Gibson
2020-01-02  7:51 ` [PATCH v18 6/7] migration: Include migration support for machine check handling Ganesh Goudar
2020-01-03  2:25   ` David Gibson
2020-01-07 11:28     ` Ganesh
2020-01-08  5:45       ` David Gibson [this message]
2020-01-02  7:51 ` [PATCH v18 7/7] ppc: spapr: Activate the FWNMI functionality Ganesh Goudar
