From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=3.0 tests=BAYES_00,DKIM_ADSP_CUSTOM_MED, DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 54343C07E96 for ; Thu, 8 Jul 2021 16:48:27 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D6C4D61450 for ; Thu, 8 Jul 2021 16:48:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D6C4D61450 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:55536 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m1XCL-0001di-El for qemu-devel@archiver.kernel.org; Thu, 08 Jul 2021 12:48:25 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:33716) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1WMD-0001d6-5D for qemu-devel@nongnu.org; Thu, 08 Jul 2021 11:54:33 -0400 Received: from mail-ej1-x629.google.com ([2a00:1450:4864:20::629]:43761) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1m1WMA-0002nK-T4 for qemu-devel@nongnu.org; Thu, 08 Jul 2021 11:54:32 -0400 Received: by mail-ej1-x629.google.com with SMTP id v20so10466154eji.10 for ; Thu, 08 Jul 2021 08:54:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=jYH9m2FZ/G//6ex7GckEfsJ3q6FjpBwCiwIip9lu87A=; b=NHruHW9jOY96iCTSqa4puU15/sd1ZaOMxzzThUlcWSVACEUTT0XYeUG4DueNKYZK7b RDda5hU5x8JQLNObaOZHtao/ocnSRqlDwKwsrCmAj/DQ3jeL5gMf8ETe0YySu8iFv2du lz6VIpI4E+I4LK2QPDHx7P7H/yHGcmxWnJuCpul40sG9N7ClX3NeR/a0EzwBfoewSaHj nt+ETBOHExd+ALvzmsMBDXKyKia3G8I6B2NMYOpmc9AOtdMb8b4l/RXgv+u6VhfCS9eS 5PuW6Hd63fuYl/aLVpqU/BUvvYzbdpbpmhakYaIho0LGlGCCJsmAzyVV6uTXZdfbwozh zFdA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=jYH9m2FZ/G//6ex7GckEfsJ3q6FjpBwCiwIip9lu87A=; b=ogDnR+iBtmF4AaNgWwnkLKymkFl45nkTuJc15N9UK13ZezVXNqfsQt53yiil6pB9+O fBYNau7Fx+1QP01F2549Ms5cmP+resvr1FosIX50J52BbUL5nQ/fAoRzVrVXKTZtlnwt XAdrCcONBH0a7BN9kCwmgUF+w3E6zjBPhGPXrSRLu/VK8LxQHRpr9xpRDCTQvhbcAGHv OkBzVtR4dDmuRwbU5oSm4It2Hou3Ut0JnJ0xvpNf2F2F6YKh+0SYzsWBALIY1XrhMnPd 6JnrLq1LKEKIgv/JPMoq1OhXXK/q2K5dyr2/VoVsd9c1Zkx+XZdbQXDLejsrKyGbpEH9 EyJQ== X-Gm-Message-State: AOAM530hrAw8xrcGyh8ZgP7Bp/cnfkTymNLpC8nbViD60mJSAu9lluv2 iMZqg5h6mQfrr5VEX3VS0XQhjFrlr+WVunrp1EU= X-Google-Smtp-Source: ABdhPJzSZc8hiORjPJgpVl01mibodfpGGT3QMnqten050GD0YPVRlgQajPwi/M1kp+qs2Q+eBSpCjQ9w01rG6tB8NQ0= X-Received: by 2002:a17:907:3f89:: with SMTP id hr9mr26773680ejc.381.1625759669384; Thu, 08 Jul 2021 08:54:29 -0700 (PDT) MIME-Version: 1.0 References: <1625678434-240960-1-git-send-email-steven.sistare@oracle.com> <1625678434-240960-12-git-send-email-steven.sistare@oracle.com> In-Reply-To: From: =?UTF-8?B?TWFyYy1BbmRyw6kgTHVyZWF1?= Date: Thu, 8 Jul 2021 19:54:17 +0400 Message-ID: Subject: Re: [PATCH V5 11/25] cpr: restart mode To: Steve Sistare Content-Type: multipart/alternative; boundary="000000000000957aa205c69eab2d" Received-SPF: pass client-ip=2a00:1450:4864:20::629; envelope-from=marcandre.lureau@gmail.com; helo=mail-ej1-x629.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Jason Zeng , Juan Quintela , Eric Blake , "Michael S. Tsirkin" , QEMU , "Dr. David Alan Gilbert" , Alex Williamson , Stefan Hajnoczi , Paolo Bonzini , "Daniel P. Berrange" , =?UTF-8?Q?Philippe_Mathieu=2DDaud=C3=A9?= , =?UTF-8?B?QWxleCBCZW5uw6ll?= , Markus Armbruster Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" --000000000000957aa205c69eab2d Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Thu, Jul 8, 2021 at 7:43 PM Marc-Andr=C3=A9 Lureau wrote: > Hi > > On Wed, Jul 7, 2021 at 9:31 PM Steve Sistare > wrote: > >> Provide the cprsave restart mode, which preserves the guest VM across a >> restart of the qemu process. After cprsave, the caller passes qemu >> command-line arguments to cprexec, which directly exec's the new qemu >> binary. The arguments must include -S so new qemu starts in a paused >> state. >> The caller resumes the guest by calling cprload. >> >> To use the restart mode, qemu must be started with the memfd-alloc machi= ne >> option. The memfd's are saved to the environment and kept open across >> exec, >> after which they are found from the environment and re-mmap'd. Hence >> guest >> ram is preserved in place, albeit with new virtual addresses in the qemu >> process. >> >> The restart mode supports vfio devices in a subsequent patch. >> >> Signed-off-by: Steve Sistare >> > > What's the plan to make it work with -object memory-backend-memfd -machin= e > memory-backend? (or memory-backend-file, I guess that should work?) > > It seems to be addressed in some way in a later "hostmem-memfd: cpr support" patch. Imho it's worth mentioning in the commit message, reorganize patches closer. And the checks be added anyway for unsupported configurations. There should be some extra checks before accepting cprexec() on a > misconfigured VM. > > --- >> migration/cpr.c | 21 +++++++++++++++++++++ >> softmmu/physmem.c | 6 +++++- >> 2 files changed, 26 insertions(+), 1 deletion(-) >> >> diff --git a/migration/cpr.c b/migration/cpr.c >> index c5bad8a..fb57dec 100644 >> --- a/migration/cpr.c >> +++ b/migration/cpr.c >> @@ -29,6 +29,7 @@ >> #include "sysemu/xen.h" >> #include "hw/vfio/vfio-common.h" >> #include "hw/virtio/vhost.h" >> +#include "qemu/env.h" >> >> QEMUFile *qf_file_open(const char *path, int flags, int mode, >> const char *name, Error **errp) >> @@ -108,6 +109,26 @@ done: >> return; >> } >> >> +static int preserve_fd(const char *name, const char *val, void *handle) >> +{ >> + qemu_clr_cloexec(atoi(val)); >> + return 0; >> +} >> + >> +void cprexec(strList *args, Error **errp) >> +{ >> + if (xen_enabled()) { >> + error_setg(errp, "xen does not support cprexec"); >> + return; >> + } >> + if (!runstate_check(RUN_STATE_SAVE_VM)) { >> + error_setg(errp, "runstate is not save-vm"); >> + return; >> + } >> + walkenv(FD_PREFIX, preserve_fd, 0); > > > I am not convinced that relying on environment variables here is the bes= t > thing to do. > > + qemu_system_exec_request(args); >> +} >> + >> void cprload(const char *file, Error **errp) >> { >> QEMUFile *f; >> diff --git a/softmmu/physmem.c b/softmmu/physmem.c >> index b149250..8a65ef7 100644 >> --- a/softmmu/physmem.c >> +++ b/softmmu/physmem.c >> @@ -65,6 +65,7 @@ >> #include "qemu/pmem.h" >> >> #include "qemu/memfd.h" >> +#include "qemu/env.h" >> #include "migration/vmstate.h" >> >> #include "qemu/range.h" >> @@ -1986,7 +1987,7 @@ static void ram_block_add(RAMBlock *new_block, >> Error **errp) >> } else { >> name =3D memory_region_name(new_block->mr); >> if (ms->memfd_alloc) { >> > > > - int mfd =3D -1; /* placeholder until next patch= */ >> + int mfd =3D getenv_fd(name); >> mr->align =3D QEMU_VMALLOC_ALIGN; >> if (mfd < 0) { >> mfd =3D qemu_memfd_create(name, maxlen + mr->align, >> @@ -1994,7 +1995,9 @@ static void ram_block_add(RAMBlock *new_block, >> Error **errp) >> if (mfd < 0) { >> return; >> } >> + setenv_fd(name, mfd); >> } >> + qemu_clr_cloexec(mfd); >> > > Why clear it now, and on exec again? > > new_block->flags |=3D RAM_SHARED; >> addr =3D file_ram_alloc(new_block, maxlen, mfd, >> false, false, 0, errp); >> @@ -2246,6 +2249,7 @@ void qemu_ram_free(RAMBlock *block) >> } >> >> qemu_mutex_lock_ramlist(); >> + unsetenv_fd(memory_region_name(block->mr)); >> QLIST_REMOVE_RCU(block, next); >> ram_list.mru_block =3D NULL; >> /* Write list before version */ >> -- >> 1.8.3.1 >> >> >> > > -- > Marc-Andr=C3=A9 Lureau > --=20 Marc-Andr=C3=A9 Lureau --000000000000957aa205c69eab2d Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Thu, Jul 8, 2021 at 7:43 PM Marc-A= ndr=C3=A9 Lureau <marcandr= e.lureau@gmail.com> wrote:
Hi

On Wed, Jul 7, 2021 at 9:31 P= M Steve Sistare <steven.sistare@oracle.com> wrote:
Provide the cprsave restart mode, which pr= eserves the guest VM across a
restart of the qemu process.=C2=A0 After cprsave, the caller passes qemu command-line arguments to cprexec, which directly exec's the new qemu binary.=C2=A0 The arguments must include -S so new qemu starts in a paused = state.
The caller resumes the guest by calling cprload.

To use the restart mode, qemu must be started with the memfd-alloc machine<= br> option.=C2=A0 The memfd's are saved to the environment and kept open ac= ross exec,
after which they are found from the environment and re-mmap'd.=C2=A0 He= nce guest
ram is preserved in place, albeit with new virtual addresses in the qemu process.

The restart mode supports vfio devices in a subsequent patch.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>

What's the plan to make it work with -object memory-bac= kend-memfd -machine memory-backend? (or memory-backend-file, I guess that s= hould work?)


It seems to be addressed in some way in a later "hostmem-memfd:= cpr support" patch. Imho it's worth mentioning in the commit mess= age, reorganize patches closer. And the checks be added anyway for unsuppor= ted configurations.


There should be some extra checks before accepting cprex= ec() on a misconfigured VM.

---
=C2=A0migration/cpr.c=C2=A0 =C2=A0| 21 +++++++++++++++++++++
=C2=A0softmmu/physmem.c |=C2=A0 6 +++++-
=C2=A02 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/migration/cpr.c b/migration/cpr.c
index c5bad8a..fb57dec 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -29,6 +29,7 @@
=C2=A0#include "sysemu/xen.h"
=C2=A0#include "hw/vfio/vfio-common.h"
=C2=A0#include "hw/virtio/vhost.h"
+#include "qemu/env.h"

=C2=A0QEMUFile *qf_file_open(const char *path, int flags, int mode,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0const char *name, Error **errp)
@@ -108,6 +109,26 @@ done:
=C2=A0 =C2=A0 =C2=A0return;
=C2=A0}

+static int preserve_fd(const char *name, const char *val, void *handle) +{
+=C2=A0 =C2=A0 qemu_clr_cloexec(atoi(val));
+=C2=A0 =C2=A0 return 0;
+}
+
+void cprexec(strList *args, Error **errp)
+{
+=C2=A0 =C2=A0 if (xen_enabled()) {
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 error_setg(errp, "xen does not support cp= rexec");
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 return;
+=C2=A0 =C2=A0 }
+=C2=A0 =C2=A0 if (!runstate_check(RUN_STATE_SAVE_VM)) {
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 error_setg(errp, "runstate is not save-vm= ");
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 return;
+=C2=A0 =C2=A0 }
+=C2=A0 =C2=A0 walkenv(FD_PREFIX, preserve_fd, 0);

I am=C2=A0 not convinced that relying on environment variables here= is the best thing to do.

+=C2=A0 =C2=A0 qemu_system_exec_request(args);
+}
+
=C2=A0void cprload(const char *file, Error **errp)
=C2=A0{
=C2=A0 =C2=A0 =C2=A0QEMUFile *f;
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index b149250..8a65ef7 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -65,6 +65,7 @@
=C2=A0#include "qemu/pmem.h"

=C2=A0#include "qemu/memfd.h"
+#include "qemu/env.h"
=C2=A0#include "migration/vmstate.h"

=C2=A0#include "qemu/range.h"
@@ -1986,7 +1987,7 @@ static void ram_block_add(RAMBlock *new_block, Error = **errp)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0} else {
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0name =3D memory_region_name= (new_block->mr);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (ms->memfd_alloc) {


-=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 int mfd =3D -1;=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 /* placeholder until next patch */
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 int mfd =3D getenv= _fd(name);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0mr->align = =3D QEMU_VMALLOC_ALIGN;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (mfd < = 0) {
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0mfd =3D qemu_memfd_create(name, maxlen + mr->align,
@@ -1994,7 +1995,9 @@ static void ram_block_add(RAMBlock *new_block, Error = **errp)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0if (mfd < 0) {
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0return;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0}
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 sete= nv_fd(name, mfd);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 qemu_clr_cloexec(m= fd);

Why clear it now, and on exec agai= n?

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0new_block->= ;flags |=3D RAM_SHARED;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0addr =3D file= _ram_alloc(new_block, maxlen, mfd,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0false, fa= lse, 0, errp);
@@ -2246,6 +2249,7 @@ void qemu_ram_free(RAMBlock *block)
=C2=A0 =C2=A0 =C2=A0}

=C2=A0 =C2=A0 =C2=A0qemu_mutex_lock_ramlist();
+=C2=A0 =C2=A0 unsetenv_fd(memory_region_name(block->mr));
=C2=A0 =C2=A0 =C2=A0QLIST_REMOVE_RCU(block, next);
=C2=A0 =C2=A0 =C2=A0ram_list.mru_block =3D NULL;
=C2=A0 =C2=A0 =C2=A0/* Write list before version */
--
1.8.3.1




--
Marc-Andr= =C3=A9 Lureau


--
Marc-Andr=C3=A9 Lureau
--000000000000957aa205c69eab2d--