* [PATCH v2 0/3] add MEMORY_FAILURE event
@ 2020-09-22  9:56 zhenwei pi
  2020-09-22  9:56 ` [PATCH v2 1/3] target-i386: seperate MCIP & MCE_MASK error reason zhenwei pi
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: zhenwei pi @ 2020-09-22  9:56 UTC
  To: pbonzini, peter.maydell; +Cc: mtosatti, armbru, pizhenwei, qemu-devel

v1->v2:
As suggested by Peter Maydell, rename the events to make them
architecture-neutral:
'PC-RAM' -> 'guest-memory'
'guest-triple-fault' -> 'guest-mce-fatal'

As suggested by Paolo, add more fields to the event:
'action-required': boolean; distinguishes whether a guest MCE is
                   action-required (AR) or action-optional (AO).
'recursive': boolean; set to true if another AO MCE occurs while the
             guest is still processing a previous MCE.

v1:
Although QEMU can catch SIGBUS to handle hardware memory corruption
events, it only prints a brief log message and tries to recover
silently.

These patches introduce a 'MEMORY_FAILURE' event with 4 detailed
actions taken by QEMU, so that upper layers can know what situation
QEMU hit and what it did. As a further step: if a host server hits
'hypervisor-ignore' or 'guest-mce', the scheduler could migrate the VM
to another host; if it hits 'hypervisor-stop' or 'guest-triple-fault',
the scheduler could select another healthy server to launch the VM.
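A minimal sketch of the scheduling policy described above, as an upper layer might implement it. The enum and function names are hypothetical. The action strings are the v2 values from the MemoryFailureAction enum in patch 2/3; 'hypervisor-fatal' and 'guest-mce-fatal' appear to correspond to the v1 names 'hypervisor-stop' and 'guest-triple-fault', and 'guest-mce-inject' to 'guest-mce'.

```c
#include <string.h>

/* Hypothetical decision a VM scheduler could take on a MEMORY_FAILURE event. */
typedef enum {
    SCHED_MIGRATE_VM,   /* VM still runs: move it off the suspect host */
    SCHED_RELAUNCH_VM,  /* VM or QEMU died: start it on another healthy host */
    SCHED_UNKNOWN       /* unrecognized action: leave it to an operator */
} SchedDecision;

/* Map the "action" member of a MEMORY_FAILURE event (v2 names assumed). */
SchedDecision sched_decide(const char *action)
{
    if (strcmp(action, "hypervisor-ignore") == 0 ||
        strcmp(action, "guest-mce-inject") == 0) {
        return SCHED_MIGRATE_VM;
    }
    if (strcmp(action, "hypervisor-fatal") == 0 ||
        strcmp(action, "guest-mce-fatal") == 0) {
        return SCHED_RELAUNCH_VM;
    }
    return SCHED_UNKNOWN;
}
```

Returning SCHED_UNKNOWN rather than guessing keeps the policy safe if later QEMU versions add new enum values.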

Zhenwei Pi (3):
  target-i386: seperate MCIP & MCE_MASK error reason
  qapi/run-state.json: introduce memory failure event
  target-i386: post memory failure event to uplayer

 qapi/run-state.json  | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 target/i386/helper.c | 40 +++++++++++++++++++++++++------
 target/i386/kvm.c    |  7 +++++-
 3 files changed, 106 insertions(+), 8 deletions(-)

-- 
2.11.0




* [PATCH v2 1/3] target-i386: seperate MCIP & MCE_MASK error reason
  2020-09-22  9:56 [PATCH v2 0/3] add MEMORY_FAILURE event zhenwei pi
@ 2020-09-22  9:56 ` zhenwei pi
  2020-09-22 10:23   ` Philippe Mathieu-Daudé
  2020-09-22  9:56 ` [PATCH v2 2/3] qapi/run-state.json: introduce memory failure event zhenwei pi
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: zhenwei pi @ 2020-09-22  9:56 UTC
  To: pbonzini, peter.maydell; +Cc: mtosatti, armbru, pizhenwei, qemu-devel

Previously, the QEMU log only contained the bare string "Triple
fault". Add a detailed message for each of the two reasons (a previous
MCE is still in progress, or MCE is not enabled in CR4) describing why
QEMU has to reset the guest.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 target/i386/helper.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/target/i386/helper.c b/target/i386/helper.c
index 70be53e2c3..0c7fd32491 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -857,6 +857,8 @@ static void do_inject_x86_mce(CPUState *cs, run_on_cpu_data data)
     X86CPU *cpu = X86_CPU(cs);
     CPUX86State *cenv = &cpu->env;
     uint64_t *banks = cenv->mce_banks + 4 * params->bank;
+    char msg[64];
+    bool need_reset = false;
 
     cpu_synchronize_state(cs);
 
@@ -894,16 +896,25 @@ static void do_inject_x86_mce(CPUState *cs, run_on_cpu_data data)
             return;
         }
 
-        if ((cenv->mcg_status & MCG_STATUS_MCIP) ||
-            !(cenv->cr[4] & CR4_MCE_MASK)) {
-            monitor_printf(params->mon,
-                           "CPU %d: Previous MCE still in progress, raising"
-                           " triple fault\n",
-                           cs->cpu_index);
-            qemu_log_mask(CPU_LOG_RESET, "Triple fault\n");
+        if (cenv->mcg_status & MCG_STATUS_MCIP) {
+            need_reset = true;
+            snprintf(msg, sizeof(msg), "CPU %d: Previous MCE still in progress,"
+                     " raising triple fault", cs->cpu_index);
+        }
+
+        if (!(cenv->cr[4] & CR4_MCE_MASK)) {
+            need_reset = true;
+            snprintf(msg, sizeof(msg), "CPU %d: MCE capability is not enabled,"
+                     " raising triple fault", cs->cpu_index);
+        }
+
+        if (need_reset) {
+            monitor_printf(params->mon, "%s", msg);
+            qemu_log_mask(CPU_LOG_RESET, "%s\n", msg);
             qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
             return;
         }
+
         if (banks[1] & MCI_STATUS_VAL) {
             params->status |= MCI_STATUS_OVER;
         }
-- 
2.11.0


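The check in this patch can be read as a small pure function. The sketch below is hypothetical (the helper name and its boolean inputs are not QEMU code). It keeps the patch's precedence, where the CR4.MCE message overwrites the MCIP message when both conditions hold, and it appends the trailing newline that the old monitor_printf() format string had but the new "%s" format no longer prints.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: decide whether an MCE injection must escalate to a
 * triple fault and, if so, format the reason into buf.
 *   mcip        - MCG_STATUS.MCIP is set (a previous MCE is still in progress)
 *   mce_enabled - CR4.MCE is set (the guest has enabled machine checks)
 * Returns true if the guest has to be reset.
 */
bool mce_reset_reason(bool mcip, bool mce_enabled, int cpu_index,
                      char *buf, size_t len)
{
    if (!mce_enabled) {
        /* Checked first so it wins when both hold, matching the patch. */
        snprintf(buf, len, "CPU %d: MCE capability is not enabled,"
                 " raising triple fault\n", cpu_index);
        return true;
    }
    if (mcip) {
        snprintf(buf, len, "CPU %d: Previous MCE still in progress,"
                 " raising triple fault\n", cpu_index);
        return true;
    }
    return false;
}
```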


* [PATCH v2 2/3] qapi/run-state.json: introduce memory failure event
  2020-09-22  9:56 [PATCH v2 0/3] add MEMORY_FAILURE event zhenwei pi
  2020-09-22  9:56 ` [PATCH v2 1/3] target-i386: seperate MCIP & MCE_MASK error reason zhenwei pi
@ 2020-09-22  9:56 ` zhenwei pi
  2020-09-22  9:56 ` [PATCH v2 3/3] target-i386: post memory failure event to uplayer zhenwei pi
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: zhenwei pi @ 2020-09-22  9:56 UTC
  To: pbonzini, peter.maydell; +Cc: mtosatti, armbru, pizhenwei, qemu-devel

Introduce 4 memory failure events for a guest, so that upper layers
can know when, why, and what happened to a guest when it hits a
hardware memory failure.

As suggested by Peter Maydell, rename the events and their
descriptions to make them architecture-neutral; as suggested by
Paolo, add more information to distinguish whether a guest MCE is
AR or AO.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 qapi/run-state.json | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/qapi/run-state.json b/qapi/run-state.json
index 7cc9f96a5b..f40111ac67 100644
--- a/qapi/run-state.json
+++ b/qapi/run-state.json
@@ -475,3 +475,70 @@
            'psw-mask': 'uint64',
            'psw-addr': 'uint64',
            'reason': 'S390CrashReason' } }
+
+##
+# @MEMORY_FAILURE:
+#
+# Emitted when a memory failure occurs on the host side.
+#
+# @action: action that has been taken. action is defined as @MemoryFailureAction.
+#
+# Since: 5.2
+#
+# Example:
+#
+# <- { "event": "MEMORY_FAILURE",
+#      "data": { "action": "guest-mce-inject" } }
+#
+##
+{ 'event': 'MEMORY_FAILURE',
+  'data': { 'action': 'MemoryFailureAction',
+            '*flags': 'MemoryFailureFlags'} }
+
+##
+# @MemoryFailureAction:
+#
+# Actions taken by QEMU when a hardware memory failure occurs.
+#
+# @hypervisor-ignore: action-optional memory failure in QEMU process address
+#                     space (non-guest memory, but used by QEMU itself), QEMU
+#                     can ignore this hardware memory failure.
+#
+# @hypervisor-fatal: action-required memory failure in QEMU process address
+#                    space (non-guest memory, but used by QEMU itself), QEMU
+#                    has to stop itself.
+#
+# @guest-mce-inject: memory failure in guest memory, and the guest enables
+#                    MCE handling, so QEMU injects an MCE into the guest.
+#
+# @guest-mce-fatal: memory failure in guest memory, but the guest is not ready
+#                   to handle MCE (e.g. the guest has no MCE mechanism, or
+#                   has disabled MCE, or an AR MCE occurs while a previous MCE
+#                   is still being processed). QEMU has to raise a fault and
+#                   shut down/reset the guest. See the QEMU log for details.
+#
+# Since: 5.2
+#
+##
+{ 'enum': 'MemoryFailureAction',
+  'data': [ 'hypervisor-ignore',
+            'hypervisor-fatal',
+            'guest-mce-inject',
+            'guest-mce-fatal' ] }
+
+##
+# @MemoryFailureFlags:
+#
+# Structure of flags for each memory failure event.
+#
+# @action-required: true if the MCE is AR (action required), false if AO.
+#
+# @recursive: true if another AO MCE occurs while the guest is still
+#             processing a previous MCE.
+#
+# Since: 5.2
+#
+##
+{ 'struct': 'MemoryFailureFlags',
+  'data': { '*action-required': 'bool',
+            '*recursive': 'bool'} }
-- 
2.11.0


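To make the wire format concrete, here is a standalone sketch that renders the event's "data" member roughly as it would appear on QMP. The types only mirror what the QAPI generator would produce (names taken from their use in patch 3/3); the formatter itself is hypothetical, is not how QEMU serializes events, and QMP's exact spacing may differ.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical mirrors of the QAPI-generated types from this patch. */
typedef enum {
    MEMORY_FAILURE_ACTION_HYPERVISOR_IGNORE,
    MEMORY_FAILURE_ACTION_HYPERVISOR_FATAL,
    MEMORY_FAILURE_ACTION_GUEST_MCE_INJECT,
    MEMORY_FAILURE_ACTION_GUEST_MCE_FATAL,
} MemoryFailureAction;

typedef struct {
    bool has_action_required;   /* optional members carry a has_ flag */
    bool action_required;
    bool has_recursive;
    bool recursive;
} MemoryFailureFlags;

static const char *const action_names[] = {
    "hypervisor-ignore", "hypervisor-fatal",
    "guest-mce-inject", "guest-mce-fatal",
};

/* Render the event data; absent optional members are simply omitted. */
int format_memory_failure_data(char *buf, size_t len,
                               MemoryFailureAction action,
                               bool has_flags, const MemoryFailureFlags *flags)
{
    char members[64] = "";  /* comma-separated members of "flags" */
    size_t used = 0;

    if (!has_flags) {
        return snprintf(buf, len, "{ \"action\": \"%s\" }",
                        action_names[action]);
    }
    if (flags->has_action_required) {
        used += snprintf(members + used, sizeof(members) - used,
                         "\"action-required\": %s",
                         flags->action_required ? "true" : "false");
    }
    if (flags->has_recursive) {
        used += snprintf(members + used, sizeof(members) - used,
                         "%s\"recursive\": %s", used ? ", " : "",
                         flags->recursive ? "true" : "false");
    }
    return snprintf(buf, len, "{ \"action\": \"%s\", \"flags\": { %s } }",
                    action_names[action], members);
}
```

For the recursive AO case in patch 3/3 this yields `{ "action": "guest-mce-inject", "flags": { "action-required": false, "recursive": true } }`.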


* [PATCH v2 3/3] target-i386: post memory failure event to uplayer
  2020-09-22  9:56 [PATCH v2 0/3] add MEMORY_FAILURE event zhenwei pi
  2020-09-22  9:56 ` [PATCH v2 1/3] target-i386: seperate MCIP & MCE_MASK error reason zhenwei pi
  2020-09-22  9:56 ` [PATCH v2 2/3] qapi/run-state.json: introduce memory failure event zhenwei pi
@ 2020-09-22  9:56 ` zhenwei pi
  2020-09-22 10:30   ` Philippe Mathieu-Daudé
  2020-09-22 15:40 ` [PATCH v2 0/3] add MEMORY_FAILURE event no-reply
  2020-09-28 12:01 ` PING: " zhenwei pi
  4 siblings, 1 reply; 9+ messages in thread
From: zhenwei pi @ 2020-09-22  9:56 UTC
  To: pbonzini, peter.maydell; +Cc: mtosatti, armbru, pizhenwei, qemu-devel

Post a memory failure event to upper layers when handling a hardware
memory corruption event. Rather than only writing a simple QEMU log
message, QEMU can report more useful information to upper layers. For
example, if a guest crashes because of an MCE, selecting another host
server is a better choice.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 target/i386/helper.c | 15 +++++++++++++++
 target/i386/kvm.c    |  7 ++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/target/i386/helper.c b/target/i386/helper.c
index 0c7fd32491..47823c29e4 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -18,6 +18,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qapi/qapi-events-run-state.h"
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "qemu/qemu-print.h"
@@ -858,6 +859,7 @@ static void do_inject_x86_mce(CPUState *cs, run_on_cpu_data data)
     CPUX86State *cenv = &cpu->env;
     uint64_t *banks = cenv->mce_banks + 4 * params->bank;
     char msg[64];
+    MemoryFailureFlags mf_flags = {0};
     bool need_reset = false;
 
     cpu_synchronize_state(cs);
@@ -869,6 +871,12 @@ static void do_inject_x86_mce(CPUState *cs, run_on_cpu_data data)
     if (!(params->flags & MCE_INJECT_UNCOND_AO)
         && !(params->status & MCI_STATUS_AR)
         && (cenv->mcg_status & MCG_STATUS_MCIP)) {
+        mf_flags.has_action_required = true;
+        mf_flags.action_required = false;
+        mf_flags.has_recursive = true;
+        mf_flags.recursive = true;
+        qapi_event_send_memory_failure(MEMORY_FAILURE_ACTION_GUEST_MCE_INJECT,
+                                       true, &mf_flags);
         return;
     }
 
@@ -909,6 +917,8 @@ static void do_inject_x86_mce(CPUState *cs, run_on_cpu_data data)
         }
 
         if (need_reset) {
+            qapi_event_send_memory_failure(
+                 MEMORY_FAILURE_ACTION_GUEST_MCE_FATAL, false, NULL);
             monitor_printf(params->mon, "%s", msg);
             qemu_log_mask(CPU_LOG_RESET, "%s\n", msg);
             qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
@@ -934,6 +944,11 @@ static void do_inject_x86_mce(CPUState *cs, run_on_cpu_data data)
     } else {
         banks[1] |= MCI_STATUS_OVER;
     }
+
+    mf_flags.has_action_required = true;
+    mf_flags.action_required = !!(params->status & MCI_STATUS_AR);
+    qapi_event_send_memory_failure(MEMORY_FAILURE_ACTION_GUEST_MCE_INJECT,
+                                   true, &mf_flags);
 }
 
 void cpu_x86_inject_mce(Monitor *mon, X86CPU *cpu, int bank,
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 9efb07e7c8..989889c291 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -14,6 +14,7 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "qapi/qapi-events-run-state.h"
 #include <sys/ioctl.h>
 #include <sys/utsname.h>
 
@@ -577,6 +578,8 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int code)
 
 static void hardware_memory_error(void *host_addr)
 {
+    qapi_event_send_memory_failure(MEMORY_FAILURE_ACTION_HYPERVISOR_FATAL,
+                                   false, NULL);
     error_report("QEMU got Hardware memory error at addr %p", host_addr);
     exit(1);
 }
@@ -631,7 +634,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
         hardware_memory_error(addr);
     }
 
-    /* Hope we are lucky for AO MCE */
+    /* Hope we are lucky for AO MCE, just notify a event */
+    qapi_event_send_memory_failure(MEMORY_FAILURE_ACTION_HYPERVISOR_IGNORE,
+                                   false, NULL);
 }
 
 static void kvm_reset_exception(CPUX86State *env)
-- 
2.11.0


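The kvm.c hunks above choose between 'hypervisor-fatal' and 'hypervisor-ignore' based on the SIGBUS code. A simplified, hypothetical model of that decision, assuming (as the surrounding code suggests) that errors in guest RAM are injected into the guest via kvm_mce_inject(), that an AR error in QEMU's own memory is fatal, and that an AO error there is ignored. The enum and function names are invented for illustration, and additional conditions in the real function are omitted.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the Linux BUS_MCEERR_AO / BUS_MCEERR_AR codes. */
typedef enum {
    SIGBUS_MCE_AO,  /* action optional: corruption found, not yet consumed */
    SIGBUS_MCE_AR,  /* action required: the faulting access hit bad memory */
} SigbusMceCode;

typedef enum {
    OUTCOME_INJECT_INTO_GUEST,  /* handled further via kvm_mce_inject() */
    OUTCOME_HYPERVISOR_FATAL,   /* event 'hypervisor-fatal', then exit */
    OUTCOME_HYPERVISOR_IGNORE,  /* event 'hypervisor-ignore' */
} SigbusOutcome;

SigbusOutcome classify_sigbus(SigbusMceCode code, bool addr_in_guest_ram)
{
    if (addr_in_guest_ram) {
        return OUTCOME_INJECT_INTO_GUEST;
    }
    if (code == SIGBUS_MCE_AR) {
        return OUTCOME_HYPERVISOR_FATAL;   /* hardware_memory_error() */
    }
    return OUTCOME_HYPERVISOR_IGNORE;      /* "Hope we are lucky for AO MCE" */
}
```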


* Re: [PATCH v2 1/3] target-i386: seperate MCIP & MCE_MASK error reason
  2020-09-22  9:56 ` [PATCH v2 1/3] target-i386: seperate MCIP & MCE_MASK error reason zhenwei pi
@ 2020-09-22 10:23   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 9+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-09-22 10:23 UTC
  To: zhenwei pi, pbonzini, peter.maydell; +Cc: mtosatti, armbru, qemu-devel

On 9/22/20 11:56 AM, zhenwei pi wrote:
> Previously we can only get a simple string "Triple fault" in qemu
> log. Add detailed message for the two reasons to describe why qemu
> has to reset the guest.
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  target/i386/helper.c | 25 ++++++++++++++++++-------
>  1 file changed, 18 insertions(+), 7 deletions(-)
> 
> diff --git a/target/i386/helper.c b/target/i386/helper.c
> index 70be53e2c3..0c7fd32491 100644
> --- a/target/i386/helper.c
> +++ b/target/i386/helper.c
> @@ -857,6 +857,8 @@ static void do_inject_x86_mce(CPUState *cs, run_on_cpu_data data)
>      X86CPU *cpu = X86_CPU(cs);
>      CPUX86State *cenv = &cpu->env;
>      uint64_t *banks = cenv->mce_banks + 4 * params->bank;
> +    char msg[64];

The preferred form is now to use 'g_autofree char *msg = NULL'
here and g_strdup_printf() instead of snprintf().

> +    bool need_reset = false;
>  
>      cpu_synchronize_state(cs);
>  
> @@ -894,16 +896,25 @@ static void do_inject_x86_mce(CPUState *cs, run_on_cpu_data data)
>              return;
>          }
>  
> -        if ((cenv->mcg_status & MCG_STATUS_MCIP) ||
> -            !(cenv->cr[4] & CR4_MCE_MASK)) {
> -            monitor_printf(params->mon,
> -                           "CPU %d: Previous MCE still in progress, raising"
> -                           " triple fault\n",
> -                           cs->cpu_index);
> -            qemu_log_mask(CPU_LOG_RESET, "Triple fault\n");
> +        if (cenv->mcg_status & MCG_STATUS_MCIP) {
> +            need_reset = true;
> +            snprintf(msg, sizeof(msg), "CPU %d: Previous MCE still in progress,"
> +                     " raising triple fault", cs->cpu_index);
> +        }
> +
> +        if (!(cenv->cr[4] & CR4_MCE_MASK)) {
> +            need_reset = true;
> +            snprintf(msg, sizeof(msg), "CPU %d: MCE capability is not enabled,"
> +                     " raising triple fault", cs->cpu_index);
> +        }
> +
> +        if (need_reset) {
> +            monitor_printf(params->mon, "%s", msg);
> +            qemu_log_mask(CPU_LOG_RESET, "%s\n", msg);
>              qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
>              return;
>          }
> +
>          if (banks[1] & MCI_STATUS_VAL) {
>              params->status |= MCI_STATUS_OVER;
>          }
> 




* Re: [PATCH v2 3/3] target-i386: post memory failure event to uplayer
  2020-09-22  9:56 ` [PATCH v2 3/3] target-i386: post memory failure event to uplayer zhenwei pi
@ 2020-09-22 10:30   ` Philippe Mathieu-Daudé
  2020-09-23  3:12     ` [External] " zhenwei pi
  0 siblings, 1 reply; 9+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-09-22 10:30 UTC
  To: zhenwei pi, pbonzini, peter.maydell; +Cc: mtosatti, armbru, qemu-devel

On 9/22/20 11:56 AM, zhenwei pi wrote:
> Post memory failure event to uplayer to handle hardware memory
> corrupted event. Rather than simple QEMU log, QEMU could report more
> effective message to uplayer. For example, guest crashes by MCE,
> selecting another host server is a better choice.
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  target/i386/helper.c | 15 +++++++++++++++
>  target/i386/kvm.c    |  7 ++++++-
>  2 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/target/i386/helper.c b/target/i386/helper.c
> index 0c7fd32491..47823c29e4 100644
> --- a/target/i386/helper.c
> +++ b/target/i386/helper.c
> @@ -18,6 +18,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qapi/qapi-events-run-state.h"
>  #include "cpu.h"
>  #include "exec/exec-all.h"
>  #include "qemu/qemu-print.h"
> @@ -858,6 +859,7 @@ static void do_inject_x86_mce(CPUState *cs, run_on_cpu_data data)
>      CPUX86State *cenv = &cpu->env;
>      uint64_t *banks = cenv->mce_banks + 4 * params->bank;
>      char msg[64];
> +    MemoryFailureFlags mf_flags = {0};
>      bool need_reset = false;
>  
>      cpu_synchronize_state(cs);
> @@ -869,6 +871,12 @@ static void do_inject_x86_mce(CPUState *cs, run_on_cpu_data data)
>      if (!(params->flags & MCE_INJECT_UNCOND_AO)
>          && !(params->status & MCI_STATUS_AR)
>          && (cenv->mcg_status & MCG_STATUS_MCIP)) {
> +        mf_flags.has_action_required = true;
> +        mf_flags.action_required = false;
> +        mf_flags.has_recursive = true;
> +        mf_flags.recursive = true;
> +        qapi_event_send_memory_failure(MEMORY_FAILURE_ACTION_GUEST_MCE_INJECT,
> +                                       true, &mf_flags);

Can you extract a function such as:

static void emit_guest_mce_failure(bool action_required, bool recursive)
{
  ...
}

To use as:

           emit_guest_mce_failure(true, true);

>          return;
>      }
>  
> @@ -909,6 +917,8 @@ static void do_inject_x86_mce(CPUState *cs, run_on_cpu_data data)
>          }
>  
>          if (need_reset) {
> +            qapi_event_send_memory_failure(
> +                 MEMORY_FAILURE_ACTION_GUEST_MCE_FATAL, false, NULL);
>              monitor_printf(params->mon, "%s", msg);
>              qemu_log_mask(CPU_LOG_RESET, "%s\n", msg);
>              qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
> @@ -934,6 +944,11 @@ static void do_inject_x86_mce(CPUState *cs, run_on_cpu_data data)
>      } else {
>          banks[1] |= MCI_STATUS_OVER;
>      }
> +
> +    mf_flags.has_action_required = true;
> +    mf_flags.action_required = !!(params->status & MCI_STATUS_AR);
> +    qapi_event_send_memory_failure(MEMORY_FAILURE_ACTION_GUEST_MCE_INJECT,
> +                                   true, &mf_flags);

And here:

       emit_guest_mce_failure(params->status & MCI_STATUS_AR, false);

>  }
>  
>  void cpu_x86_inject_mce(Monitor *mon, X86CPU *cpu, int bank,
> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> index 9efb07e7c8..989889c291 100644
> --- a/target/i386/kvm.c
> +++ b/target/i386/kvm.c
> @@ -14,6 +14,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
> +#include "qapi/qapi-events-run-state.h"
>  #include <sys/ioctl.h>
>  #include <sys/utsname.h>
>  
> @@ -577,6 +578,8 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int code)
>  
>  static void hardware_memory_error(void *host_addr)
>  {
> +    qapi_event_send_memory_failure(MEMORY_FAILURE_ACTION_HYPERVISOR_FATAL,
> +                                   false, NULL);
>      error_report("QEMU got Hardware memory error at addr %p", host_addr);
>      exit(1);
>  }
> @@ -631,7 +634,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>          hardware_memory_error(addr);
>      }
>  
> -    /* Hope we are lucky for AO MCE */
> +    /* Hope we are lucky for AO MCE, just notify a event */
> +    qapi_event_send_memory_failure(MEMORY_FAILURE_ACTION_HYPERVISOR_IGNORE,
> +                                   false, NULL);
>  }
>  
>  static void kvm_reset_exception(CPUX86State *env)
> 




* Re: [PATCH v2 0/3] add MEMORY_FAILURE event
  2020-09-22  9:56 [PATCH v2 0/3] add MEMORY_FAILURE event zhenwei pi
                   ` (2 preceding siblings ...)
  2020-09-22  9:56 ` [PATCH v2 3/3] target-i386: post memory failure event to uplayer zhenwei pi
@ 2020-09-22 15:40 ` no-reply
  2020-09-28 12:01 ` PING: " zhenwei pi
  4 siblings, 0 replies; 9+ messages in thread
From: no-reply @ 2020-09-22 15:40 UTC
  To: pizhenwei
  Cc: peter.maydell, mtosatti, qemu-devel, pizhenwei, armbru, pbonzini

Patchew URL: https://patchew.org/QEMU/20200922095630.394893-1-pizhenwei@bytedance.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

C linker for the host machine: cc ld.bfd 2.27-43
Host machine cpu family: x86_64
Host machine cpu: x86_64
../src/meson.build:10: WARNING: Module unstable-keyval has no backwards or forwards compatibility and might not exist in future releases.
Program sh found: YES
Program python3 found: YES (/usr/bin/python3)
Configuring ninjatool using configuration
---
Not run: 259
Failures: 192
Failed 1 of 121 iotests
make: *** [check-block] Error 1
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 709, in <module>
    sys.exit(main())
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--rm', '--label', 'com.qemu.instance.uuid=c2bcee7055544144a0155e97d2f7a118', '-u', '1001', '--security-opt', 'seccomp=unconfined', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-prfzegjw/src/docker-src.2020-09-22-11.22.51.1488:/var/tmp/qemu:z,ro', 'qemu/centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=c2bcee7055544144a0155e97d2f7a118
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-prfzegjw/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    17m39.310s
user    0m20.428s


The full log is available at
http://patchew.org/logs/20200922095630.394893-1-pizhenwei@bytedance.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [External] Re: [PATCH v2 3/3] target-i386: post memory failure event to uplayer
  2020-09-22 10:30   ` Philippe Mathieu-Daudé
@ 2020-09-23  3:12     ` zhenwei pi
  0 siblings, 0 replies; 9+ messages in thread
From: zhenwei pi @ 2020-09-23  3:12 UTC
  To: Philippe Mathieu-Daudé, pbonzini, peter.maydell
  Cc: mtosatti, armbru, qemu-devel



On 9/22/20 6:30 PM, Philippe Mathieu-Daudé wrote:
> On 9/22/20 11:56 AM, zhenwei pi wrote:
>> Post memory failure event to uplayer to handle hardware memory
>> corrupted event. Rather than simple QEMU log, QEMU could report more
>> effective message to uplayer. For example, guest crashes by MCE,
>> selecting another host server is a better choice.
>>
>> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
>> ---
>>   target/i386/helper.c | 15 +++++++++++++++
>>   target/i386/kvm.c    |  7 ++++++-
>>   2 files changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/target/i386/helper.c b/target/i386/helper.c
>> index 0c7fd32491..47823c29e4 100644
>> --- a/target/i386/helper.c
>> +++ b/target/i386/helper.c
>> @@ -18,6 +18,7 @@
>>    */
>>   
>>   #include "qemu/osdep.h"
>> +#include "qapi/qapi-events-run-state.h"
>>   #include "cpu.h"
>>   #include "exec/exec-all.h"
>>   #include "qemu/qemu-print.h"
>> @@ -858,6 +859,7 @@ static void do_inject_x86_mce(CPUState *cs, run_on_cpu_data data)
>>       CPUX86State *cenv = &cpu->env;
>>       uint64_t *banks = cenv->mce_banks + 4 * params->bank;
>>       char msg[64];
>> +    MemoryFailureFlags mf_flags = {0};
>>       bool need_reset = false;
>>   
>>       cpu_synchronize_state(cs);
>> @@ -869,6 +871,12 @@ static void do_inject_x86_mce(CPUState *cs, run_on_cpu_data data)
>>       if (!(params->flags & MCE_INJECT_UNCOND_AO)
>>           && !(params->status & MCI_STATUS_AR)
>>           && (cenv->mcg_status & MCG_STATUS_MCIP)) {
>> +        mf_flags.has_action_required = true;
>> +        mf_flags.action_required = false;
>> +        mf_flags.has_recursive = true;
>> +        mf_flags.recursive = true;
>> +        qapi_event_send_memory_failure(MEMORY_FAILURE_ACTION_GUEST_MCE_INJECT,
>> +                                       true, &mf_flags);
> 
> Can you extract a function such:
> 
> static void emit_guest_mce_failure(bool action_required, bool recursive)
> {
>    ...
> }
> 
> To use as:
> 
>             emit_guest_mce_failure(true, true);
> 
>>           return;
>>       }
>>

There are currently 2 fields in struct MemoryFailureFlags. More fields
may need to be added in the future, and emit_guest_mce_failure() would
then need more arguments too. So is it worth wrapping this in a
function now?
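For comparison, a standalone sketch of the helper Philippe proposed. The types and qapi_event_send_memory_failure() below are hypothetical stand-ins (the stub records the event instead of sending it). Two details are worth noting: at the first call site the patch sets action_required to false and recursive to true, so the matching call would be emit_guest_mce_failure(false, true); and because the helper always sets has_recursive, the second call site would now also report "recursive": false, which the patch currently omits. Passing the masked status to a bool parameter also makes the patch's `!!` unnecessary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the QAPI-generated type and event function. */
typedef enum {
    MEMORY_FAILURE_ACTION_GUEST_MCE_INJECT,
    MEMORY_FAILURE_ACTION_GUEST_MCE_FATAL,
} MemoryFailureAction;

typedef struct {
    bool has_action_required;
    bool action_required;
    bool has_recursive;
    bool recursive;
} MemoryFailureFlags;

/* Stub that records the last event instead of sending it over QMP. */
static MemoryFailureAction last_action;
static MemoryFailureFlags last_flags;

static void qapi_event_send_memory_failure(MemoryFailureAction action,
                                           bool has_flags,
                                           MemoryFailureFlags *flags)
{
    last_action = action;
    last_flags = has_flags ? *flags : (MemoryFailureFlags){0};
}

/* The proposed wrapper: both flags are always reported. */
static void emit_guest_mce_failure(bool action_required, bool recursive)
{
    MemoryFailureFlags mf_flags = {
        .has_action_required = true,
        .action_required = action_required,
        .has_recursive = true,
        .recursive = recursive,
    };

    qapi_event_send_memory_failure(MEMORY_FAILURE_ACTION_GUEST_MCE_INJECT,
                                   true, &mf_flags);
}
```

If MemoryFailureFlags grows, the wrapper could take a `const MemoryFailureFlags *` instead of individual booleans, which would keep call sites short without a growing argument list.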


>> @@ -909,6 +917,8 @@ static void do_inject_x86_mce(CPUState *cs, run_on_cpu_data data)
>>           }
>>   
>>           if (need_reset) {
>> +            qapi_event_send_memory_failure(
>> +                 MEMORY_FAILURE_ACTION_GUEST_MCE_FATAL, false, NULL);
>>               monitor_printf(params->mon, "%s", msg);
>>               qemu_log_mask(CPU_LOG_RESET, "%s\n", msg);
>>               qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
>> @@ -934,6 +944,11 @@ static void do_inject_x86_mce(CPUState *cs, run_on_cpu_data data)
>>       } else {
>>           banks[1] |= MCI_STATUS_OVER;
>>       }
>> +
>> +    mf_flags.has_action_required = true;
>> +    mf_flags.action_required = !!(params->status & MCI_STATUS_AR);
>> +    qapi_event_send_memory_failure(MEMORY_FAILURE_ACTION_GUEST_MCE_INJECT,
>> +                                   true, &mf_flags);
> 
> And here:
> 
>         emit_guest_mce_failure(params->status & MCI_STATUS_AR, false);
> 
>>   }
>>   
>>   void cpu_x86_inject_mce(Monitor *mon, X86CPU *cpu, int bank,
>> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
>> index 9efb07e7c8..989889c291 100644
>> --- a/target/i386/kvm.c
>> +++ b/target/i386/kvm.c
>> @@ -14,6 +14,7 @@
>>   
>>   #include "qemu/osdep.h"
>>   #include "qapi/error.h"
>> +#include "qapi/qapi-events-run-state.h"
>>   #include <sys/ioctl.h>
>>   #include <sys/utsname.h>
>>   
>> @@ -577,6 +578,8 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int code)
>>   
>>   static void hardware_memory_error(void *host_addr)
>>   {
>> +    qapi_event_send_memory_failure(MEMORY_FAILURE_ACTION_HYPERVISOR_FATAL,
>> +                                   false, NULL);
>>       error_report("QEMU got Hardware memory error at addr %p", host_addr);
>>       exit(1);
>>   }
>> @@ -631,7 +634,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>>           hardware_memory_error(addr);
>>       }
>>   
>> -    /* Hope we are lucky for AO MCE */
>> +    /* Hope we are lucky for AO MCE, just notify a event */
>> +    qapi_event_send_memory_failure(MEMORY_FAILURE_ACTION_HYPERVISOR_IGNORE,
>> +                                   false, NULL);
>>   }
>>   
>>   static void kvm_reset_exception(CPUX86State *env)
>>
> 

-- 
zhenwei pi



* PING: [PATCH v2 0/3] add MEMORY_FAILURE event
  2020-09-22  9:56 [PATCH v2 0/3] add MEMORY_FAILURE event zhenwei pi
                   ` (3 preceding siblings ...)
  2020-09-22 15:40 ` [PATCH v2 0/3] add MEMORY_FAILURE event no-reply
@ 2020-09-28 12:01 ` zhenwei pi
  4 siblings, 0 replies; 9+ messages in thread
From: zhenwei pi @ 2020-09-28 12:01 UTC
  To: pbonzini, peter.maydell; +Cc: mtosatti, armbru, qemu-devel

PING

On 9/22/20 5:56 PM, zhenwei pi wrote:
> v1->v2:
> Suggested by Peter Maydell, rename events to make them
> architecture-neutral:
> 'PC-RAM' -> 'guest-memory'
> 'guest-triple-fault' -> 'guest-mce-fatal'
> 
> Suggested by Paolo, add more fields in event:
> 'action-required': boolean type to distinguish a guest-mce is AR/AO.
> 'recursive': boolean type. set true if: previous MCE in processing
>               in guest, another AO MCE occurs.
> 
> v1:
> Although QEMU could catch signal BUS to handle hardware memory
> corrupted event, sadly, QEMU just prints a little log and try to fix
> it silently.
> 
> In these patches, introduce a 'MEMORY_FAILURE' event with 4 detailed
> actions of QEMU, then uplayer could know what situaction QEMU hit and
> did. And further step we can do: if a host server hits a 'hypervisor-ignore'
> or 'guest-mce', scheduler could migrate VM to another host; if hitting
> 'hypervisor-stop' or 'guest-triple-fault', scheduler could select other
> healthy servers to launch VM.
> 
> Zhenwei Pi (3):
>    target-i386: seperate MCIP & MCE_MASK error reason
>    qapi/run-state.json: introduce memory failure event
>    target-i386: post memory failure event to uplayer
> 
>   qapi/run-state.json  | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>   target/i386/helper.c | 40 +++++++++++++++++++++++++------
>   target/i386/kvm.c    |  7 +++++-
>   3 files changed, 106 insertions(+), 8 deletions(-)
> 

-- 
zhenwei pi


