* [PATCH RESEND v4 0/2] Containing AER unrecoverable errors
@ 2017-09-19 13:23 Venu Busireddy
  2017-09-19 13:23 ` [PATCH RESEND v4 1/2] libxl: Implement the handler to handle unrecoverable AER errors Venu Busireddy
  2017-09-19 13:23 ` [PATCH RESEND v4 2/2] xl: Register the AER event handler to handle " Venu Busireddy
  0 siblings, 2 replies; 3+ messages in thread
From: Venu Busireddy @ 2017-09-19 13:23 UTC
  To: venu.busireddy, xen-devel, Ian Jackson, Wei Liu
  Cc: Andrew Cooper, Jan Beulich

This patch set is part of a larger series that together allows containment
of unrecoverable AER errors from PCIe devices assigned to guests in
passthrough mode. Containment is achieved by forcibly removing the
erring PCIe device from the guest.

The original xen-pciback patch corresponding to this patch set is:
https://lists.xen.org/archives/html/xen-devel/2017-06/msg03274.html
It will be reposted after this patch set is accepted.

Changes in v4:
  * Made the following changes suggested by Wei Liu.
    - Combine multiple LIBXL_HAVE_* definitions into one.
    - Use libxl__calloc() instead of malloc().

Changes in v3:
  * Made the following changes suggested by Wei Liu.
    - Added LIBXL_HAVE macros to libxl.h.
    - Don't hard-code dom0's domid to 0. Instead, use libxl__get_domid().
    - Corrected comments.
  * Made the following changes based on comments from Ian Jackson.
    - Got rid of the global variable aer_watch.
    - Added documentation (comments in code) for the new API calls.
    - Removed the unnecessary writes to xenstore.

Changes in v2:
  * Instead of killing the guest and hiding the device, forcibly remove
    the device from the guest.

Venu Busireddy (2):
  libxl: Implement the handler to handle unrecoverable AER errors.
  xl: Register the AER event handler to handle AER errors.

 tools/libxl/libxl.h          |  7 ++++
 tools/libxl/libxl_event.h    | 13 +++++++
 tools/libxl/libxl_internal.h |  7 ++++
 tools/libxl/libxl_pci.c      | 84 ++++++++++++++++++++++++++++++++++++++++++++
 tools/xl/xl_vmcontrol.c      |  9 +++++
 5 files changed, 120 insertions(+)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* [PATCH RESEND v4 1/2] libxl: Implement the handler to handle unrecoverable AER errors
  2017-09-19 13:23 [PATCH RESEND v4 0/2] Containing AER unrecoverable errors Venu Busireddy
@ 2017-09-19 13:23 ` Venu Busireddy
  2017-09-19 13:23 ` [PATCH RESEND v4 2/2] xl: Register the AER event handler to handle " Venu Busireddy
  1 sibling, 0 replies; 3+ messages in thread
From: Venu Busireddy @ 2017-09-19 13:23 UTC
  To: venu.busireddy, xen-devel, Ian Jackson, Wei Liu
  Cc: Andrew Cooper, Jan Beulich

Implement the callback function that handles unrecoverable AER errors,
and the public APIs used to register and unregister that handler.
When an unrecoverable AER error occurs, the handler forcibly removes the
erring PCIe device from the guest.
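
For illustration only: the handler learns which device failed from a
xenstore value in segment:bus:device.function form. Below is a minimal
sketch of validating such a string before acting on it. The names
struct sbdf and parse_sbdf() are hypothetical and are not part of this
patch or of libxl, and the field limits are the usual PCI ones
(16-bit segment, 8-bit bus, 5-bit device, 3-bit function).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical holder for a parsed segment:bus:device.function address. */
struct sbdf {
    unsigned int seg, bus, dev, fn;
};

/*
 * Parse a string of the form "SSSS:BB:DD.F" with hexadecimal fields.
 * Returns 0 on success, or -1 if the string does not match the format,
 * has trailing characters, or holds a field outside the PCI limits.
 */
static int parse_sbdf(const char *s, struct sbdf *out)
{
    int consumed = 0;

    assert(s != NULL && out != NULL);

    /* %n records how many characters were consumed by the match. */
    if (sscanf(s, "%x:%x:%x.%x%n",
               &out->seg, &out->bus, &out->dev, &out->fn, &consumed) != 4)
        return -1;

    /* Reject anything left over after the function number. */
    if (s[consumed] != '\0')
        return -1;

    /* Range-check each field against the PCI address layout. */
    if (out->seg > 0xffff || out->bus > 0xff ||
        out->dev > 0x1f || out->fn > 0x7)
        return -1;

    return 0;
}
```

A caller would bail out when parse_sbdf() returns nonzero, rather than
pass partially filled fields on to the device removal path.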

Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 tools/libxl/libxl.h          |  7 ++++
 tools/libxl/libxl_event.h    | 13 +++++++
 tools/libxl/libxl_internal.h |  7 ++++
 tools/libxl/libxl_pci.c      | 84 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 111 insertions(+)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 7cf0f31..03c5565 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1044,6 +1044,13 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, const libxl_mac *src);
  */
 #define LIBXL_HAVE_QED 1
 
+/* LIBXL_HAVE_AER_EVENTS_HANDLER
+ *
+ * If this is defined, libxl has the library functions called
+ * libxl_reg_aer_events_handler and libxl_unreg_aer_events_handler.
+ */
+#define LIBXL_HAVE_AER_EVENTS_HANDLER 1
+
 typedef char **libxl_string_list;
 void libxl_string_list_dispose(libxl_string_list *sl);
 int libxl_string_list_length(const libxl_string_list *sl);
diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
index 1ea789e..1aea906 100644
--- a/tools/libxl/libxl_event.h
+++ b/tools/libxl/libxl_event.h
@@ -184,6 +184,19 @@ void libxl_evdisable_domain_death(libxl_ctx *ctx, libxl_evgen_domain_death*);
    * may generate only a DEATH event.
    */
 
+typedef struct libxl__aer_watch libxl_aer_watch;
+int libxl_reg_aer_events_handler(libxl_ctx *, uint32_t, libxl_aer_watch **)
+                        LIBXL_EXTERNAL_CALLERS_ONLY;
+  /*
+   * Registers a handler for the occurrence of unrecoverable AER errors.
+   * This function depends on the calling application running libxl's
+   * internal event loop. Toolstacks that do not use libxl's internal
+   * event loop must run their own event loop and enter libxl from it
+   * (say, by calling libxl_event_wait()) so that the event is processed.
+   */
+void libxl_unreg_aer_events_handler(libxl_ctx *, uint32_t, libxl_aer_watch *)
+                        LIBXL_EXTERNAL_CALLERS_ONLY;
+
 typedef struct libxl__evgen_disk_eject libxl_evgen_disk_eject;
 int libxl_evenable_disk_eject(libxl_ctx *ctx, uint32_t domid, const char *vdev,
                         libxl_ev_user, libxl_evgen_disk_eject **evgen_out);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index afe6652..2b74286 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -352,6 +352,13 @@ struct libxl__ev_child {
     LIBXL_LIST_ENTRY(struct libxl__ev_child) entry;
 };
 
+/*
+ * Structure used for AER event handling.
+ */
+struct libxl__aer_watch {
+    uint32_t domid;
+    libxl__ev_xswatch watch;
+};
 
 /*
  * evgen structures, which are the state we use for generating
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 65ad5e5..d1008f8 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1678,6 +1678,90 @@ static int libxl_device_pci_compare(libxl_device_pci *d1,
     return COMPARE_PCI(d1, d2);
 }
 
+static void aer_backend_watch_callback(libxl__egc *egc,
+                                       libxl__ev_xswatch *watch,
+                                       const char *watch_path,
+                                       const char *event_path)
+{
+    EGC_GC;
+    libxl_aer_watch *aer_ws = CONTAINER_OF(watch, *aer_ws, watch);
+    int rc;
+    uint32_t dom, bus, dev, fn;
+    uint32_t domid = aer_ws->domid;
+    char *p, *path;
+    const char *aerFailedSBDF;
+    libxl_device_pci pcidev;
+
+    /* Extract the backend directory. */
+    path = libxl__strdup(gc, event_path);
+    p = strrchr(path, '/');
+    if ((p == NULL) || (strcmp(p, "/aerFailedSBDF") != 0))
+        return;
+    /* Truncate the string so it points to the backend directory. */
+    *p = '\0';
+
+    /* Fetch the value of the failed PCI device. */
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+            GCSPRINTF("%s/aerFailedSBDF", path), &aerFailedSBDF);
+    if (rc || !aerFailedSBDF)
+        return;
+    sscanf(aerFailedSBDF, "%x:%x:%x.%x", &dom, &bus, &dev, &fn);
+
+    libxl_device_pci_init(&pcidev);
+    pcidev_struct_fill(&pcidev, dom, bus, dev, fn, 0);
+    /* Forcibly remove the device from the guest */
+    rc = libxl__device_pci_remove_common(gc, domid, &pcidev, 1);
+    if (rc)
+        LOGD(ERROR, domid, "libxl__device_pci_remove_common() failed, rc = %d",
+                rc);
+
+    return;
+}
+
+int libxl_reg_aer_events_handler(libxl_ctx *ctx,
+                                 uint32_t domid,
+                                 libxl_aer_watch **aer_ws_out)
+{
+    int rc = 0;
+    uint32_t pciback_domid;
+    char *be_path;
+    libxl_aer_watch *aer_ws = NULL;
+    GC_INIT(ctx);
+
+    *aer_ws_out = NULL;
+
+    rc = libxl__get_domid(gc, (uint32_t *)(&pciback_domid));
+    if (rc) {
+        LOGD(ERROR, domid, " libxl__get_domid() failed, rc = %d", rc);
+        goto out;
+    }
+
+    aer_ws = libxl__calloc(NOGC, 1, sizeof(libxl_aer_watch));
+    aer_ws->domid = domid;
+    be_path = GCSPRINTF("/local/domain/%u/backend/pci/%u/%u/%s",
+            pciback_domid, domid, pciback_domid, "aerFailedSBDF");
+    rc = libxl__ev_xswatch_register(gc, &aer_ws->watch,
+            aer_backend_watch_callback, be_path);
+    *aer_ws_out = aer_ws;
+
+out:
+    GC_FREE;
+    return rc;
+}
+
+void libxl_unreg_aer_events_handler(libxl_ctx *ctx,
+                                    uint32_t domid,
+                                    libxl_aer_watch *aer_ws)
+{
+    GC_INIT(ctx);
+
+    libxl__ev_xswatch_deregister(gc, &aer_ws->watch);
+
+    free(aer_ws);
+    GC_FREE;
+    return;
+}
+
 DEFINE_DEVICE_TYPE_STRUCT_X(pcidev, pci);
 
 /*


* [PATCH RESEND v4 2/2] xl: Register the AER event handler to handle AER errors
  2017-09-19 13:23 [PATCH RESEND v4 0/2] Containing AER unrecoverable errors Venu Busireddy
  2017-09-19 13:23 ` [PATCH RESEND v4 1/2] libxl: Implement the handler to handle unrecoverable AER errors Venu Busireddy
@ 2017-09-19 13:23 ` Venu Busireddy
  1 sibling, 0 replies; 3+ messages in thread
From: Venu Busireddy @ 2017-09-19 13:23 UTC
  To: venu.busireddy, xen-devel, Ian Jackson, Wei Liu
  Cc: Andrew Cooper, Jan Beulich

When a guest is created, register the AER event handler for that guest.
When an unrecoverable AER error occurs, the handler forcibly removes
the erring PCIe device from the guest.
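
For illustration only: a caller that registers the handler once and may
reach more than one teardown path could guard the unregister call so
that it runs at most once and tolerates a failed registration. The
sketch below is a standalone model of that pattern. All demo_* names
and aer_watch_release() are hypothetical stand-ins, not libxl or xl
code; they only mirror the shape of the register/unregister pair added
by this series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for a library context; counts live registrations. */
typedef struct demo_ctx {
    int registered;
} demo_ctx;

/* Stand-in for the opaque watch handle returned to the caller. */
typedef struct demo_aer_watch {
    uint32_t domid;
} demo_aer_watch;

/* Stand-in register call: allocates a handle, or leaves *out NULL. */
static int demo_reg_aer_events_handler(demo_ctx *ctx, uint32_t domid,
                                       demo_aer_watch **out)
{
    demo_aer_watch *w;

    assert(ctx != NULL && out != NULL);
    *out = NULL;
    w = calloc(1, sizeof(*w));
    if (w == NULL)
        return -1;
    w->domid = domid;
    ctx->registered++;
    *out = w;
    return 0;
}

/* Stand-in unregister call: frees the handle it is given. */
static void demo_unreg_aer_events_handler(demo_ctx *ctx, uint32_t domid,
                                          demo_aer_watch *w)
{
    (void)domid;
    ctx->registered--;
    free(w);
}

/*
 * Caller-side guard: unregister only if a handle exists, then clear the
 * caller's pointer so that a second call is a harmless no-op.
 */
static void aer_watch_release(demo_ctx *ctx, uint32_t domid,
                              demo_aer_watch **w)
{
    if (*w == NULL)
        return;
    demo_unreg_aer_events_handler(ctx, domid, *w);
    *w = NULL;
}
```

With a guard of this kind, both the shutdown and the destroy paths can
call the release helper without tracking which one ran first.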

Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 tools/xl/xl_vmcontrol.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 89c2b25..9855cdb 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -656,6 +656,7 @@ int create_domain(struct domain_create *dom_info)
     const char *restore_source = NULL;
     int migrate_fd = dom_info->migrate_fd;
     bool config_in_json;
+    libxl_aer_watch *aer_ws = NULL;
 
     int i;
     int need_daemon = daemonize;
@@ -966,6 +967,12 @@ start:
     LOG("Waiting for domain %s (domid %u) to die [pid %ld]",
         d_config.c_info.name, domid, (long)getpid());
 
+    ret = libxl_reg_aer_events_handler(ctx, domid, &aer_ws);
+    if (ret) {
+        /* Log the error, and move on... */
+        LOG("libxl_reg_aer_events_handler() failed, ret = 0x%08x", ret);
+    }
+
     ret = libxl_evenable_domain_death(ctx, domid, 0, &deathw);
     if (ret) goto out;
 
@@ -993,6 +1000,7 @@ start:
             LOG("Domain %u has shut down, reason code %d 0x%x", domid,
                 event->u.domain_shutdown.shutdown_reason,
                 event->u.domain_shutdown.shutdown_reason);
+            libxl_unreg_aer_events_handler(ctx, domid, aer_ws);
             switch (handle_domain_death(&domid, event, &d_config)) {
             case DOMAIN_RESTART_SOFT_RESET:
                 domid_soft_reset = domid;
@@ -1059,6 +1067,7 @@ start:
 
         case LIBXL_EVENT_TYPE_DOMAIN_DEATH:
             LOG("Domain %u has been destroyed.", domid);
+            libxl_unreg_aer_events_handler(ctx, domid, aer_ws);
             libxl_event_free(ctx, event);
             ret = 0;
             goto out;
