All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/2] Containing AER unrecoverable errors
@ 2017-08-07 23:54 ` Venu Busireddy
  2017-08-07 23:54   ` [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors Venu Busireddy
  2017-08-07 23:54   ` [PATCH v3 2/2] xl: Register the AER event handler to handle " Venu Busireddy
  0 siblings, 2 replies; 19+ messages in thread
From: Venu Busireddy @ 2017-08-07 23:54 UTC (permalink / raw)
  To: venu.busireddy, xen-devel, Ian Jackson, Wei Liu
  Cc: Andrew Cooper, Jan Beulich

This patch set is part of a set of patches that together allow containment
of unrecoverable AER errors from PCIe devices assigned to guests in
passthrough mode. The containment is achieved by forcibly removing the
erring PCIe device from the guest.

The original xen-pciback patch corresponding to this patch set is:
https://lists.xen.org/archives/html/xen-devel/2017-06/msg03274.html.
It will be reposted after this patch set is accepted.

Changes in v3:
  * Made the following changes suggested by Wei Liu.
    - Added LIBXL_HAVE macros to libxl.h.
    - Don't hard-code dom0's domid to 0. Instead, use libxl__get_domid().
    - Corrected comments.
  * Made the following changes based on comments from Ian Jackson.
    - Got rid of the global variable aer_watch.
    - Added documentation (comments in code) for the new API calls.
    - Removed the unnecessary writes to xenstore.

Changes in v2:
  - Instead of killing the guest and hiding the device, forcibly remove
    the device from the guest.

Venu Busireddy (2):
  libxl: Implement the handler to handle unrecoverable AER errors.
  xl: Register the AER event handler to handle AER errors.

 tools/libxl/libxl.h          | 14 +++++++
 tools/libxl/libxl_event.h    | 13 +++++++
 tools/libxl/libxl_internal.h |  7 ++++
 tools/libxl/libxl_pci.c      | 90 ++++++++++++++++++++++++++++++++++++++++++++
 tools/xl/xl_vmcontrol.c      |  9 +++++
 5 files changed, 133 insertions(+)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors
  2017-08-07 23:54 ` [PATCH v3 " Venu Busireddy
@ 2017-08-07 23:54   ` Venu Busireddy
  2017-08-08 14:33     ` Wei Liu
  2017-08-07 23:54   ` [PATCH v3 2/2] xl: Register the AER event handler to handle " Venu Busireddy
  1 sibling, 1 reply; 19+ messages in thread
From: Venu Busireddy @ 2017-08-07 23:54 UTC (permalink / raw)
  To: venu.busireddy, xen-devel, Ian Jackson, Wei Liu
  Cc: Andrew Cooper, Jan Beulich

Implement the callback function to handle unrecoverable AER errors, and
also the public APIs that can be used to register/unregister the handler.
When an AER error occurs, the handler will forcibly remove the erring
PCIe device from the guest.

Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 tools/libxl/libxl.h          | 14 +++++++
 tools/libxl/libxl_event.h    | 13 +++++++
 tools/libxl/libxl_internal.h |  7 ++++
 tools/libxl/libxl_pci.c      | 90 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 124 insertions(+)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 7cf0f31..c5af0aa 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1044,6 +1044,20 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, const libxl_mac *src);
  */
 #define LIBXL_HAVE_QED 1
 
+/* LIBXL_HAVE_REG_AER_EVENTS_HANDLER
+ *
+ * If it is defined, libxl has a library function called
+ * libxl_reg_aer_events_handler.
+ */
+#define LIBXL_HAVE_REG_AER_EVENTS_HANDLER 1
+
+/* LIBXL_HAVE_UNREG_AER_EVENTS_HANDLER
+ *
+ * If it is defined, libxl has a library function called
+ * libxl_unreg_aer_events_handler.
+ */
+#define LIBXL_HAVE_UNREG_AER_EVENTS_HANDLER 1
+
 typedef char **libxl_string_list;
 void libxl_string_list_dispose(libxl_string_list *sl);
 int libxl_string_list_length(const libxl_string_list *sl);
diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
index 1ea789e..1aea906 100644
--- a/tools/libxl/libxl_event.h
+++ b/tools/libxl/libxl_event.h
@@ -184,6 +184,19 @@ void libxl_evdisable_domain_death(libxl_ctx *ctx, libxl_evgen_domain_death*);
    * may generate only a DEATH event.
    */
 
+typedef struct libxl__aer_watch libxl_aer_watch;
+int libxl_reg_aer_events_handler(libxl_ctx *, uint32_t, libxl_aer_watch **)
+                        LIBXL_EXTERNAL_CALLERS_ONLY;
+  /*
+   * Registers a handler to handle the occurrence of unrecoverable AER errors.
+   * This function depends on the calling application running the libxl's
+   * internal event loop. Toolstacks that do not use libxl's internal
+   * event loop must arrange to have their own event loop created and enter
+   * libxl (say, call libxl_event_wait()), to enable the event to be processed.
+   */
+void libxl_unreg_aer_events_handler(libxl_ctx *, uint32_t, libxl_aer_watch *)
+                        LIBXL_EXTERNAL_CALLERS_ONLY;
+
 typedef struct libxl__evgen_disk_eject libxl_evgen_disk_eject;
 int libxl_evenable_disk_eject(libxl_ctx *ctx, uint32_t domid, const char *vdev,
                         libxl_ev_user, libxl_evgen_disk_eject **evgen_out);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index afe6652..2b74286 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -352,6 +352,13 @@ struct libxl__ev_child {
     LIBXL_LIST_ENTRY(struct libxl__ev_child) entry;
 };
 
+/*
+ * Structure used for AER event handling.
+ */
+struct libxl__aer_watch {
+    uint32_t domid;
+    libxl__ev_xswatch watch;
+};
 
 /*
  * evgen structures, which are the state we use for generating
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 65ad5e5..feedf27 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1678,6 +1678,96 @@ static int libxl_device_pci_compare(libxl_device_pci *d1,
     return COMPARE_PCI(d1, d2);
 }
 
+static void aer_backend_watch_callback(libxl__egc *egc,
+                                       libxl__ev_xswatch *watch,
+                                       const char *watch_path,
+                                       const char *event_path)
+{
+    EGC_GC;
+    libxl_aer_watch *aer_ws = CONTAINER_OF(watch, *aer_ws, watch);
+    int rc;
+    uint32_t dom, bus, dev, fn;
+    uint32_t domid = aer_ws->domid;
+    char *p, *path;
+    const char *aerFailedSBDF;
+    libxl_device_pci pcidev;
+
+    /* Extract the backend directory. */
+    path = libxl__strdup(gc, event_path);
+    p = strrchr(path, '/');
+    if ((p == NULL) || (strcmp(p, "/aerFailedSBDF") != 0))
+        return;
+    /* Truncate the string so it points to the backend directory. */
+    *p = '\0';
+
+    /* Fetch the value of the failed PCI device. */
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+            GCSPRINTF("%s/aerFailedSBDF", path), &aerFailedSBDF);
+    if (rc || !aerFailedSBDF)
+        return;
+    sscanf(aerFailedSBDF, "%x:%x:%x.%x", &dom, &bus, &dev, &fn);
+
+    libxl_device_pci_init(&pcidev);
+    pcidev_struct_fill(&pcidev, dom, bus, dev, fn, 0);
+    /* Forcibly remove the device from the guest */
+    rc = libxl__device_pci_remove_common(gc, domid, &pcidev, 1);
+    if (rc)
+        LOGD(ERROR, domid, " libxl__device_pci_remove_common() failed, rc=x%x",
+                (unsigned int)rc);
+
+    return;
+}
+
+int libxl_reg_aer_events_handler(libxl_ctx *ctx,
+                                 uint32_t domid,
+                                 libxl_aer_watch **aer_ws_out)
+{
+    int rc = 0;
+    int dom0_domid;
+    char *be_path;
+    libxl_aer_watch *aer_ws = NULL;
+    GC_INIT(ctx);
+
+    *aer_ws_out = NULL;
+
+    rc = libxl__get_domid(gc, (uint32_t *)(&dom0_domid));
+    if (rc) {
+        LOGD(ERROR, domid, " libxl__get_domid() failed, rc = %d", rc);
+        goto out;
+    }
+
+    aer_ws = malloc(sizeof(libxl_aer_watch));
+    if (!aer_ws) {
+        rc = ERROR_NOMEM;
+        goto out;
+    }
+    memset(aer_ws, 0, sizeof(libxl_aer_watch));
+
+    aer_ws->domid = domid;
+    be_path = GCSPRINTF("/local/domain/%d/backend/pci/%u/%d/%s",
+            dom0_domid, domid, dom0_domid, "aerFailedSBDF");
+    rc = libxl__ev_xswatch_register(gc, &aer_ws->watch,
+            aer_backend_watch_callback, be_path);
+    *aer_ws_out = aer_ws;
+
+out:
+    GC_FREE;
+    return rc;
+}
+
+void libxl_unreg_aer_events_handler(libxl_ctx *ctx,
+                                    uint32_t domid,
+                                    libxl_aer_watch *aer_ws)
+{
+    GC_INIT(ctx);
+
+    libxl__ev_xswatch_deregister(gc, &aer_ws->watch);
+
+    free(aer_ws);
+    GC_FREE;
+    return;
+}
+
 DEFINE_DEVICE_TYPE_STRUCT_X(pcidev, pci);
 
 /*

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 2/2] xl: Register the AER event handler to handle AER errors
  2017-08-07 23:54 ` [PATCH v3 " Venu Busireddy
  2017-08-07 23:54   ` [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors Venu Busireddy
@ 2017-08-07 23:54   ` Venu Busireddy
  1 sibling, 0 replies; 19+ messages in thread
From: Venu Busireddy @ 2017-08-07 23:54 UTC (permalink / raw)
  To: venu.busireddy, xen-devel, Ian Jackson, Wei Liu
  Cc: Andrew Cooper, Jan Beulich

When a guest is created, register the AER event handler to handle the
AER errors. When an AER error occurs, the handler will forcibly remove
the erring PCIe device from the guest.

Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 tools/xl/xl_vmcontrol.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 89c2b25..9855cdb 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -656,6 +656,7 @@ int create_domain(struct domain_create *dom_info)
     const char *restore_source = NULL;
     int migrate_fd = dom_info->migrate_fd;
     bool config_in_json;
+    libxl_aer_watch *aer_ws = NULL;
 
     int i;
     int need_daemon = daemonize;
@@ -966,6 +967,12 @@ start:
     LOG("Waiting for domain %s (domid %u) to die [pid %ld]",
         d_config.c_info.name, domid, (long)getpid());
 
+    ret = libxl_reg_aer_events_handler(ctx, domid, &aer_ws);
+    if (ret) {
+        /* Log the error, and move on... */
+        LOG("libxl_reg_aer_events_handler() failed, ret = 0x%08x", ret);
+    }
+
     ret = libxl_evenable_domain_death(ctx, domid, 0, &deathw);
     if (ret) goto out;
 
@@ -993,6 +1000,7 @@ start:
             LOG("Domain %u has shut down, reason code %d 0x%x", domid,
                 event->u.domain_shutdown.shutdown_reason,
                 event->u.domain_shutdown.shutdown_reason);
+            libxl_unreg_aer_events_handler(ctx, domid, aer_ws);
             switch (handle_domain_death(&domid, event, &d_config)) {
             case DOMAIN_RESTART_SOFT_RESET:
                 domid_soft_reset = domid;
@@ -1059,6 +1067,7 @@ start:
 
         case LIBXL_EVENT_TYPE_DOMAIN_DEATH:
             LOG("Domain %u has been destroyed.", domid);
+            libxl_unreg_aer_events_handler(ctx, domid, aer_ws);
             libxl_event_free(ctx, event);
             ret = 0;
             goto out;

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors
  2017-08-07 23:54   ` [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors Venu Busireddy
@ 2017-08-08 14:33     ` Wei Liu
  2017-08-08 14:40       ` Wei Liu
  2017-08-08 14:51       ` Venu Busireddy
  0 siblings, 2 replies; 19+ messages in thread
From: Wei Liu @ 2017-08-08 14:33 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich

On Mon, Aug 07, 2017 at 06:54:56PM -0500, Venu Busireddy wrote:
> Implement the callback function to handle unrecoverable AER errors, and
> also the public APIs that can be used to register/unregister the handler.
> When an AER error occurs, the handler will forcibly remove the erring
> PCIe device from the guest.
> 
> Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> ---
>  tools/libxl/libxl.h          | 14 +++++++
>  tools/libxl/libxl_event.h    | 13 +++++++
>  tools/libxl/libxl_internal.h |  7 ++++
>  tools/libxl/libxl_pci.c      | 90 ++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 124 insertions(+)
> 
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 7cf0f31..c5af0aa 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -1044,6 +1044,20 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, const libxl_mac *src);
>   */
>  #define LIBXL_HAVE_QED 1
>  
> +/* LIBXL_HAVE_REG_AER_EVENTS_HANDLER
> + *
> + * If it is defined, libxl has a library function called
> + * libxl_reg_aer_events_handler.
> + */
> +#define LIBXL_HAVE_REG_AER_EVENTS_HANDLER 1
> +
> +/* LIBXL_HAVE_UNREG_AER_EVENTS_HANDLER
> + *
> + * If it is defined, libxl has a library function called
> + * libxl_unreg_aer_events_handler.
> + */
> +#define LIBXL_HAVE_UNREG_AER_EVENTS_HANDLER 1
> +

You can consolidate both into

LIBLX_HAVE_AER_EVENTS_HANDLER

>  typedef char **libxl_string_list;
>  void libxl_string_list_dispose(libxl_string_list *sl);
>  int libxl_string_list_length(const libxl_string_list *sl);
> diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
> index 1ea789e..1aea906 100644
> --- a/tools/libxl/libxl_event.h
> +++ b/tools/libxl/libxl_event.h
> @@ -184,6 +184,19 @@ void libxl_evdisable_domain_death(libxl_ctx *ctx, libxl_evgen_domain_death*);
>     * may generate only a DEATH event.
>     */
>  
> +typedef struct libxl__aer_watch libxl_aer_watch;
> +int libxl_reg_aer_events_handler(libxl_ctx *, uint32_t, libxl_aer_watch **)
> +                        LIBXL_EXTERNAL_CALLERS_ONLY;
> +  /*
> +   * Registers a handler to handle the occurrence of unrecoverable AER errors.
> +   * This function depends on the calling application running the libxl's
> +   * internal event loop. Toolstacks that do not use libxl's internal
> +   * event loop must arrange to have their own event loop created and enter
> +   * libxl (say, call libxl_event_wait()), to enable the event to be processed.
> +   */
> +void libxl_unreg_aer_events_handler(libxl_ctx *, uint32_t, libxl_aer_watch *)
> +                        LIBXL_EXTERNAL_CALLERS_ONLY;
> +
>  typedef struct libxl__evgen_disk_eject libxl_evgen_disk_eject;
>  int libxl_evenable_disk_eject(libxl_ctx *ctx, uint32_t domid, const char *vdev,
>                          libxl_ev_user, libxl_evgen_disk_eject **evgen_out);
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index afe6652..2b74286 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -352,6 +352,13 @@ struct libxl__ev_child {
>      LIBXL_LIST_ENTRY(struct libxl__ev_child) entry;
>  };
>  
> +/*
> + * Structure used for AER event handling.
> + */
> +struct libxl__aer_watch {
> +    uint32_t domid;
> +    libxl__ev_xswatch watch;
> +};
>  
>  /*
>   * evgen structures, which are the state we use for generating
> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> index 65ad5e5..feedf27 100644
> --- a/tools/libxl/libxl_pci.c
> +++ b/tools/libxl/libxl_pci.c
> @@ -1678,6 +1678,96 @@ static int libxl_device_pci_compare(libxl_device_pci *d1,
>      return COMPARE_PCI(d1, d2);
>  }
>  
> +static void aer_backend_watch_callback(libxl__egc *egc,
> +                                       libxl__ev_xswatch *watch,
> +                                       const char *watch_path,
> +                                       const char *event_path)
> +{
> +    EGC_GC;
> +    libxl_aer_watch *aer_ws = CONTAINER_OF(watch, *aer_ws, watch);
> +    int rc;
> +    uint32_t dom, bus, dev, fn;
> +    uint32_t domid = aer_ws->domid;
> +    char *p, *path;
> +    const char *aerFailedSBDF;
> +    libxl_device_pci pcidev;
> +
> +    /* Extract the backend directory. */
> +    path = libxl__strdup(gc, event_path);
> +    p = strrchr(path, '/');
> +    if ((p == NULL) || (strcmp(p, "/aerFailedSBDF") != 0))
> +        return;
> +    /* Truncate the string so it points to the backend directory. */
> +    *p = '\0';
> +
> +    /* Fetch the value of the failed PCI device. */
> +    rc = libxl__xs_read_checked(gc, XBT_NULL,
> +            GCSPRINTF("%s/aerFailedSBDF", path), &aerFailedSBDF);
> +    if (rc || !aerFailedSBDF)
> +        return;
> +    sscanf(aerFailedSBDF, "%x:%x:%x.%x", &dom, &bus, &dev, &fn);
> +
> +    libxl_device_pci_init(&pcidev);
> +    pcidev_struct_fill(&pcidev, dom, bus, dev, fn, 0);
> +    /* Forcibly remove the device from the guest */
> +    rc = libxl__device_pci_remove_common(gc, domid, &pcidev, 1);
> +    if (rc)
> +        LOGD(ERROR, domid, " libxl__device_pci_remove_common() failed, rc=x%x",
> +                (unsigned int)rc);
> +
> +    return;
> +}
> +
> +int libxl_reg_aer_events_handler(libxl_ctx *ctx,
> +                                 uint32_t domid,
> +                                 libxl_aer_watch **aer_ws_out)

Afaict libxl_aer_watch is an opaque type to external caller, so this
won't work, right?

> +{
> +    int rc = 0;
> +    int dom0_domid;

uint32_t pciback_domid;

> +    char *be_path;
> +    libxl_aer_watch *aer_ws = NULL;
> +    GC_INIT(ctx);
> +
> +    *aer_ws_out = NULL;
> +
> +    rc = libxl__get_domid(gc, (uint32_t *)(&dom0_domid));
> +    if (rc) {
> +        LOGD(ERROR, domid, " libxl__get_domid() failed, rc = %d", rc);
> +        goto out;
> +    }
> +
> +    aer_ws = malloc(sizeof(libxl_aer_watch));

libxl__calloc(NOGC, ...);

And then you can skip the check and memset.

> +    if (!aer_ws) {
> +        rc = ERROR_NOMEM;
> +        goto out;
> +    }
> +    memset(aer_ws, 0, sizeof(libxl_aer_watch));
> +
> +    aer_ws->domid = domid;
> +    be_path = GCSPRINTF("/local/domain/%d/backend/pci/%u/%d/%s",
> +            dom0_domid, domid, dom0_domid, "aerFailedSBDF");

Use %u here.

> +    rc = libxl__ev_xswatch_register(gc, &aer_ws->watch,
> +            aer_backend_watch_callback, be_path);
> +    *aer_ws_out = aer_ws;
> +
> +out:
> +    GC_FREE;
> +    return rc;
> +}
> +
> +void libxl_unreg_aer_events_handler(libxl_ctx *ctx,
> +                                    uint32_t domid,
> +                                    libxl_aer_watch *aer_ws)
> +{
> +    GC_INIT(ctx);
> +
> +    libxl__ev_xswatch_deregister(gc, &aer_ws->watch);
> +
> +    free(aer_ws);
> +    GC_FREE;
> +    return;
> +}

I think a bigger question is whether you agree with Ian's comments
regarding API design and whether you have more questions?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors
  2017-08-08 14:33     ` Wei Liu
@ 2017-08-08 14:40       ` Wei Liu
  2017-08-08 14:51       ` Venu Busireddy
  1 sibling, 0 replies; 19+ messages in thread
From: Wei Liu @ 2017-08-08 14:40 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich

On Tue, Aug 08, 2017 at 03:33:01PM +0100, Wei Liu wrote:
> > +
> > +int libxl_reg_aer_events_handler(libxl_ctx *ctx,
> > +                                 uint32_t domid,
> > +                                 libxl_aer_watch **aer_ws_out)
> 
> Afaict libxl_aer_watch is an opaque type to external caller, so this
> won't work, right?
> 

Oh, it is a pointer to a pointer, so the storage size is known.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors
  2017-08-08 14:33     ` Wei Liu
  2017-08-08 14:40       ` Wei Liu
@ 2017-08-08 14:51       ` Venu Busireddy
  2017-08-08 15:12         ` Wei Liu
  2017-09-21 17:12         ` Ian Jackson
  1 sibling, 2 replies; 19+ messages in thread
From: Venu Busireddy @ 2017-08-08 14:51 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Ian Jackson, Jan Beulich, xen-devel

On 2017-08-08 15:33:01 +0100, Wei Liu wrote:
> On Mon, Aug 07, 2017 at 06:54:56PM -0500, Venu Busireddy wrote:
> > Implement the callback function to handle unrecoverable AER errors, and
> > also the public APIs that can be used to register/unregister the handler.
> > When an AER error occurs, the handler will forcibly remove the erring
> > PCIe device from the guest.
> > 
> > Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> > ---
> >  tools/libxl/libxl.h          | 14 +++++++
> >  tools/libxl/libxl_event.h    | 13 +++++++
> >  tools/libxl/libxl_internal.h |  7 ++++
> >  tools/libxl/libxl_pci.c      | 90 ++++++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 124 insertions(+)
> > 
> > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> > index 7cf0f31..c5af0aa 100644
> > --- a/tools/libxl/libxl.h
> > +++ b/tools/libxl/libxl.h
> > @@ -1044,6 +1044,20 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, const libxl_mac *src);
> >   */
> >  #define LIBXL_HAVE_QED 1
> >  
> > +/* LIBXL_HAVE_REG_AER_EVENTS_HANDLER
> > + *
> > + * If it is defined, libxl has a library function called
> > + * libxl_reg_aer_events_handler.
> > + */
> > +#define LIBXL_HAVE_REG_AER_EVENTS_HANDLER 1
> > +
> > +/* LIBXL_HAVE_UNREG_AER_EVENTS_HANDLER
> > + *
> > + * If it is defined, libxl has a library function called
> > + * libxl_unreg_aer_events_handler.
> > + */
> > +#define LIBXL_HAVE_UNREG_AER_EVENTS_HANDLER 1
> > +
> 
> You can consolidate both into
> 
> LIBLX_HAVE_AER_EVENTS_HANDLER
> 
> >  typedef char **libxl_string_list;
> >  void libxl_string_list_dispose(libxl_string_list *sl);
> >  int libxl_string_list_length(const libxl_string_list *sl);
> > diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
> > index 1ea789e..1aea906 100644
> > --- a/tools/libxl/libxl_event.h
> > +++ b/tools/libxl/libxl_event.h
> > @@ -184,6 +184,19 @@ void libxl_evdisable_domain_death(libxl_ctx *ctx, libxl_evgen_domain_death*);
> >     * may generate only a DEATH event.
> >     */
> >  
> > +typedef struct libxl__aer_watch libxl_aer_watch;
> > +int libxl_reg_aer_events_handler(libxl_ctx *, uint32_t, libxl_aer_watch **)
> > +                        LIBXL_EXTERNAL_CALLERS_ONLY;
> > +  /*
> > +   * Registers a handler to handle the occurrence of unrecoverable AER errors.
> > +   * This function depends on the calling application running the libxl's
> > +   * internal event loop. Toolstacks that do not use libxl's internal
> > +   * event loop must arrange to have their own event loop created and enter
> > +   * libxl (say, call libxl_event_wait()), to enable the event to be processed.
> > +   */
> > +void libxl_unreg_aer_events_handler(libxl_ctx *, uint32_t, libxl_aer_watch *)
> > +                        LIBXL_EXTERNAL_CALLERS_ONLY;
> > +
> >  typedef struct libxl__evgen_disk_eject libxl_evgen_disk_eject;
> >  int libxl_evenable_disk_eject(libxl_ctx *ctx, uint32_t domid, const char *vdev,
> >                          libxl_ev_user, libxl_evgen_disk_eject **evgen_out);
> > diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> > index afe6652..2b74286 100644
> > --- a/tools/libxl/libxl_internal.h
> > +++ b/tools/libxl/libxl_internal.h
> > @@ -352,6 +352,13 @@ struct libxl__ev_child {
> >      LIBXL_LIST_ENTRY(struct libxl__ev_child) entry;
> >  };
> >  
> > +/*
> > + * Structure used for AER event handling.
> > + */
> > +struct libxl__aer_watch {
> > +    uint32_t domid;
> > +    libxl__ev_xswatch watch;
> > +};
> >  
> >  /*
> >   * evgen structures, which are the state we use for generating
> > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> > index 65ad5e5..feedf27 100644
> > --- a/tools/libxl/libxl_pci.c
> > +++ b/tools/libxl/libxl_pci.c
> > @@ -1678,6 +1678,96 @@ static int libxl_device_pci_compare(libxl_device_pci *d1,
> >      return COMPARE_PCI(d1, d2);
> >  }
> >  
> > +static void aer_backend_watch_callback(libxl__egc *egc,
> > +                                       libxl__ev_xswatch *watch,
> > +                                       const char *watch_path,
> > +                                       const char *event_path)
> > +{
> > +    EGC_GC;
> > +    libxl_aer_watch *aer_ws = CONTAINER_OF(watch, *aer_ws, watch);
> > +    int rc;
> > +    uint32_t dom, bus, dev, fn;
> > +    uint32_t domid = aer_ws->domid;
> > +    char *p, *path;
> > +    const char *aerFailedSBDF;
> > +    libxl_device_pci pcidev;
> > +
> > +    /* Extract the backend directory. */
> > +    path = libxl__strdup(gc, event_path);
> > +    p = strrchr(path, '/');
> > +    if ((p == NULL) || (strcmp(p, "/aerFailedSBDF") != 0))
> > +        return;
> > +    /* Truncate the string so it points to the backend directory. */
> > +    *p = '\0';
> > +
> > +    /* Fetch the value of the failed PCI device. */
> > +    rc = libxl__xs_read_checked(gc, XBT_NULL,
> > +            GCSPRINTF("%s/aerFailedSBDF", path), &aerFailedSBDF);
> > +    if (rc || !aerFailedSBDF)
> > +        return;
> > +    sscanf(aerFailedSBDF, "%x:%x:%x.%x", &dom, &bus, &dev, &fn);
> > +
> > +    libxl_device_pci_init(&pcidev);
> > +    pcidev_struct_fill(&pcidev, dom, bus, dev, fn, 0);
> > +    /* Forcibly remove the device from the guest */
> > +    rc = libxl__device_pci_remove_common(gc, domid, &pcidev, 1);
> > +    if (rc)
> > +        LOGD(ERROR, domid, " libxl__device_pci_remove_common() failed, rc=x%x",
> > +                (unsigned int)rc);
> > +
> > +    return;
> > +}
> > +
> > +int libxl_reg_aer_events_handler(libxl_ctx *ctx,
> > +                                 uint32_t domid,
> > +                                 libxl_aer_watch **aer_ws_out)
> 
> Afaict libxl_aer_watch is an opaque type to external caller, so this
> won't work, right?
> 
> > +{
> > +    int rc = 0;
> > +    int dom0_domid;
> 
> uint32_t pciback_domid;
> 
> > +    char *be_path;
> > +    libxl_aer_watch *aer_ws = NULL;
> > +    GC_INIT(ctx);
> > +
> > +    *aer_ws_out = NULL;
> > +
> > +    rc = libxl__get_domid(gc, (uint32_t *)(&dom0_domid));
> > +    if (rc) {
> > +        LOGD(ERROR, domid, " libxl__get_domid() failed, rc = %d", rc);
> > +        goto out;
> > +    }
> > +
> > +    aer_ws = malloc(sizeof(libxl_aer_watch));
> 
> libxl__calloc(NOGC, ...);
> 
> And then you can skip the check and memset.
> 
> > +    if (!aer_ws) {
> > +        rc = ERROR_NOMEM;
> > +        goto out;
> > +    }
> > +    memset(aer_ws, 0, sizeof(libxl_aer_watch));
> > +
> > +    aer_ws->domid = domid;
> > +    be_path = GCSPRINTF("/local/domain/%d/backend/pci/%u/%d/%s",
> > +            dom0_domid, domid, dom0_domid, "aerFailedSBDF");
> 
> Use %u here.
> 
> > +    rc = libxl__ev_xswatch_register(gc, &aer_ws->watch,
> > +            aer_backend_watch_callback, be_path);
> > +    *aer_ws_out = aer_ws;
> > +
> > +out:
> > +    GC_FREE;
> > +    return rc;
> > +}
> > +
> > +void libxl_unreg_aer_events_handler(libxl_ctx *ctx,
> > +                                    uint32_t domid,
> > +                                    libxl_aer_watch *aer_ws)
> > +{
> > +    GC_INIT(ctx);
> > +
> > +    libxl__ev_xswatch_deregister(gc, &aer_ws->watch);
> > +
> > +    free(aer_ws);
> > +    GC_FREE;
> > +    return;
> > +}

I will implement all your above suggestions in v4.

> I think a bigger question is whether you agree with Ian's comments
> regarding API design and whether you have more questions?

Ian suggested that I document the use of the API (about the event loop),
and I believe I addressed it. I don't have any more questions. Just
waiting for Ian's "Ack", or more comments.

Venu



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors
  2017-08-08 14:51       ` Venu Busireddy
@ 2017-08-08 15:12         ` Wei Liu
  2017-09-21 17:12         ` Ian Jackson
  1 sibling, 0 replies; 19+ messages in thread
From: Wei Liu @ 2017-08-08 15:12 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich

On Tue, Aug 08, 2017 at 09:51:36AM -0500, Venu Busireddy wrote:
> I will implement all your above suggestions in v4.
> 
> > I think a bigger question is whether you agree with Ian's comments
> > regarding API design and whether you have more questions?
> 
> Ian suggested that I document the use of the API (about the event loop),
> and I believe I addressed it. I don't have any more questions. Just
> waiting for Ian's "Ack", or more comments.
> 

Ian is currently away and won't be back till August 28 iirc. I suggest
you wait for him to comment. I am not quite sure what he asked for.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors
  2017-08-08 14:51       ` Venu Busireddy
  2017-08-08 15:12         ` Wei Liu
@ 2017-09-21 17:12         ` Ian Jackson
  2018-04-03 15:06           ` [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors [and 1 more messages] Ian Jackson
  2018-04-10 19:52           ` [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors Venu Busireddy
  1 sibling, 2 replies; 19+ messages in thread
From: Ian Jackson @ 2017-09-21 17:12 UTC (permalink / raw)
  To: Venu Busireddy; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, xen-devel

Venu Busireddy writes ("Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors"):
> On 2017-08-08 15:33:01 +0100, Wei Liu wrote:
> > I think a bigger question is whether you agree with Ian's comments
> > regarding API design and whether you have more questions?
> 
> Ian suggested that I document the use of the API (about the event loop),
> and I believe I addressed it. I don't have any more questions. Just
> waiting for Ian's "Ack", or more comments.

I'm afraid that I still have reservations about the design questions.
Evidently I didn't make my questions clear enough.

The most important question that seems unanswered to me is this:

  Why is this only sometimes the right thing to do ?  On what basis
  might a user choose ?

To which you answered:

  This is not an "only sometimes" thing. User doesn't choose it. We always
  want to watch for AER errors.

But this leads to more fundamental questions.

If this behaviour is always required, why do we have an API call to
request it ?  It sounds like not calling this new function of yours is
always a mistake.  Ie this function (which has an obscure name) is
like "IAC DONT RANDOMY-LOSE" (see RFC748, from 1st April 1978)
except that you are making DO RANDOMLY-LOSE the default (in violation
of the RFC, should anyone talk to the server over telnet...)

If you are inventing a new kind of monitoring process that must be run
for all domains, that is a thing that libxl does not have right now.
At least, it doesn't have it in this form.  (xl has the reboot
monitor, and this is done differently in libvirt.)

It was indeed a design principle of libxl that it should (at least,
wherever possible) be possible to run a domain _without_ a monitoring
process imposed by libxl.

So: why is what this API call requests, not done automatically by
pciback or by Xen ?

And: if you are inventing a new monitoring process that must be run
for every domain, you should call this out much more explicitly as a
fundamental design change.

We will then have to think about more questions: should this process
be run automatically by libxl, without special application request
(like the way that libxl runs qemu) ?

If not, how do we ensure that exactly one of these processes is
running for each guest ?

If your new design involves new behaviour in callers of libxl, do you
intend to send patches for libvirt to enable it ?


Looking at the code:

You handle errors by logging and continuing.  Why is that correct ?

If we are to keep the current API for the client, it needs to have
better doc comments.

Is the xenstore watch implementation vulnerable to unexpected paths
appearing in watch events ?

Why is the API not a never-completing ao ?  Or, why is it not an
evreg ?

But the fundamental design questions need answering first.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RESEND PATCH v5 0/2] Containing AER unrecoverable errors
@ 2018-04-02 16:25 Venu Busireddy
  2017-08-07 23:54 ` [PATCH v3 " Venu Busireddy
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Venu Busireddy @ 2018-04-02 16:25 UTC (permalink / raw)
  To: venu.busireddy, xen-devel
  Cc: Elena Ufimtseva, Wei Liu, Andrew Cooper, Ian Jackson,
	Wim Ten Have, Jan Beulich

This patch set is part of a set of patches that together allow containment
of unrecoverable AER errors from PCIe devices assigned to guests in
passthrough mode. The containment is achieved by forcibly removing the
erring PCIe device from the guest.

The original xen-pciback patch corresponding to this patch set is:
https://lists.xen.org/archives/html/xen-devel/2017-06/msg03274.html.
It will be reposted after this patch set is accepted.

Changes in v5:
  * v4 worked only in the case of guests created using 'xl' command.
    Enhanced the fix to work for guests created using libvirt too.

Changes in v4:
  * Made the following changes suggested by Wei Liu.
    - Combine multiple LIBXL_HAVE_* definitions into one.
    - Use libxl__calloc() instead of malloc().

Changes in v3:
  * Made the following changes suggested by Wei Liu.
    - Added LIBXL_HAVE macros to libxl.h.
    - Don't hard-code dom0's domid to 0. Instead, use libxl__get_domid().
    - Corrected comments.
  * Made the following changes based on comments from Ian Jackson.
    - Got rid of the global variable aer_watch.
    - Added documentation (comments in code) for the new API calls.
    - Removed the unnecessary writes to xenstore.

Changes in v2:
  - Instead of killing the guest and hiding the device, forcibly remove
    the device from the guest.

Venu Busireddy (2):
  libxl: Implement the handler to handle unrecoverable AER errors
  xl: Register the AER event handler that handles AER errors

 tools/libxl/libxl.h          |   7 +++
 tools/libxl/libxl_create.c   |  11 +++-
 tools/libxl/libxl_domain.c   |   1 +
 tools/libxl/libxl_event.h    |   7 +++
 tools/libxl/libxl_internal.h |   8 +++
 tools/libxl/libxl_pci.c      | 123 +++++++++++++++++++++++++++++++++++++++++++
 tools/xl/xl_vmcontrol.c      |  14 ++++-
 7 files changed, 168 insertions(+), 3 deletions(-)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RESEND PATCH v5 1/2] libxl: Implement the handler to handle unrecoverable AER errors
  2018-04-02 16:25 [RESEND PATCH v5 0/2] Containing AER unrecoverable errors Venu Busireddy
  2017-08-07 23:54 ` [PATCH v3 " Venu Busireddy
@ 2018-04-02 16:25 ` Venu Busireddy
  2018-04-02 16:25 ` [RESEND PATCH v5 2/2] xl: Register the AER event handler that handles " Venu Busireddy
  2 siblings, 0 replies; 19+ messages in thread
From: Venu Busireddy @ 2018-04-02 16:25 UTC (permalink / raw)
  To: venu.busireddy, xen-devel
  Cc: Elena Ufimtseva, Wei Liu, Andrew Cooper, Ian Jackson,
	Wim Ten Have, Jan Beulich

Implement the callback function to handle unrecoverable AER errors, and
also the public APIs that can be used to register/unregister the handler.
When an AER error occurs, the handler will forcibly remove the erring
PCIe device from the guest.

Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 tools/libxl/libxl.h          |   7 +++
 tools/libxl/libxl_event.h    |   7 +++
 tools/libxl/libxl_internal.h |   8 +++
 tools/libxl/libxl_pci.c      | 123 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 145 insertions(+)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index eca0ea2c50..99a3c8ae1f 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1120,6 +1120,13 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, const libxl_mac *src);
  */
 #define LIBXL_HAVE_PV_SHIM 1
 
+/* LIBXL_HAVE_AER_EVENTS_HANDLER
+ *
+ * If this is defined, libxl has the library functions called
+ * libxl_reg_aer_events_handler and libxl_unreg_aer_events_handler.
+ */
+#define LIBXL_HAVE_AER_EVENTS_HANDLER 1
+
 typedef char **libxl_string_list;
 void libxl_string_list_dispose(libxl_string_list *sl);
 int libxl_string_list_length(const libxl_string_list *sl);
diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
index 1ea789e231..63c29ae800 100644
--- a/tools/libxl/libxl_event.h
+++ b/tools/libxl/libxl_event.h
@@ -184,6 +184,13 @@ void libxl_evdisable_domain_death(libxl_ctx *ctx, libxl_evgen_domain_death*);
    * may generate only a DEATH event.
    */
 
+typedef struct libxl__aer_watch libxl_aer_watch;
+int libxl_reg_aer_events_handler(libxl_ctx *, uint32_t);
+  /*
+   * Registers a handler to handle the occurrence of unrecoverable AER errors.
+   */
+void libxl_unreg_aer_events_handler(libxl_ctx *, uint32_t);
+
 typedef struct libxl__evgen_disk_eject libxl_evgen_disk_eject;
 int libxl_evenable_disk_eject(libxl_ctx *ctx, uint32_t domid, const char *vdev,
                         libxl_ev_user, libxl_evgen_disk_eject **evgen_out);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 506687fbe9..7972490050 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -356,6 +356,14 @@ struct libxl__ev_child {
     LIBXL_LIST_ENTRY(struct libxl__ev_child) entry;
 };
 
+/*
+ * Structure used for AER event handling.
+ */
+struct libxl__aer_watch {
+    uint32_t domid;
+    libxl__ev_xswatch watch;
+    struct libxl__aer_watch *next;
+};
 
 /*
  * evgen structures, which are the state we use for generating
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 4755a0c93c..c121c9f8cc 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1686,6 +1686,129 @@ static int libxl_device_pci_compare(libxl_device_pci *d1,
     return COMPARE_PCI(d1, d2);
 }
 
+static void aer_backend_watch_callback(libxl__egc *egc,
+                                       libxl__ev_xswatch *watch,
+                                       const char *watch_path,
+                                       const char *event_path)
+{
+    EGC_GC;
+    libxl_aer_watch *aer_ws = CONTAINER_OF(watch, *aer_ws, watch);
+    int rc;
+    uint32_t dom, bus, dev, fn;
+    uint32_t domid = aer_ws->domid;
+    char *p, *path;
+    const char *aerFailedSBDF;
+    libxl_device_pci pcidev;
+
+    /* Extract the backend directory. */
+    path = libxl__strdup(gc, event_path);
+    p = strrchr(path, '/');
+    if ((p == NULL) || (strcmp(p, "/aerFailedSBDF") != 0))
+        return;
+    /* Truncate the string so it points to the backend directory. */
+    *p = '\0';
+
+    /* Fetch the value of the failed PCI device. */
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+            GCSPRINTF("%s/aerFailedSBDF", path), &aerFailedSBDF);
+    if (rc || !aerFailedSBDF)
+        return;
+    LOGD(ERROR, domid, " aerFailedSBDF = %s", aerFailedSBDF);
+    sscanf(aerFailedSBDF, "%x:%x:%x.%x", &dom, &bus, &dev, &fn);
+
+    libxl_device_pci_init(&pcidev);
+    pcidev_struct_fill(&pcidev, dom, bus, dev, fn, 0);
+    /* Forcibly remove the device from the guest */
+    rc = libxl__device_pci_remove_common(gc, domid, &pcidev, 1);
+    if (rc)
+        LOGD(ERROR, domid, " libxl__device_pci_remove_common() failed, rc=x%x",
+                (unsigned int)rc);
+
+    return;
+}
+
+static libxl_aer_watch *manage_aer_ws_list(libxl_aer_watch *in, uint32_t domid)
+{
+    static libxl_aer_watch *aer_ws = NULL;
+    libxl_aer_watch *iter, *prev = NULL;
+
+    if (in) {
+        if (aer_ws)
+            in->next = aer_ws;
+        iter = aer_ws = in;
+    } else {
+        iter = aer_ws;
+        while (iter) {
+            if (iter->domid == domid) {
+                if (prev)
+                    prev->next = iter->next;
+                else
+                    aer_ws = iter->next;
+                break;
+            }
+            prev = iter;
+            iter = iter->next;
+        }
+    }
+    return iter;
+}
+
+static void store_aer_ws(libxl_aer_watch *aer_ws)
+{
+    manage_aer_ws_list(aer_ws, 0);
+    return;
+}
+
+static libxl_aer_watch *retrieve_aer_ws(uint32_t domid)
+{
+    return manage_aer_ws_list(NULL, domid);
+}
+
+int libxl_reg_aer_events_handler(libxl_ctx *ctx, uint32_t domid)
+{
+    int rc = 0;
+    char *be_path;
+    uint32_t pciback_domid;
+    libxl_aer_watch *aer_ws;
+    GC_INIT(ctx);
+
+    rc = libxl__get_domid(gc, (uint32_t *)(&pciback_domid));
+    if (rc) {
+        LOGD(ERROR, domid, " libxl__get_domid() failed, rc = %d", rc);
+        goto out;
+    }
+
+    aer_ws = libxl__calloc(NOGC, 1, sizeof(libxl_aer_watch));
+    aer_ws->domid = domid;
+    aer_ws->next = NULL;
+    store_aer_ws(aer_ws);
+    be_path = GCSPRINTF("/local/domain/%u/backend/pci/%u/%u/%s",
+            pciback_domid, domid, pciback_domid, "aerFailedSBDF");
+    rc = libxl__ev_xswatch_register(gc, &aer_ws->watch,
+            aer_backend_watch_callback, be_path);
+
+out:
+    GC_FREE;
+    return rc;
+}
+
+void libxl_unreg_aer_events_handler(libxl_ctx *ctx, uint32_t domid)
+{
+    GC_INIT(ctx);
+    libxl_aer_watch *aer_ws;
+
+    aer_ws = retrieve_aer_ws(domid);
+    if (!aer_ws)
+        goto out;
+
+    libxl__ev_xswatch_deregister(gc, &aer_ws->watch);
+    free(aer_ws);
+
+out:
+    GC_FREE;
+    return;
+}
+
 #define libxl__device_pci_update_devid NULL
 
 DEFINE_DEVICE_TYPE_STRUCT_X(pcidev, pci, PCI);

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RESEND PATCH v5 2/2] xl: Register the AER event handler that handles AER errors
  2018-04-02 16:25 [RESEND PATCH v5 0/2] Containing AER unrecoverable errors Venu Busireddy
  2017-08-07 23:54 ` [PATCH v3 " Venu Busireddy
  2018-04-02 16:25 ` [RESEND PATCH v5 1/2] libxl: Implement the handler to handle unrecoverable " Venu Busireddy
@ 2018-04-02 16:25 ` Venu Busireddy
  2 siblings, 0 replies; 19+ messages in thread
From: Venu Busireddy @ 2018-04-02 16:25 UTC (permalink / raw)
  To: venu.busireddy, xen-devel
  Cc: Elena Ufimtseva, Wei Liu, Andrew Cooper, Ian Jackson,
	Wim Ten Have, Jan Beulich

When a guest is created, register the AER event handler to handle the
AER errors. When an AER error occurs, the handler will forcibly remove
the erring PCIe device from the guest.

Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
Signed-off-by: Wim Ten Have <wim.ten.have@oracle.com>
---
 tools/libxl/libxl_create.c | 11 +++++++++--
 tools/libxl/libxl_domain.c |  1 +
 tools/xl/xl_vmcontrol.c    | 14 +++++++++++++-
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index c498135246..2d247da5f0 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1663,7 +1663,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
 {
     AO_CREATE(ctx, 0, ao_how);
     libxl__app_domain_create_state *cdcs;
-    int rc;
+    int rc, ao_rc;
 
     GCNEW(cdcs);
     cdcs->dcs.ao = ao;
@@ -1698,7 +1698,14 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
 
     initiate_domain_create(egc, &cdcs->dcs);
 
-    return AO_INPROGRESS;
+    ao_rc = AO_INPROGRESS;
+    rc = libxl_reg_aer_events_handler(ctx, *domid);
+    if (rc) {
+        /* Log the error, and move on... */
+        LOGD(ERROR, *domid,
+                "libxl_reg_aer_events_handler() failed, rc = %d", rc);
+    }
+    return ao_rc;
 
  out_err:
     return AO_CREATE_FAIL(rc);
diff --git a/tools/libxl/libxl_domain.c b/tools/libxl/libxl_domain.c
index 13b1c73d40..b8fb5e0349 100644
--- a/tools/libxl/libxl_domain.c
+++ b/tools/libxl/libxl_domain.c
@@ -906,6 +906,7 @@ void libxl__domain_destroy(libxl__egc *egc, libxl__domain_destroy_state *dds)
     STATE_AO_GC(dds->ao);
     uint32_t stubdomid = libxl_get_stubdom_id(CTX, dds->domid);
 
+    libxl_unreg_aer_events_handler(CTX, dds->domid);
     if (stubdomid) {
         dds->stubdom.ao = ao;
         dds->stubdom.domid = stubdomid;
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 89c2b25ded..5bf415fa6e 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -945,8 +945,11 @@ start:
         libxl_domain_unpause(ctx, domid);
 
     ret = domid; /* caller gets success in parent */
-    if (!daemonize && !monitor)
+    if (!daemonize && !monitor) {
+        /* Unregister aer events handler before returning/exiting */
+        libxl_unreg_aer_events_handler(ctx, domid);
         goto out;
+    }
 
     if (dom_info->vnc)
         autoconnect_vncviewer(domid, vncautopass);
@@ -958,9 +961,17 @@ start:
         ret = do_daemonize(name, NULL);
         free(name);
         if (ret) {
+            /* Unregister aer events handler before returning/exiting */
+            libxl_unreg_aer_events_handler(ctx, domid);
             ret = (ret == 1) ? domid : ret;
             goto out;
         }
+        /* Child has new ctx. Re-register the events handler in child's ctx */
+        ret = libxl_reg_aer_events_handler(ctx, domid);
+        if (ret) {
+            /* Log the error, and move on... */
+            LOG("libxl_reg_aer_events_handler() failed, ret = %d", ret);
+        }
         need_daemon = 0;
     }
     LOG("Waiting for domain %s (domid %u) to die [pid %ld]",
@@ -1059,6 +1070,7 @@ start:
 
         case LIBXL_EVENT_TYPE_DOMAIN_DEATH:
             LOG("Domain %u has been destroyed.", domid);
+            libxl_unreg_aer_events_handler(ctx, domid);
             libxl_event_free(ctx, event);
             ret = 0;
             goto out;

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors [and 1 more messages]
  2017-09-21 17:12         ` Ian Jackson
@ 2018-04-03 15:06           ` Ian Jackson
  2018-04-03 15:28             ` Wim ten Have
  2018-04-03 15:29             ` Venu Busireddy
  2018-04-10 19:52           ` [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors Venu Busireddy
  1 sibling, 2 replies; 19+ messages in thread
From: Ian Jackson @ 2018-04-03 15:06 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: Elena Ufimtseva, Wei Liu, Andrew Cooper, Wim Ten Have, xen-devel,
	Jan Beulich

Venu Busireddy writes ("[RESEND PATCH v5 0/2] Containing AER unrecoverable errors"):
> This patch set is part of a set of patches that together allow containment
> of unrecoverable AER errors from PCIe devices assigned to guests in
> passthrough mode. The containment is achieved by forcibly removing the
> erring PCIe device from the guest.

But, in September, I wrote:

Ian Jackson writes ("Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors"):
> I'm afraid that I still have reservations about the design questions.
> Evidently I didn't make my questions clear enough.
> 
> [ 64 lines of detailed discussion elided ]

I haven't seen a reply to that.

Also, from the patch v5:
> Changes in v5:
>   * v4 worked only in the case of guests created using 'xl' command.
>     Enhanced the fix to work for guests created using libvirt too.

I'm confused by the responses in the thread which relate to libvirt.
ISTM that a libvirt patch is also required.  Do you mean that in v5
there is also a libvirt patch ?

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors [and 1 more messages]
  2018-04-03 15:06           ` [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors [and 1 more messages] Ian Jackson
@ 2018-04-03 15:28             ` Wim ten Have
  2018-04-03 15:29             ` Venu Busireddy
  1 sibling, 0 replies; 19+ messages in thread
From: Wim ten Have @ 2018-04-03 15:28 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Elena Ufimtseva, Wei Liu, Andrew Cooper, Wim ten Have, xen-devel,
	Venu Busireddy, Jan Beulich

On Tue, 3 Apr 2018 16:06:17 +0100
Ian Jackson <ian.jackson@citrix.com> wrote:

> Venu Busireddy writes ("[RESEND PATCH v5 0/2] Containing AER unrecoverable errors"):
> > This patch set is part of a set of patches that together allow containment
> > of unrecoverable AER errors from PCIe devices assigned to guests in
> > passthrough mode. The containment is achieved by forcibly removing the
> > erring PCIe device from the guest.  
> 
> But, in September, I wrote:
> 
> Ian Jackson writes ("Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors"):
> > I'm afraid that I still have reservations about the design questions.
> > Evidently I didn't make my questions clear enough.
> > 
> > [ 64 lines of detailed discussion elided ]  
> 
> I haven't seen a reply to that.
> 
> Also, from the patch v5:
> > Changes in v5:
> >   * v4 worked only in the case of guests created using 'xl' command.
> >     Enhanced the fix to work for guests created using libvirt too.  
> 
> I'm confused by the responses in the thread which relate to libvirt.
> ISTM that a libvirt patch is also required.  Do you mean that in v5
> there is also a libvirt patch ?

  Hi Ian, the patch as proposed brings the specific support to libvirtd by
  libvirtd dependencies towards libxenlight...so (shared object) when Xen
  driver under specific toolstack is in effect.  

Rgds,
- Wim ten Have.

> Ian.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors [and 1 more messages]
  2018-04-03 15:06           ` [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors [and 1 more messages] Ian Jackson
  2018-04-03 15:28             ` Wim ten Have
@ 2018-04-03 15:29             ` Venu Busireddy
  2018-04-03 16:51               ` Ian Jackson
  1 sibling, 1 reply; 19+ messages in thread
From: Venu Busireddy @ 2018-04-03 15:29 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Elena Ufimtseva, Wei Liu, Andrew Cooper, Wim Ten Have, xen-devel,
	Jan Beulich

On 2018-04-03 16:06:17 +0100, Ian Jackson wrote:
> Venu Busireddy writes ("[RESEND PATCH v5 0/2] Containing AER unrecoverable errors"):
> > This patch set is part of a set of patches that together allow containment
> > of unrecoverable AER errors from PCIe devices assigned to guests in
> > passthrough mode. The containment is achieved by forcibly removing the
> > erring PCIe device from the guest.
> 
> But, in September, I wrote:
> 
> Ian Jackson writes ("Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors"):
> > I'm afraid that I still have reservations about the design questions.
> > Evidently I didn't make my questions clear enough.
> > 
> > [ 64 lines of detailed discussion elided ]
> 
> I haven't seen a reply to that.

Reply to that is the v5 patch. Your concern in v4 was, "why is this
error handling done only in some cases?" Meaning, the error handling
happens only for guests created using xl, but it does not happen for
guests created using libvirt. I addressed that in the v5 patch. Please
see below for more details.

> Also, from the patch v5:
> > Changes in v5:
> >   * v4 worked only in the case of guests created using 'xl' command.
> >     Enhanced the fix to work for guests created using libvirt too.
> 
> I'm confused by the responses in the thread which relate to libvirt.
> ISTM that a libvirt patch is also required.  Do you mean that in v5
> there is also a libvirt patch ?

libvirt ends up calling do_domain_create() in tools/libxl/libxl_create.c,
and that is where I am registering the error handler. That change takes
care of guests created using xl command as well as libvirt. Hence there
is no change in libvirt.

Venu


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors [and 1 more messages]
  2018-04-03 15:29             ` Venu Busireddy
@ 2018-04-03 16:51               ` Ian Jackson
  2018-04-10 19:50                 ` Venu Busireddy
  0 siblings, 1 reply; 19+ messages in thread
From: Ian Jackson @ 2018-04-03 16:51 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: Elena Ufimtseva, Wei Liu, Andrew Cooper, Wim Ten Have, xen-devel,
	Jan Beulich

Venu Busireddy writes ("Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors [and 1 more messages]"):
> On 2018-04-03 16:06:17 +0100, Ian Jackson wrote:
> > Ian Jackson writes ("Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors"):
> > > I'm afraid that I still have reservations about the design questions.
> > > Evidently I didn't make my questions clear enough.
> > > 
> > > [ 64 lines of detailed discussion elided ]
> > 
> > I haven't seen a reply to that.
> 
> Reply to that is the v5 patch. Your concern in v4 was, "why is this
> error handling done only in some cases?" Meaning, the error handling
> happens only for guests created using xl, but it does not happen for
> guests created using libvirt. I addressed that in the v5 patch. Please
> see below for more details.

Oh.  I see.

> > I'm confused by the responses in the thread which relate to libvirt.
> > ISTM that a libvirt patch is also required.  Do you mean that in v5
> > there is also a libvirt patch ?
> 
> libvirt ends up calling do_domain_create() in tools/libxl/libxl_create.c,
> and that is where I am registering the error handler. That change takes
> care of guests created using xl command as well as libvirt. Hence there
> is no change in libvirt.

I'm sorry to say that this is completely wrong.  I didn't spot that
hunk in the v5 2/2 patch.  I don't think your description in your v4
to v5 changes summary really highlights the substantial design change.

I think it would have been better to reply to my prose email.  We
would have been able to explore the design possibilities.

What you have done is wrong because:

 * You have removed the libxl__aer_watch from the
   libxl_reg_aer_events_handler API which means the effect is now
   global for the ctx.  This is not correct for a libxl event
   generation request function.  (Although this isn't one.)

 * Not all callers of libxl will necessarily retain the process, or
   the ctx, in which they called libxl_domain_create.  I think libvirt
   does (but I'm not sure), and xl usually does, but it's not
   guaranteed.

 * It's quite unclear why this function is a public one.

The entire approach is wrong, I'm afraid.

We need to go back to the design.  Please would you reply to my mail
from September.

Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors [and 1 more messages]
  2018-04-03 16:51               ` Ian Jackson
@ 2018-04-10 19:50                 ` Venu Busireddy
  0 siblings, 0 replies; 19+ messages in thread
From: Venu Busireddy @ 2018-04-10 19:50 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Elena Ufimtseva, Wei Liu, Andrew Cooper, Wim Ten Have, xen-devel,
	Jan Beulich

On 2018-04-03 17:51:50 +0100, Ian Jackson wrote:
> Venu Busireddy writes ("Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors [and 1 more messages]"):
> > On 2018-04-03 16:06:17 +0100, Ian Jackson wrote:
> > > Ian Jackson writes ("Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors"):
> > > > I'm afraid that I still have reservations about the design questions.
> > > > Evidently I didn't make my questions clear enough.
> > > > 
> > > > [ 64 lines of detailed discussion elided ]
> > > 
> > > I haven't seen a reply to that.
> > 
> > Reply to that is the v5 patch. Your concern in v4 was, "why is this
> > error handling done only in some cases?" Meaning, the error handling
> > happens only for guests created using xl, but it does not happen for
> > guests created using libvirt. I addressed that in the v5 patch. Please
> > see below for more details.
> 
> Oh.  I see.
> 
> > > I'm confused by the responses in the thread which relate to libvirt.
> > > ISTM that a libvirt patch is also required.  Do you mean that in v5
> > > there is also a libvirt patch ?
> > 
> > libvirt ends up calling do_domain_create() in tools/libxl/libxl_create.c,
> > and that is where I am registering the error handler. That change takes
> > care of guests created using xl command as well as libvirt. Hence there
> > is no change in libvirt.
> 
> I'm sorry to say that this is completely wrong.  I didn't spot that
> hunk in the v5 2/2 patch.  I don't think your description in your v4
> to v5 changes summary really highlights the substantial design change.
> 
> I think it would have been better to reply to my prose email.  We
> would have been able to explore the design possibilities.
> 
> What you have done is wrong because:
> 
>  * You have removed the libxl__aer_watch from the
>    libxl_reg_aer_events_handler API which means the effect is now
>    global for the ctx.  This is not correct for a libxl event
>    generation request function.  (Although this isn't one.)
> 
>  * Not all callers of libxl will necessarily retain the process, or
>    the ctx, in which they called libxl_domain_create.  I think libvirt
>    does (but I'm not sure), and xl usually does, but it's not
>    guaranteed.
> 
>  * It's quite unclear why this function is a public one.
> 
> The entire approach is wrong, I'm afraid.
> 
> We need to go back to the design.  Please would you reply to my mail
> from September.

Sure. I will reply to your email from September.

Venu

> Thanks,
> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors
  2017-09-21 17:12         ` Ian Jackson
  2018-04-03 15:06           ` [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors [and 1 more messages] Ian Jackson
@ 2018-04-10 19:52           ` Venu Busireddy
  1 sibling, 0 replies; 19+ messages in thread
From: Venu Busireddy @ 2018-04-10 19:52 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, xen-devel

On 2017-09-21 18:12:54 +0100, Ian Jackson wrote:
> Venu Busireddy writes ("Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors"):
> > On 2017-08-08 15:33:01 +0100, Wei Liu wrote:
> > > I think a bigger question is whether you agree with Ian's comments
> > > regarding API design and whether you have more questions?
> > 
> > Ian suggested that I document the use of the API (about the event loop),
> > and I believe I addressed it. I don't have any more questions. Just
> > waiting for Ian's "Ack", or more comments.
> 
> I'm afraid that I still have reservations about the design questions.
> Evidently I didn't make my questions clear enough.
> 
> The most important question that seems unanswered to me is this:
> 
>   Why is this only sometimes the right thing to do ?  On what basis
>   might a user choose ?
> 
> To which you answered:
> 
>   This is not an "only sometimes" thing. User doesn't choose it. We always
>   want to watch for AER errors.
> 
> But this leads to more fundamental questions.
> 
> If this behaviour is always required, why do we have an API call to
> request it ?  It sounds like not calling this new function of yours is
> always a mistake.  Ie this function (which has an obscure name) is

Yes, this behavior is always required. I will remove the API calls,
and in their place, create wrapper functions that can be called from
different places in libxenlight as needed.

libxl_reg_aer_events_handler() will be created as a wrapper function
that hides the internal details of calling libxl__ev_xswatch_register().

libxl_unreg_aer_events_handler() will be created as a wrapper function
that hides the internal details of calling libxl__ev_xswatch_deregister().

> like "IAC DONT RANDOMY-LOSE" (see RFC748, from 1st April 1978)
> except that you are making DO RANDOMLY-LOSE the default (in violation
> of the RFC, should anyone talk to the server over telnet...)
> 
> If you are inventing a new kind of monitoring process that must be run
> for all domains, that is a thing that libxl does not have right now.
> At least, it doesn't have it in this form.  (xl has the reboot
> monitor, and this is done differently in libvirt.)

I am not inventing a new kind of monitoring process. My understanding
is that each domain already has its own monitoring process.

If the toolstack used is xl, I am proposing to use the 'xl' command as
the monitoring process (whether daemonized or not).

If the toolstack used is libvirt/virsh, I am proposing to use the daemon
'libvirtd' as the monitoring process. That process always stays around.

> It was indeed a design principle of libxl that it should (at least,
> wherever possible) be possible to run a domain _without_ a monitoring
> process imposed by libxl.
> 
> So: why is what this API call requests, not done automatically by
> pciback or by Xen ?

When you say "Xen", I am assuming you meant "libxenlight." If so, the
answer is Yes. I am proposing to register/unregister the event handler
in libxenlight, so that all toolstacks can make use of the monitoring.

> And: if you are inventing a new monitoring process that must be run
> for every domain, you should call this out much more explicitly as a
> fundamental design change.
> 
> We will then have to think about more questions: should this process
> be run automatically by libxl, without special application request
> (like the way that libxl runs qemu) ?

I am not inventing a new monitoring process, and hence, this isn't
applicable.

> If not, how do we ensure that exactly one of these processes is
> running for each guest ?

As I mentioned in the earlier paragraphs above, there will only be one
process per domain. Never more than one process per domain. When using
the xl toolstack, that one process per domain is the 'xl' command. When
using the libvirt/virsh toolstack, then there is only one process (the
'libvirtd' daemon) that is monitoring all the domains. Therefore, there
will never be more than process per domain.

> If your new design involves new behaviour in callers of libxl, do you
> intend to send patches for libvirt to enable it ?

The new design does not involve any new behavior in the callers of
libxl. Registration of the event handler happens transparently, without
any explicit request from the libxl callers.

> Looking at the code:
> 
> You handle errors by logging and continuing.  Why is that correct ?

In the unlikely case of failure to register the event handler, we felt
that it is better to continue with the creation of the guest without
monitoring (which is what happens today). But if you would like to see the
creation of the guest fail, that is an easy change to accommodate. Would
you like to fail the creation of the guest?

> If we are to keep the current API for the client, it needs to have
> better doc comments.

I will be removing the API, and hence, no doc comments.

> Is the xenstore watch implementation vulnerable to unexpected paths
> appearing in watch events ?

The event handler looks for specific paths with specific formats to exist
in the xenstore. Unless an user with administrative privilege adds entries
to xenstore, such paths should not come into existence in xenstore.

> Why is the API not a never-completing ao ?  Or, why is it not an
> evreg ?

I will be removing the API, and hence, this isn't applicable.

> But the fundamental design questions need answering first.

In v5 of the patch, the design incorporates all of the above aspects. The
API is removed. Existing processes are used to monitor for the events. In
case of libvirtd where one process monitors all the domains, I am
maintaining a list of handlers within that one single 'ctx'. The new
design also takes care of the monitoring irrespective of which toolstack
is used to create the guest.

Regards,

Venu


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3 0/2] Containing AER unrecoverable errors
@ 2017-08-07 23:53 Venu Busireddy
  0 siblings, 0 replies; 19+ messages in thread
From: Venu Busireddy @ 2017-08-07 23:53 UTC (permalink / raw)
  To: venu.busireddy, xen-devel, Ian Jackson, Wei Liu

This patch set is part of a set of patches that together allow containment
of unrecoverable AER errors from PCIe devices assigned to guests in
passthrough mode. The containment is achieved by forcibly removing the
erring PCIe device from the guest.

The original xen-pciback patch corresponding to this patch set is:
https://lists.xen.org/archives/html/xen-devel/2017-06/msg03274.html.
It will be reposted after this patch set is accepted.

Changes in v3:
  * Made the following changes suggested by Wei Liu.
    - Added LIBXL_HAVE macros to libxl.h.
    - Don't hard-code dom0's domid to 0. Instead, use libxl__get_domid().
    - Corrected comments.
  * Made the following changes based on comments from Ian Jackson.
    - Got rid of the global variable aer_watch.
    - Added documentation (comments in code) for the new API calls.
    - Removed the unnecessary writes to xenstore.

Changes in v2:
  - Instead of killing the guest and hiding the device, forcibly remove
    the device from the guest.

Venu Busireddy (2):
  libxl: Implement the handler to handle unrecoverable AER errors.
  xl: Register the AER event handler to handle AER errors.

 tools/libxl/libxl.h          | 14 +++++++
 tools/libxl/libxl_event.h    | 13 +++++++
 tools/libxl/libxl_internal.h |  7 ++++
 tools/libxl/libxl_pci.c      | 90 ++++++++++++++++++++++++++++++++++++++++++++
 tools/xl/xl_vmcontrol.c      |  9 +++++
 5 files changed, 133 insertions(+)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3 0/2] Containing AER unrecoverable errors
@ 2017-08-07 23:50 Venu Busireddy
  0 siblings, 0 replies; 19+ messages in thread
From: Venu Busireddy @ 2017-08-07 23:50 UTC (permalink / raw)
  To: venu.busireddy, xen-devel, Ian Jackson, Wei Liu

This patch set is part of a set of patches that together allow containment
of unrecoverable AER errors from PCIe devices assigned to guests in
passthrough mode. The containment is achieved by forcibly removing the
erring PCIe device from the guest.

Changes in v3:
  * Made the following changes suggested by Wei Liu.
    - Added LIBXL_HAVE macros to libxl.h.
    - Don't hard-code dom0's domid to 0. Instead, use libxl__get_domid().
    - Corrected comments.
  * Made the following changes based on comments from Ian Jackson.
    - Got rid of the global variable aer_watch.
    - Added documentation (comments in code) for the new API calls.
    - Removed the unnecessary writes to xenstore.

Changes in v2:
  - Instead of killing the guest and hiding the device, forcibly remove
    the device from the guest.

Venu Busireddy (2):
  libxl: Implement the handler to handle unrecoverable AER errors.
  xl: Register the AER event handler to handle AER errors.

 tools/libxl/libxl.h          | 14 +++++++
 tools/libxl/libxl_event.h    | 13 +++++++
 tools/libxl/libxl_internal.h |  7 ++++
 tools/libxl/libxl_pci.c      | 90 ++++++++++++++++++++++++++++++++++++++++++++
 tools/xl/xl_vmcontrol.c      |  9 +++++
 5 files changed, 133 insertions(+)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-04-10 19:52 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-02 16:25 [RESEND PATCH v5 0/2] Containing AER unrecoverable errors Venu Busireddy
2017-08-07 23:54 ` [PATCH v3 " Venu Busireddy
2017-08-07 23:54   ` [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors Venu Busireddy
2017-08-08 14:33     ` Wei Liu
2017-08-08 14:40       ` Wei Liu
2017-08-08 14:51       ` Venu Busireddy
2017-08-08 15:12         ` Wei Liu
2017-09-21 17:12         ` Ian Jackson
2018-04-03 15:06           ` [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors [and 1 more messages] Ian Jackson
2018-04-03 15:28             ` Wim ten Have
2018-04-03 15:29             ` Venu Busireddy
2018-04-03 16:51               ` Ian Jackson
2018-04-10 19:50                 ` Venu Busireddy
2018-04-10 19:52           ` [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors Venu Busireddy
2017-08-07 23:54   ` [PATCH v3 2/2] xl: Register the AER event handler to handle " Venu Busireddy
2018-04-02 16:25 ` [RESEND PATCH v5 1/2] libxl: Implement the handler to handle unrecoverable " Venu Busireddy
2018-04-02 16:25 ` [RESEND PATCH v5 2/2] xl: Register the AER event handler that handles " Venu Busireddy
  -- strict thread matches above, loose matches on Subject: below --
2017-08-07 23:53 [PATCH v3 0/2] Containing AER unrecoverable errors Venu Busireddy
2017-08-07 23:50 Venu Busireddy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.