xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* Xen Security Advisory 344 v4 (CVE-2020-25601) - lack of preemption in evtchn_reset() / evtchn_destroy()
@ 2020-09-22 13:37 Xen.org security team
  0 siblings, 0 replies; only message in thread
From: Xen.org security team @ 2020-09-22 13:37 UTC (permalink / raw)
  To: xen-announce, xen-devel, xen-users, oss-security; +Cc: Xen.org security team

[-- Attachment #1: Type: text/plain, Size: 4446 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

            Xen Security Advisory CVE-2020-25601 / XSA-344
                               version 4

        lack of preemption in evtchn_reset() / evtchn_destroy()

UPDATES IN VERSION 4
====================

Public release.

ISSUE DESCRIPTION
=================

In particular the FIFO event channel model allows guests to have a large
number of event channels active at a time.  Closing all of these when
resetting all event channels or when cleaning up after the guest may
take extended periods of time.  So far there was no arrangement for
preemption at suitable intervals, allowing a CPU to spend an almost
unbounded amount of time in the processing of these operations.

IMPACT
======

Malicious or buggy guest kernels can mount a Denial of Service (DoS)
attack affecting the entire system.

VULNERABLE SYSTEMS
==================

All Xen versions are vulnerable in principle.  Whether versions 4.3
and older are vulnerable depends on underlying hardware characteristics.

MITIGATION
==========

The problem can be avoided by reducing the number of event channels
available to all guests to a suitably low limit.  For example, setting
"max_event_channels=256" in the xl domain configurations may be low
enough for all hardware Xen is able to run on.

CREDITS
=======

This issue was discovered by Jan Beulich of SUSE.

RESOLUTION
==========

Applying the appropriate set of attached patches resolves this issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa344/xsa344-?.patch           Xen 4.14 - xen-unstable
xsa344/xsa344-4.13-?.patch      Xen 4.13
xsa344/xsa344-4.12-?.patch      Xen 4.12
xsa344/xsa344-4.11-?.patch      Xen 4.11
xsa344/xsa344-4.10-?.patch      Xen 4.10

$ sha256sum xsa344* xsa344*/*
74ae97a618a3680920bed131e69656d5a7c039efbbec99b55b99af772e3e87df  xsa344.meta
5f9dbdc48bed502d614a76e5819afa41a72cec603c5a2c9491d73873a991a5ed  xsa344/xsa344-1.patch
381ca5c51bc120bfd5c742be3988f570abb870c4b75c8a48cf49ae4fa1046d73  xsa344/xsa344-2.patch
b52e4ecd6db8c3c6ebc0ab6facbd0f4fa0859657d13491819c3279fe439f66ec  xsa344/xsa344-4.10-1.patch
53ca9c954fd73344968f40689b0d0ea583bd19ece72166fd2d4eaa125b82f26f  xsa344/xsa344-4.10-2.patch
7abea30b406b0a572f7cd76bd9768d12262344a8e255ddd29d2ad893724638a0  xsa344/xsa344-4.11-1.patch
f2b39146ac410154043efd09880277e4e821a1dd47a0bd3000545e5568253b97  xsa344/xsa344-4.11-2.patch
a654c99f5d1c25d9d12ba267d2db10b0a1e0da337ce334fb5aafa6b2061ebc3c  xsa344/xsa344-4.12-1.patch
6af4e05f8536b11a3dc4c70620b8ed973ecf09efd4c64eb500f6363d5f0402e7  xsa344/xsa344-4.12-2.patch
9b81c7cf3cd33f9d43c43222a0434a8d4e0acff74f339a6842f16bfa2f304cb5  xsa344/xsa344-4.13-1.patch
80a41b7e08cdb54a28dfc82630a0d8d89fc25e381bc4505ed41017a760addf09  xsa344/xsa344-4.13-2.patch
$

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches and/or mitigations described above (or
others which are substantially similar) is permitted during the
embargo, even on public-facing systems with untrusted guest users and
administrators.

But: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAl9p/egMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZ114H/2QrxpADKwxDb2+aL8fhf46AJYwgxDa8SoI18INd
IVeHs8Lq4CQsfSFxBbXOWDGo82bUg43kwcdZ3ToSaX2JSC4R3r0us6tSdaRIqpNj
sQo56ozFXH63v4zTlB8gF58skm2n+CZQ5nKccnTUsN7KuqfPWm/2LfBnqnHYkYQ9
CVHBG5YXMnrHbASo+HglGqjgu6GyEsLoJpSQEj6oYF/UW86OYeAwZ2TFAFVZ/T04
XtxnH7aYCSMOeQRPU6BnCdoVKg/wn4ilSKyqYAin8uNFf7af3OSSCR4FTYkLX+VG
WYJnc27SUAb28+l9f65r8cwzs2+O5SlqhpqyS6xcM3A1248=
=UYAk
-----END PGP SIGNATURE-----

[-- Attachment #2: xsa344.meta --]
[-- Type: application/octet-stream, Size: 2543 bytes --]

{
  "XSA": 344,
  "SupportedVersions": [
    "master",
    "4.14",
    "4.13",
    "4.12",
    "4.11",
    "4.10"
  ],
  "Trees": [
    "xen"
  ],
  "Recipes": {
    "4.10": {
      "Recipes": {
        "xen": {
          "StableRef": "93be943e7d759015bd5db41a48f6dce58e580d5a",
          "Prereqs": [
            336,
            337,
            338,
            339,
            340,
            342,
            343
          ],
          "Patches": [
            "xsa344/xsa344-4.10-?.patch"
          ]
        }
      }
    },
    "4.11": {
      "Recipes": {
        "xen": {
          "StableRef": "ddaaccbbab6b19bf21ed2c097f3055a3c2544c8d",
          "Prereqs": [
            333,
            336,
            337,
            338,
            339,
            340,
            342,
            343
          ],
          "Patches": [
            "xsa344/xsa344-4.11-?.patch"
          ]
        }
      }
    },
    "4.12": {
      "Recipes": {
        "xen": {
          "StableRef": "1336ca17742471fc4a59879ae2f637a59530a933",
          "Prereqs": [
            333,
            334,
            336,
            337,
            338,
            339,
            340,
            342,
            343
          ],
          "Patches": [
            "xsa344/xsa344-4.12-?.patch"
          ]
        }
      }
    },
    "4.13": {
      "Recipes": {
        "xen": {
          "StableRef": "9b367b2b0b714f3ffb69ed6be0a118e8d3eac07f",
          "Prereqs": [
            333,
            334,
            336,
            337,
            338,
            339,
            340,
            342,
            343
          ],
          "Patches": [
            "xsa344/xsa344-4.13-?.patch"
          ]
        }
      }
    },
    "4.14": {
      "Recipes": {
        "xen": {
          "StableRef": "c3a0fc22af90ef28e68b116c6a49d9cec57f71cf",
          "Prereqs": [
            333,
            334,
            336,
            337,
            338,
            339,
            340,
            342,
            343
          ],
          "Patches": [
            "xsa344/xsa344-?.patch"
          ]
        }
      }
    },
    "master": {
      "Recipes": {
        "xen": {
          "StableRef": "b11910082d90bb1597f6679524eb726a33306672",
          "Prereqs": [
            333,
            334,
            336,
            337,
            338,
            339,
            340,
            342,
            343
          ],
          "Patches": [
            "xsa344/xsa344-?.patch"
          ]
        }
      }
    }
  }
}

[-- Attachment #3: xsa344/xsa344-1.patch --]
[-- Type: application/octet-stream, Size: 4207 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: evtchn: arrange for preemption in evtchn_destroy()

Especially closing of fully established interdomain channels can take
quite some time, due to the locking involved. Therefore we shouldn't
assume we can clean up still active ports all in one go. Besides adding
the necessary preemption check, also avoid pointlessly starting from
(or now really ending at) 0; 1 is the lowest numbered port which may
need closing.

Since we're now reducing ->valid_evtchns, free_xen_event_channel(),
and (at least to be on the safe side) notify_via_xen_event_channel()
need to cope with attempts to close / unbind from / send through already
closed (and no longer valid, as per port_is_valid()) ports.

This is part of XSA-344.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
---
v8: Drop ->active->evtchns part of main loop condition. Drop BUG_ON()
    conversion in evtchn_close().
v7: Comment the barriers added in v6.
v6: Add barriers ahead of new ->is_dying checks.
v5: Also adjust BUG_ON() in evtchn_close().
v4: New.

--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -715,12 +715,14 @@ int domain_kill(struct domain *d)
             return domain_kill(d);
         d->is_dying = DOMDYING_dying;
         argo_destroy(d);
-        evtchn_destroy(d);
         gnttab_release_mappings(d);
         vnuma_destroy(d->vnuma);
         domain_set_outstanding_pages(d, 0);
         /* fallthrough */
     case DOMDYING_dying:
+        rc = evtchn_destroy(d);
+        if ( rc )
+            break;
         rc = domain_relinquish_resources(d);
         if ( rc != 0 )
             break;
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1297,7 +1297,16 @@ int alloc_unbound_xen_event_channel(
 
 void free_xen_event_channel(struct domain *d, int port)
 {
-    BUG_ON(!port_is_valid(d, port));
+    if ( !port_is_valid(d, port) )
+    {
+        /*
+         * Make sure ->is_dying is read /after/ ->valid_evtchns, pairing
+         * with the spin_barrier() and BUG_ON() in evtchn_destroy().
+         */
+        smp_rmb();
+        BUG_ON(!d->is_dying);
+        return;
+    }
 
     evtchn_close(d, port, 0);
 }
@@ -1309,7 +1318,17 @@ void notify_via_xen_event_channel(struct
     struct domain *rd;
     unsigned long flags;
 
-    ASSERT(port_is_valid(ld, lport));
+    if ( !port_is_valid(ld, lport) )
+    {
+        /*
+         * Make sure ->is_dying is read /after/ ->valid_evtchns, pairing
+         * with the spin_barrier() and BUG_ON() in evtchn_destroy().
+         */
+        smp_rmb();
+        ASSERT(ld->is_dying);
+        return;
+    }
+
     lchn = evtchn_from_port(ld, lport);
 
     spin_lock_irqsave(&lchn->lock, flags);
@@ -1380,8 +1399,7 @@ int evtchn_init(struct domain *d, unsign
     return 0;
 }
 
-
-void evtchn_destroy(struct domain *d)
+int evtchn_destroy(struct domain *d)
 {
     unsigned int i;
 
@@ -1390,14 +1408,29 @@ void evtchn_destroy(struct domain *d)
     spin_barrier(&d->event_lock);
 
     /* Close all existing event channels. */
-    for ( i = 0; port_is_valid(d, i); i++ )
+    for ( i = d->valid_evtchns; --i; )
+    {
         evtchn_close(d, i, 0);
 
+        /*
+         * Avoid preempting when called from domain_create()'s error path,
+         * and don't check too often (choice of frequency is arbitrary).
+         */
+        if ( i && !(i & 0x3f) && d->is_dying != DOMDYING_dead &&
+             hypercall_preempt_check() )
+        {
+            write_atomic(&d->valid_evtchns, i);
+            return -ERESTART;
+        }
+    }
+
     ASSERT(!d->active_evtchns);
 
     clear_global_virq_handlers(d);
 
     evtchn_fifo_destroy(d);
+
+    return 0;
 }
 
 
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -138,7 +138,7 @@ struct evtchn
 } __attribute__((aligned(64)));
 
 int  evtchn_init(struct domain *d, unsigned int max_port);
-void evtchn_destroy(struct domain *d); /* from domain_kill */
+int  evtchn_destroy(struct domain *d); /* from domain_kill */
 void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */
 
 struct waitqueue_vcpu;

[-- Attachment #4: xsa344/xsa344-2.patch --]
[-- Type: application/octet-stream, Size: 6902 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: evtchn: arrange for preemption in evtchn_reset()

Like for evtchn_destroy() looping over all possible event channels to
close them can take a significant amount of time. Unlike done there, we
can't alter domain properties (i.e. d->valid_evtchns) here. Borrow, in a
lightweight form, the paging domctl continuation concept, redirecting
the continuations to different sub-ops. Just like there this is to be
able to allow for predictable overall results of the involved sub-ops:
Racing requests should either complete or be refused.

Note that a domain can't interfere with an already started (by a remote
domain) reset, due to being paused. It can prevent a remote reset from
happening by leaving a reset unfinished, but that's only going to affect
itself.

This is part of XSA-344.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
---
v9: Undo v7 changes.
v7: Move extension of loop condition in evtchn_reset() here, to match
    the earlier patch'es change to evtchn_destroy().
v6: Also protect the last write of d->next_evtchn. Re-base over changes
    to earlier patches.
v4: New.

--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1145,7 +1145,7 @@ void domain_unpause_except_self(struct d
         domain_unpause(d);
 }
 
-int domain_soft_reset(struct domain *d)
+int domain_soft_reset(struct domain *d, bool resuming)
 {
     struct vcpu *v;
     int rc;
@@ -1159,7 +1159,7 @@ int domain_soft_reset(struct domain *d)
         }
     spin_unlock(&d->shutdown_lock);
 
-    rc = evtchn_reset(d);
+    rc = evtchn_reset(d, resuming);
     if ( rc )
         return rc;
 
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -477,12 +477,22 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xe
     }
 
     case XEN_DOMCTL_soft_reset:
+    case XEN_DOMCTL_soft_reset_cont:
         if ( d == current->domain ) /* no domain_pause() */
         {
             ret = -EINVAL;
             break;
         }
-        ret = domain_soft_reset(d);
+        ret = domain_soft_reset(d, op->cmd == XEN_DOMCTL_soft_reset_cont);
+        if ( ret == -ERESTART )
+        {
+            op->cmd = XEN_DOMCTL_soft_reset_cont;
+            if ( !__copy_field_to_guest(u_domctl, op, cmd) )
+                ret = hypercall_create_continuation(__HYPERVISOR_domctl,
+                                                    "h", u_domctl);
+            else
+                ret = -EFAULT;
+        }
         break;
 
     case XEN_DOMCTL_destroydomain:
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1057,7 +1057,7 @@ int evtchn_unmask(unsigned int port)
     return 0;
 }
 
-int evtchn_reset(struct domain *d)
+int evtchn_reset(struct domain *d, bool resuming)
 {
     unsigned int i;
     int rc = 0;
@@ -1065,11 +1065,40 @@ int evtchn_reset(struct domain *d)
     if ( d != current->domain && !d->controller_pause_count )
         return -EINVAL;
 
-    for ( i = 0; port_is_valid(d, i); i++ )
+    spin_lock(&d->event_lock);
+
+    /*
+     * If we are resuming, then start where we stopped. Otherwise, check
+     * that a reset operation is not already in progress, and if none is,
+     * record that this is now the case.
+     */
+    i = resuming ? d->next_evtchn : !d->next_evtchn;
+    if ( i > d->next_evtchn )
+        d->next_evtchn = i;
+
+    spin_unlock(&d->event_lock);
+
+    if ( !i )
+        return -EBUSY;
+
+    for ( ; port_is_valid(d, i); i++ )
+    {
         evtchn_close(d, i, 1);
 
+        /* NB: Choice of frequency is arbitrary. */
+        if ( !(i & 0x3f) && hypercall_preempt_check() )
+        {
+            spin_lock(&d->event_lock);
+            d->next_evtchn = i;
+            spin_unlock(&d->event_lock);
+            return -ERESTART;
+        }
+    }
+
     spin_lock(&d->event_lock);
 
+    d->next_evtchn = 0;
+
     if ( d->active_evtchns > d->xen_evtchns )
         rc = -EAGAIN;
     else if ( d->evtchn_fifo )
@@ -1204,7 +1233,8 @@ long do_event_channel_op(int cmd, XEN_GU
         break;
     }
 
-    case EVTCHNOP_reset: {
+    case EVTCHNOP_reset:
+    case EVTCHNOP_reset_cont: {
         struct evtchn_reset reset;
         struct domain *d;
 
@@ -1217,9 +1247,13 @@ long do_event_channel_op(int cmd, XEN_GU
 
         rc = xsm_evtchn_reset(XSM_TARGET, current->domain, d);
         if ( !rc )
-            rc = evtchn_reset(d);
+            rc = evtchn_reset(d, cmd == EVTCHNOP_reset_cont);
 
         rcu_unlock_domain(d);
+
+        if ( rc == -ERESTART )
+            rc = hypercall_create_continuation(__HYPERVISOR_event_channel_op,
+                                               "ih", EVTCHNOP_reset_cont, arg);
         break;
     }
 
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1159,7 +1159,10 @@ struct xen_domctl {
 #define XEN_DOMCTL_iomem_permission              20
 #define XEN_DOMCTL_ioport_permission             21
 #define XEN_DOMCTL_hypercall_init                22
-#define XEN_DOMCTL_arch_setup                    23 /* Obsolete IA64 only */
+#ifdef __XEN__
+/* #define XEN_DOMCTL_arch_setup                 23 Obsolete IA64 only */
+#define XEN_DOMCTL_soft_reset_cont               23
+#endif
 #define XEN_DOMCTL_settimeoffset                 24
 #define XEN_DOMCTL_getvcpuaffinity               25
 #define XEN_DOMCTL_real_mode_area                26 /* Obsolete PPC only */
--- a/xen/include/public/event_channel.h
+++ b/xen/include/public/event_channel.h
@@ -74,6 +74,9 @@
 #define EVTCHNOP_init_control    11
 #define EVTCHNOP_expand_array    12
 #define EVTCHNOP_set_priority    13
+#ifdef __XEN__
+#define EVTCHNOP_reset_cont      14
+#endif
 /* ` } */
 
 typedef uint32_t evtchn_port_t;
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -171,7 +171,7 @@ void evtchn_check_pollers(struct domain
 void evtchn_2l_init(struct domain *d);
 
 /* Close all event channels and reset to 2-level ABI. */
-int evtchn_reset(struct domain *d);
+int evtchn_reset(struct domain *d, bool resuming);
 
 /*
  * Low-level event channel port ops.
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -371,6 +371,8 @@ struct domain
      * EVTCHNOP_reset).  Read/write access like for active_evtchns.
      */
     unsigned int     xen_evtchns;
+    /* Port to resume from in evtchn_reset(), when in a continuation. */
+    unsigned int     next_evtchn;
     spinlock_t       event_lock;
     const struct evtchn_port_ops *evtchn_port_ops;
     struct evtchn_fifo_domain *evtchn_fifo;
@@ -663,7 +665,7 @@ int domain_kill(struct domain *d);
 int domain_shutdown(struct domain *d, u8 reason);
 void domain_resume(struct domain *d);
 
-int domain_soft_reset(struct domain *d);
+int domain_soft_reset(struct domain *d, bool resuming);
 
 int vcpu_start_shutdown_deferral(struct vcpu *v);
 void vcpu_end_shutdown_deferral(struct vcpu *v);

[-- Attachment #5: xsa344/xsa344-4.10-1.patch --]
[-- Type: application/octet-stream, Size: 4036 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: evtchn: arrange for preemption in evtchn_destroy()

Especially closing of fully established interdomain channels can take
quite some time, due to the locking involved. Therefore we shouldn't
assume we can clean up still active ports all in one go. Besides adding
the necessary preemption check, also avoid pointlessly starting from
(or now really ending at) 0; 1 is the lowest numbered port which may
need closing.

Since we're now reducing ->valid_evtchns, free_xen_event_channel(),
and (at least to be on the safe side) notify_via_xen_event_channel()
need to cope with attempts to close / unbind from / send through already
closed (and no longer valid, as per port_is_valid()) ports.

This is part of XSA-344.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -626,7 +626,6 @@ int domain_kill(struct domain *d)
         domain_pause(d);
         d->is_dying = DOMDYING_dying;
         spin_barrier(&d->domain_lock);
-        evtchn_destroy(d);
         gnttab_release_mappings(d);
         tmem_destroy(d->tmem_client);
         vnuma_destroy(d->vnuma);
@@ -634,6 +633,9 @@ int domain_kill(struct domain *d)
         d->tmem_client = NULL;
         /* fallthrough */
     case DOMDYING_dying:
+        rc = evtchn_destroy(d);
+        if ( rc )
+            break;
         rc = domain_relinquish_resources(d);
         if ( rc != 0 )
             break;
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1299,7 +1299,16 @@ int alloc_unbound_xen_event_channel(
 
 void free_xen_event_channel(struct domain *d, int port)
 {
-    BUG_ON(!port_is_valid(d, port));
+    if ( !port_is_valid(d, port) )
+    {
+        /*
+         * Make sure ->is_dying is read /after/ ->valid_evtchns, pairing
+         * with the spin_barrier() and BUG_ON() in evtchn_destroy().
+         */
+        smp_rmb();
+        BUG_ON(!d->is_dying);
+        return;
+    }
 
     evtchn_close(d, port, 0);
 }
@@ -1311,7 +1320,17 @@ void notify_via_xen_event_channel(struct
     struct domain *rd;
     unsigned long flags;
 
-    ASSERT(port_is_valid(ld, lport));
+    if ( !port_is_valid(ld, lport) )
+    {
+        /*
+         * Make sure ->is_dying is read /after/ ->valid_evtchns, pairing
+         * with the spin_barrier() and BUG_ON() in evtchn_destroy().
+         */
+        smp_rmb();
+        ASSERT(ld->is_dying);
+        return;
+    }
+
     lchn = evtchn_from_port(ld, lport);
 
     spin_lock_irqsave(&lchn->lock, flags);
@@ -1383,8 +1402,7 @@ int evtchn_init(struct domain *d)
     return 0;
 }
 
-
-void evtchn_destroy(struct domain *d)
+int evtchn_destroy(struct domain *d)
 {
     unsigned int i;
 
@@ -1393,14 +1411,29 @@ void evtchn_destroy(struct domain *d)
     spin_barrier(&d->event_lock);
 
     /* Close all existing event channels. */
-    for ( i = 0; port_is_valid(d, i); i++ )
+    for ( i = d->valid_evtchns; --i; )
+    {
         evtchn_close(d, i, 0);
 
+        /*
+         * Avoid preempting when called from domain_create()'s error path,
+         * and don't check too often (choice of frequency is arbitrary).
+         */
+        if ( i && !(i & 0x3f) && d->is_dying != DOMDYING_dead &&
+             hypercall_preempt_check() )
+        {
+            write_atomic(&d->valid_evtchns, i);
+            return -ERESTART;
+        }
+    }
+
     ASSERT(!d->active_evtchns);
 
     clear_global_virq_handlers(d);
 
     evtchn_fifo_destroy(d);
+
+    return 0;
 }
 
 
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -134,7 +134,7 @@ struct evtchn
 } __attribute__((aligned(64)));
 
 int  evtchn_init(struct domain *d); /* from domain_create */
-void evtchn_destroy(struct domain *d); /* from domain_kill */
+int  evtchn_destroy(struct domain *d); /* from domain_kill */
 void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */
 
 struct waitqueue_vcpu;

[-- Attachment #6: xsa344/xsa344-4.10-2.patch --]
[-- Type: application/octet-stream, Size: 6643 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: evtchn: arrange for preemption in evtchn_reset()

Like for evtchn_destroy() looping over all possible event channels to
close them can take a significant amount of time. Unlike done there, we
can't alter domain properties (i.e. d->valid_evtchns) here. Borrow, in a
lightweight form, the paging domctl continuation concept, redirecting
the continuations to different sub-ops. Just like there this is to be
able to allow for predictable overall results of the involved sub-ops:
Racing requests should either complete or be refused.

Note that a domain can't interfere with an already started (by a remote
domain) reset, due to being paused. It can prevent a remote reset from
happening by leaving a reset unfinished, but that's only going to affect
itself.

This is part of XSA-344.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1086,7 +1086,7 @@ void domain_unpause_except_self(struct d
         domain_unpause(d);
 }
 
-int domain_soft_reset(struct domain *d)
+int domain_soft_reset(struct domain *d, bool resuming)
 {
     struct vcpu *v;
     int rc;
@@ -1100,7 +1100,7 @@ int domain_soft_reset(struct domain *d)
         }
     spin_unlock(&d->shutdown_lock);
 
-    rc = evtchn_reset(d);
+    rc = evtchn_reset(d, resuming);
     if ( rc )
         return rc;
 
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -666,12 +666,22 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xe
     }
 
     case XEN_DOMCTL_soft_reset:
+    case XEN_DOMCTL_soft_reset_cont:
         if ( d == current->domain ) /* no domain_pause() */
         {
             ret = -EINVAL;
             break;
         }
-        ret = domain_soft_reset(d);
+        ret = domain_soft_reset(d, op->cmd == XEN_DOMCTL_soft_reset_cont);
+        if ( ret == -ERESTART )
+        {
+            op->cmd = XEN_DOMCTL_soft_reset_cont;
+            if ( !__copy_field_to_guest(u_domctl, op, cmd) )
+                ret = hypercall_create_continuation(__HYPERVISOR_domctl,
+                                                    "h", u_domctl);
+            else
+                ret = -EFAULT;
+        }
         break;
 
     case XEN_DOMCTL_destroydomain:
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1059,7 +1059,7 @@ int evtchn_unmask(unsigned int port)
     return 0;
 }
 
-int evtchn_reset(struct domain *d)
+int evtchn_reset(struct domain *d, bool resuming)
 {
     unsigned int i;
     int rc = 0;
@@ -1067,11 +1067,40 @@ int evtchn_reset(struct domain *d)
     if ( d != current->domain && !d->controller_pause_count )
         return -EINVAL;
 
-    for ( i = 0; port_is_valid(d, i); i++ )
+    spin_lock(&d->event_lock);
+
+    /*
+     * If we are resuming, then start where we stopped. Otherwise, check
+     * that a reset operation is not already in progress, and if none is,
+     * record that this is now the case.
+     */
+    i = resuming ? d->next_evtchn : !d->next_evtchn;
+    if ( i > d->next_evtchn )
+        d->next_evtchn = i;
+
+    spin_unlock(&d->event_lock);
+
+    if ( !i )
+        return -EBUSY;
+
+    for ( ; port_is_valid(d, i); i++ )
+    {
         evtchn_close(d, i, 1);
 
+        /* NB: Choice of frequency is arbitrary. */
+        if ( !(i & 0x3f) && hypercall_preempt_check() )
+        {
+            spin_lock(&d->event_lock);
+            d->next_evtchn = i;
+            spin_unlock(&d->event_lock);
+            return -ERESTART;
+        }
+    }
+
     spin_lock(&d->event_lock);
 
+    d->next_evtchn = 0;
+
     if ( d->active_evtchns > d->xen_evtchns )
         rc = -EAGAIN;
     else if ( d->evtchn_fifo )
@@ -1206,7 +1235,8 @@ long do_event_channel_op(int cmd, XEN_GU
         break;
     }
 
-    case EVTCHNOP_reset: {
+    case EVTCHNOP_reset:
+    case EVTCHNOP_reset_cont: {
         struct evtchn_reset reset;
         struct domain *d;
 
@@ -1219,9 +1249,13 @@ long do_event_channel_op(int cmd, XEN_GU
 
         rc = xsm_evtchn_reset(XSM_TARGET, current->domain, d);
         if ( !rc )
-            rc = evtchn_reset(d);
+            rc = evtchn_reset(d, cmd == EVTCHNOP_reset_cont);
 
         rcu_unlock_domain(d);
+
+        if ( rc == -ERESTART )
+            rc = hypercall_create_continuation(__HYPERVISOR_event_channel_op,
+                                               "ih", EVTCHNOP_reset_cont, arg);
         break;
     }
 
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1130,7 +1130,10 @@ struct xen_domctl {
 #define XEN_DOMCTL_iomem_permission              20
 #define XEN_DOMCTL_ioport_permission             21
 #define XEN_DOMCTL_hypercall_init                22
-#define XEN_DOMCTL_arch_setup                    23 /* Obsolete IA64 only */
+#ifdef __XEN__
+/* #define XEN_DOMCTL_arch_setup                 23 Obsolete IA64 only */
+#define XEN_DOMCTL_soft_reset_cont               23
+#endif
 #define XEN_DOMCTL_settimeoffset                 24
 #define XEN_DOMCTL_getvcpuaffinity               25
 #define XEN_DOMCTL_real_mode_area                26 /* Obsolete PPC only */
--- a/xen/include/public/event_channel.h
+++ b/xen/include/public/event_channel.h
@@ -74,6 +74,9 @@
 #define EVTCHNOP_init_control    11
 #define EVTCHNOP_expand_array    12
 #define EVTCHNOP_set_priority    13
+#ifdef __XEN__
+#define EVTCHNOP_reset_cont      14
+#endif
 /* ` } */
 
 typedef uint32_t evtchn_port_t;
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -163,7 +163,7 @@ void evtchn_check_pollers(struct domain
 void evtchn_2l_init(struct domain *d);
 
 /* Close all event channels and reset to 2-level ABI. */
-int evtchn_reset(struct domain *d);
+int evtchn_reset(struct domain *d, bool resuming);
 
 /*
  * Low-level event channel port ops.
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -348,6 +348,8 @@ struct domain
      * EVTCHNOP_reset).  Read/write access like for active_evtchns.
      */
     unsigned int     xen_evtchns;
+    /* Port to resume from in evtchn_reset(), when in a continuation. */
+    unsigned int     next_evtchn;
     spinlock_t       event_lock;
     const struct evtchn_port_ops *evtchn_port_ops;
     struct evtchn_fifo_domain *evtchn_fifo;
@@ -616,7 +618,7 @@ int domain_shutdown(struct domain *d, u8
 void domain_resume(struct domain *d);
 void domain_pause_for_debugger(void);
 
-int domain_soft_reset(struct domain *d);
+int domain_soft_reset(struct domain *d, bool resuming);
 
 int vcpu_start_shutdown_deferral(struct vcpu *v);
 void vcpu_end_shutdown_deferral(struct vcpu *v);

[-- Attachment #7: xsa344/xsa344-4.11-1.patch --]
[-- Type: application/octet-stream, Size: 4052 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: evtchn: arrange for preemption in evtchn_destroy()

Especially closing of fully established interdomain channels can take
quite some time, due to the locking involved. Therefore we shouldn't
assume we can clean up still active ports all in one go. Besides adding
the necessary preemption check, also avoid pointlessly starting from
(or now really ending at) 0; 1 is the lowest numbered port which may
need closing.

Since we're now reducing ->valid_evtchns, free_xen_event_channel(),
and (at least to be on the safe side) notify_via_xen_event_channel()
need to cope with attempts to close / unbind from / send through already
closed (and no longer valid, as per port_is_valid()) ports.

This is part of XSA-344.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -646,7 +646,6 @@ int domain_kill(struct domain *d)
         if ( d->is_dying != DOMDYING_alive )
             return domain_kill(d);
         d->is_dying = DOMDYING_dying;
-        evtchn_destroy(d);
         gnttab_release_mappings(d);
         tmem_destroy(d->tmem_client);
         vnuma_destroy(d->vnuma);
@@ -654,6 +653,9 @@ int domain_kill(struct domain *d)
         d->tmem_client = NULL;
         /* fallthrough */
     case DOMDYING_dying:
+        rc = evtchn_destroy(d);
+        if ( rc )
+            break;
         rc = domain_relinquish_resources(d);
         if ( rc != 0 )
             break;
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1291,7 +1291,16 @@ int alloc_unbound_xen_event_channel(
 
 void free_xen_event_channel(struct domain *d, int port)
 {
-    BUG_ON(!port_is_valid(d, port));
+    if ( !port_is_valid(d, port) )
+    {
+        /*
+         * Make sure ->is_dying is read /after/ ->valid_evtchns, pairing
+         * with the spin_barrier() and BUG_ON() in evtchn_destroy().
+         */
+        smp_rmb();
+        BUG_ON(!d->is_dying);
+        return;
+    }
 
     evtchn_close(d, port, 0);
 }
@@ -1303,7 +1312,17 @@ void notify_via_xen_event_channel(struct
     struct domain *rd;
     unsigned long flags;
 
-    ASSERT(port_is_valid(ld, lport));
+    if ( !port_is_valid(ld, lport) )
+    {
+        /*
+         * Make sure ->is_dying is read /after/ ->valid_evtchns, pairing
+         * with the spin_barrier() and BUG_ON() in evtchn_destroy().
+         */
+        smp_rmb();
+        ASSERT(ld->is_dying);
+        return;
+    }
+
     lchn = evtchn_from_port(ld, lport);
 
     spin_lock_irqsave(&lchn->lock, flags);
@@ -1375,8 +1394,7 @@ int evtchn_init(struct domain *d)
     return 0;
 }
 
-
-void evtchn_destroy(struct domain *d)
+int evtchn_destroy(struct domain *d)
 {
     unsigned int i;
 
@@ -1385,14 +1403,29 @@ void evtchn_destroy(struct domain *d)
     spin_barrier(&d->event_lock);
 
     /* Close all existing event channels. */
-    for ( i = 0; port_is_valid(d, i); i++ )
+    for ( i = d->valid_evtchns; --i; )
+    {
         evtchn_close(d, i, 0);
 
+        /*
+         * Avoid preempting when called from domain_create()'s error path,
+         * and don't check too often (choice of frequency is arbitrary).
+         */
+        if ( i && !(i & 0x3f) && d->is_dying != DOMDYING_dead &&
+             hypercall_preempt_check() )
+        {
+            write_atomic(&d->valid_evtchns, i);
+            return -ERESTART;
+        }
+    }
+
     ASSERT(!d->active_evtchns);
 
     clear_global_virq_handlers(d);
 
     evtchn_fifo_destroy(d);
+
+    return 0;
 }
 
 
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -135,7 +135,7 @@ struct evtchn
 } __attribute__((aligned(64)));
 
 int  evtchn_init(struct domain *d); /* from domain_create */
-void evtchn_destroy(struct domain *d); /* from domain_kill */
+int  evtchn_destroy(struct domain *d); /* from domain_kill */
 void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */
 
 struct waitqueue_vcpu;

[-- Attachment #8: xsa344/xsa344-4.11-2.patch --]
[-- Type: application/octet-stream, Size: 6643 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: evtchn: arrange for preemption in evtchn_reset()

Like for evtchn_destroy() looping over all possible event channels to
close them can take a significant amount of time. Unlike done there, we
can't alter domain properties (i.e. d->valid_evtchns) here. Borrow, in a
lightweight form, the paging domctl continuation concept, redirecting
the continuations to different sub-ops. Just like there this is to be
able to allow for predictable overall results of the involved sub-ops:
Racing requests should either complete or be refused.

Note that a domain can't interfere with an already started (by a remote
domain) reset, due to being paused. It can prevent a remote reset from
happening by leaving a reset unfinished, but that's only going to affect
itself.

This is part of XSA-344.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1105,7 +1105,7 @@ void domain_unpause_except_self(struct d
         domain_unpause(d);
 }
 
-int domain_soft_reset(struct domain *d)
+int domain_soft_reset(struct domain *d, bool resuming)
 {
     struct vcpu *v;
     int rc;
@@ -1119,7 +1119,7 @@ int domain_soft_reset(struct domain *d)
         }
     spin_unlock(&d->shutdown_lock);
 
-    rc = evtchn_reset(d);
+    rc = evtchn_reset(d, resuming);
     if ( rc )
         return rc;
 
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -648,12 +648,22 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xe
     }
 
     case XEN_DOMCTL_soft_reset:
+    case XEN_DOMCTL_soft_reset_cont:
         if ( d == current->domain ) /* no domain_pause() */
         {
             ret = -EINVAL;
             break;
         }
-        ret = domain_soft_reset(d);
+        ret = domain_soft_reset(d, op->cmd == XEN_DOMCTL_soft_reset_cont);
+        if ( ret == -ERESTART )
+        {
+            op->cmd = XEN_DOMCTL_soft_reset_cont;
+            if ( !__copy_field_to_guest(u_domctl, op, cmd) )
+                ret = hypercall_create_continuation(__HYPERVISOR_domctl,
+                                                    "h", u_domctl);
+            else
+                ret = -EFAULT;
+        }
         break;
 
     case XEN_DOMCTL_destroydomain:
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1051,7 +1051,7 @@ int evtchn_unmask(unsigned int port)
     return 0;
 }
 
-int evtchn_reset(struct domain *d)
+int evtchn_reset(struct domain *d, bool resuming)
 {
     unsigned int i;
     int rc = 0;
@@ -1059,11 +1059,40 @@ int evtchn_reset(struct domain *d)
     if ( d != current->domain && !d->controller_pause_count )
         return -EINVAL;
 
-    for ( i = 0; port_is_valid(d, i); i++ )
+    spin_lock(&d->event_lock);
+
+    /*
+     * If we are resuming, then start where we stopped. Otherwise, check
+     * that a reset operation is not already in progress, and if none is,
+     * record that this is now the case.
+     */
+    i = resuming ? d->next_evtchn : !d->next_evtchn;
+    if ( i > d->next_evtchn )
+        d->next_evtchn = i;
+
+    spin_unlock(&d->event_lock);
+
+    if ( !i )
+        return -EBUSY;
+
+    for ( ; port_is_valid(d, i); i++ )
+    {
         evtchn_close(d, i, 1);
 
+        /* NB: Choice of frequency is arbitrary. */
+        if ( !(i & 0x3f) && hypercall_preempt_check() )
+        {
+            spin_lock(&d->event_lock);
+            d->next_evtchn = i;
+            spin_unlock(&d->event_lock);
+            return -ERESTART;
+        }
+    }
+
     spin_lock(&d->event_lock);
 
+    d->next_evtchn = 0;
+
     if ( d->active_evtchns > d->xen_evtchns )
         rc = -EAGAIN;
     else if ( d->evtchn_fifo )
@@ -1198,7 +1227,8 @@ long do_event_channel_op(int cmd, XEN_GU
         break;
     }
 
-    case EVTCHNOP_reset: {
+    case EVTCHNOP_reset:
+    case EVTCHNOP_reset_cont: {
         struct evtchn_reset reset;
         struct domain *d;
 
@@ -1211,9 +1241,13 @@ long do_event_channel_op(int cmd, XEN_GU
 
         rc = xsm_evtchn_reset(XSM_TARGET, current->domain, d);
         if ( !rc )
-            rc = evtchn_reset(d);
+            rc = evtchn_reset(d, cmd == EVTCHNOP_reset_cont);
 
         rcu_unlock_domain(d);
+
+        if ( rc == -ERESTART )
+            rc = hypercall_create_continuation(__HYPERVISOR_event_channel_op,
+                                               "ih", EVTCHNOP_reset_cont, arg);
         break;
     }
 
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1121,7 +1121,10 @@ struct xen_domctl {
 #define XEN_DOMCTL_iomem_permission              20
 #define XEN_DOMCTL_ioport_permission             21
 #define XEN_DOMCTL_hypercall_init                22
-#define XEN_DOMCTL_arch_setup                    23 /* Obsolete IA64 only */
+#ifdef __XEN__
+/* #define XEN_DOMCTL_arch_setup                 23 Obsolete IA64 only */
+#define XEN_DOMCTL_soft_reset_cont               23
+#endif
 #define XEN_DOMCTL_settimeoffset                 24
 #define XEN_DOMCTL_getvcpuaffinity               25
 #define XEN_DOMCTL_real_mode_area                26 /* Obsolete PPC only */
--- a/xen/include/public/event_channel.h
+++ b/xen/include/public/event_channel.h
@@ -74,6 +74,9 @@
 #define EVTCHNOP_init_control    11
 #define EVTCHNOP_expand_array    12
 #define EVTCHNOP_set_priority    13
+#ifdef __XEN__
+#define EVTCHNOP_reset_cont      14
+#endif
 /* ` } */
 
 typedef uint32_t evtchn_port_t;
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -163,7 +163,7 @@ void evtchn_check_pollers(struct domain
 void evtchn_2l_init(struct domain *d);
 
 /* Close all event channels and reset to 2-level ABI. */
-int evtchn_reset(struct domain *d);
+int evtchn_reset(struct domain *d, bool resuming);
 
 /*
  * Low-level event channel port ops.
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -355,6 +355,8 @@ struct domain
      * EVTCHNOP_reset).  Read/write access like for active_evtchns.
      */
     unsigned int     xen_evtchns;
+    /* Port to resume from in evtchn_reset(), when in a continuation. */
+    unsigned int     next_evtchn;
     spinlock_t       event_lock;
     const struct evtchn_port_ops *evtchn_port_ops;
     struct evtchn_fifo_domain *evtchn_fifo;
@@ -608,7 +610,7 @@ int domain_shutdown(struct domain *d, u8
 void domain_resume(struct domain *d);
 void domain_pause_for_debugger(void);
 
-int domain_soft_reset(struct domain *d);
+int domain_soft_reset(struct domain *d, bool resuming);
 
 int vcpu_start_shutdown_deferral(struct vcpu *v);
 void vcpu_end_shutdown_deferral(struct vcpu *v);

[-- Attachment #9: xsa344/xsa344-4.12-1.patch --]
[-- Type: application/octet-stream, Size: 4037 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: evtchn: arrange for preemption in evtchn_destroy()

Especially closing of fully established interdomain channels can take
quite some time, due to the locking involved. Therefore we shouldn't
assume we can clean up still active ports all in one go. Besides adding
the necessary preemption check, also avoid pointlessly starting from
(or now really ending at) 0; 1 is the lowest numbered port which may
need closing.

Since we're now reducing ->valid_evtchns, free_xen_event_channel(),
and (at least to be on the safe side) notify_via_xen_event_channel()
need to cope with attempts to close / unbind from / send through already
closed (and no longer valid, as per port_is_valid()) ports.

This is part of XSA-344.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -724,7 +724,6 @@ int domain_kill(struct domain *d)
             return domain_kill(d);
         d->is_dying = DOMDYING_dying;
         argo_destroy(d);
-        evtchn_destroy(d);
         gnttab_release_mappings(d);
         tmem_destroy(d->tmem_client);
         vnuma_destroy(d->vnuma);
@@ -732,6 +731,9 @@ int domain_kill(struct domain *d)
         d->tmem_client = NULL;
         /* fallthrough */
     case DOMDYING_dying:
+        rc = evtchn_destroy(d);
+        if ( rc )
+            break;
         rc = domain_relinquish_resources(d);
         if ( rc != 0 )
             break;
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1297,7 +1297,16 @@ int alloc_unbound_xen_event_channel(
 
 void free_xen_event_channel(struct domain *d, int port)
 {
-    BUG_ON(!port_is_valid(d, port));
+    if ( !port_is_valid(d, port) )
+    {
+        /*
+         * Make sure ->is_dying is read /after/ ->valid_evtchns, pairing
+         * with the spin_barrier() and BUG_ON() in evtchn_destroy().
+         */
+        smp_rmb();
+        BUG_ON(!d->is_dying);
+        return;
+    }
 
     evtchn_close(d, port, 0);
 }
@@ -1309,7 +1318,17 @@ void notify_via_xen_event_channel(struct
     struct domain *rd;
     unsigned long flags;
 
-    ASSERT(port_is_valid(ld, lport));
+    if ( !port_is_valid(ld, lport) )
+    {
+        /*
+         * Make sure ->is_dying is read /after/ ->valid_evtchns, pairing
+         * with the spin_barrier() and BUG_ON() in evtchn_destroy().
+         */
+        smp_rmb();
+        ASSERT(ld->is_dying);
+        return;
+    }
+
     lchn = evtchn_from_port(ld, lport);
 
     spin_lock_irqsave(&lchn->lock, flags);
@@ -1380,8 +1399,7 @@ int evtchn_init(struct domain *d, unsign
     return 0;
 }
 
-
-void evtchn_destroy(struct domain *d)
+int evtchn_destroy(struct domain *d)
 {
     unsigned int i;
 
@@ -1390,14 +1408,29 @@ void evtchn_destroy(struct domain *d)
     spin_barrier(&d->event_lock);
 
     /* Close all existing event channels. */
-    for ( i = 0; port_is_valid(d, i); i++ )
+    for ( i = d->valid_evtchns; --i; )
+    {
         evtchn_close(d, i, 0);
 
+        /*
+         * Avoid preempting when called from domain_create()'s error path,
+         * and don't check too often (choice of frequency is arbitrary).
+         */
+        if ( i && !(i & 0x3f) && d->is_dying != DOMDYING_dead &&
+             hypercall_preempt_check() )
+        {
+            write_atomic(&d->valid_evtchns, i);
+            return -ERESTART;
+        }
+    }
+
     ASSERT(!d->active_evtchns);
 
     clear_global_virq_handlers(d);
 
     evtchn_fifo_destroy(d);
+
+    return 0;
 }
 
 
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -136,7 +136,7 @@ struct evtchn
 } __attribute__((aligned(64)));
 
 int  evtchn_init(struct domain *d, unsigned int max_port);
-void evtchn_destroy(struct domain *d); /* from domain_kill */
+int  evtchn_destroy(struct domain *d); /* from domain_kill */
 void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */
 
 struct waitqueue_vcpu;

[-- Attachment #10: xsa344/xsa344-4.12-2.patch --]
[-- Type: application/octet-stream, Size: 6643 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: evtchn: arrange for preemption in evtchn_reset()

Like for evtchn_destroy() looping over all possible event channels to
close them can take a significant amount of time. Unlike done there, we
can't alter domain properties (i.e. d->valid_evtchns) here. Borrow, in a
lightweight form, the paging domctl continuation concept, redirecting
the continuations to different sub-ops. Just like there this is to be
able to allow for predictable overall results of the involved sub-ops:
Racing requests should either complete or be refused.

Note that a domain can't interfere with an already started (by a remote
domain) reset, due to being paused. It can prevent a remote reset from
happening by leaving a reset unfinished, but that's only going to affect
itself.

This is part of XSA-344.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1170,7 +1170,7 @@ void domain_unpause_except_self(struct d
         domain_unpause(d);
 }
 
-int domain_soft_reset(struct domain *d)
+int domain_soft_reset(struct domain *d, bool resuming)
 {
     struct vcpu *v;
     int rc;
@@ -1184,7 +1184,7 @@ int domain_soft_reset(struct domain *d)
         }
     spin_unlock(&d->shutdown_lock);
 
-    rc = evtchn_reset(d);
+    rc = evtchn_reset(d, resuming);
     if ( rc )
         return rc;
 
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -585,12 +585,22 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xe
     }
 
     case XEN_DOMCTL_soft_reset:
+    case XEN_DOMCTL_soft_reset_cont:
         if ( d == current->domain ) /* no domain_pause() */
         {
             ret = -EINVAL;
             break;
         }
-        ret = domain_soft_reset(d);
+        ret = domain_soft_reset(d, op->cmd == XEN_DOMCTL_soft_reset_cont);
+        if ( ret == -ERESTART )
+        {
+            op->cmd = XEN_DOMCTL_soft_reset_cont;
+            if ( !__copy_field_to_guest(u_domctl, op, cmd) )
+                ret = hypercall_create_continuation(__HYPERVISOR_domctl,
+                                                    "h", u_domctl);
+            else
+                ret = -EFAULT;
+        }
         break;
 
     case XEN_DOMCTL_destroydomain:
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1057,7 +1057,7 @@ int evtchn_unmask(unsigned int port)
     return 0;
 }
 
-int evtchn_reset(struct domain *d)
+int evtchn_reset(struct domain *d, bool resuming)
 {
     unsigned int i;
     int rc = 0;
@@ -1065,11 +1065,40 @@ int evtchn_reset(struct domain *d)
     if ( d != current->domain && !d->controller_pause_count )
         return -EINVAL;
 
-    for ( i = 0; port_is_valid(d, i); i++ )
+    spin_lock(&d->event_lock);
+
+    /*
+     * If we are resuming, then start where we stopped. Otherwise, check
+     * that a reset operation is not already in progress, and if none is,
+     * record that this is now the case.
+     */
+    i = resuming ? d->next_evtchn : !d->next_evtchn;
+    if ( i > d->next_evtchn )
+        d->next_evtchn = i;
+
+    spin_unlock(&d->event_lock);
+
+    if ( !i )
+        return -EBUSY;
+
+    for ( ; port_is_valid(d, i); i++ )
+    {
         evtchn_close(d, i, 1);
 
+        /* NB: Choice of frequency is arbitrary. */
+        if ( !(i & 0x3f) && hypercall_preempt_check() )
+        {
+            spin_lock(&d->event_lock);
+            d->next_evtchn = i;
+            spin_unlock(&d->event_lock);
+            return -ERESTART;
+        }
+    }
+
     spin_lock(&d->event_lock);
 
+    d->next_evtchn = 0;
+
     if ( d->active_evtchns > d->xen_evtchns )
         rc = -EAGAIN;
     else if ( d->evtchn_fifo )
@@ -1204,7 +1233,8 @@ long do_event_channel_op(int cmd, XEN_GU
         break;
     }
 
-    case EVTCHNOP_reset: {
+    case EVTCHNOP_reset:
+    case EVTCHNOP_reset_cont: {
         struct evtchn_reset reset;
         struct domain *d;
 
@@ -1217,9 +1247,13 @@ long do_event_channel_op(int cmd, XEN_GU
 
         rc = xsm_evtchn_reset(XSM_TARGET, current->domain, d);
         if ( !rc )
-            rc = evtchn_reset(d);
+            rc = evtchn_reset(d, cmd == EVTCHNOP_reset_cont);
 
         rcu_unlock_domain(d);
+
+        if ( rc == -ERESTART )
+            rc = hypercall_create_continuation(__HYPERVISOR_event_channel_op,
+                                               "ih", EVTCHNOP_reset_cont, arg);
         break;
     }
 
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1144,7 +1144,10 @@ struct xen_domctl {
 #define XEN_DOMCTL_iomem_permission              20
 #define XEN_DOMCTL_ioport_permission             21
 #define XEN_DOMCTL_hypercall_init                22
-#define XEN_DOMCTL_arch_setup                    23 /* Obsolete IA64 only */
+#ifdef __XEN__
+/* #define XEN_DOMCTL_arch_setup                 23 Obsolete IA64 only */
+#define XEN_DOMCTL_soft_reset_cont               23
+#endif
 #define XEN_DOMCTL_settimeoffset                 24
 #define XEN_DOMCTL_getvcpuaffinity               25
 #define XEN_DOMCTL_real_mode_area                26 /* Obsolete PPC only */
--- a/xen/include/public/event_channel.h
+++ b/xen/include/public/event_channel.h
@@ -74,6 +74,9 @@
 #define EVTCHNOP_init_control    11
 #define EVTCHNOP_expand_array    12
 #define EVTCHNOP_set_priority    13
+#ifdef __XEN__
+#define EVTCHNOP_reset_cont      14
+#endif
 /* ` } */
 
 typedef uint32_t evtchn_port_t;
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -171,7 +171,7 @@ void evtchn_check_pollers(struct domain
 void evtchn_2l_init(struct domain *d);
 
 /* Close all event channels and reset to 2-level ABI. */
-int evtchn_reset(struct domain *d);
+int evtchn_reset(struct domain *d, bool resuming);
 
 /*
  * Low-level event channel port ops.
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -356,6 +356,8 @@ struct domain
      * EVTCHNOP_reset).  Read/write access like for active_evtchns.
      */
     unsigned int     xen_evtchns;
+    /* Port to resume from in evtchn_reset(), when in a continuation. */
+    unsigned int     next_evtchn;
     spinlock_t       event_lock;
     const struct evtchn_port_ops *evtchn_port_ops;
     struct evtchn_fifo_domain *evtchn_fifo;
@@ -628,7 +630,7 @@ int domain_shutdown(struct domain *d, u8
 void domain_resume(struct domain *d);
 void domain_pause_for_debugger(void);
 
-int domain_soft_reset(struct domain *d);
+int domain_soft_reset(struct domain *d, bool resuming);
 
 int vcpu_start_shutdown_deferral(struct vcpu *v);
 void vcpu_end_shutdown_deferral(struct vcpu *v);

[-- Attachment #11: xsa344/xsa344-4.13-1.patch --]
[-- Type: application/octet-stream, Size: 3959 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: evtchn: arrange for preemption in evtchn_destroy()

Especially closing of fully established interdomain channels can take
quite some time, due to the locking involved. Therefore we shouldn't
assume we can clean up still active ports all in one go. Besides adding
the necessary preemption check, also avoid pointlessly starting from
(or now really ending at) 0; 1 is the lowest numbered port which may
need closing.

Since we're now reducing ->valid_evtchns, free_xen_event_channel(),
and (at least to be on the safe side) notify_via_xen_event_channel()
need to cope with attempts to close / unbind from / send through already
closed (and no longer valid, as per port_is_valid()) ports.

This is part of XSA-344.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -770,12 +770,14 @@ int domain_kill(struct domain *d)
             return domain_kill(d);
         d->is_dying = DOMDYING_dying;
         argo_destroy(d);
-        evtchn_destroy(d);
         gnttab_release_mappings(d);
         vnuma_destroy(d->vnuma);
         domain_set_outstanding_pages(d, 0);
         /* fallthrough */
     case DOMDYING_dying:
+        rc = evtchn_destroy(d);
+        if ( rc )
+            break;
         rc = domain_relinquish_resources(d);
         if ( rc != 0 )
             break;
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1297,7 +1297,16 @@ int alloc_unbound_xen_event_channel(
 
 void free_xen_event_channel(struct domain *d, int port)
 {
-    BUG_ON(!port_is_valid(d, port));
+    if ( !port_is_valid(d, port) )
+    {
+        /*
+         * Make sure ->is_dying is read /after/ ->valid_evtchns, pairing
+         * with the spin_barrier() and BUG_ON() in evtchn_destroy().
+         */
+        smp_rmb();
+        BUG_ON(!d->is_dying);
+        return;
+    }
 
     evtchn_close(d, port, 0);
 }
@@ -1309,7 +1318,17 @@ void notify_via_xen_event_channel(struct
     struct domain *rd;
     unsigned long flags;
 
-    ASSERT(port_is_valid(ld, lport));
+    if ( !port_is_valid(ld, lport) )
+    {
+        /*
+         * Make sure ->is_dying is read /after/ ->valid_evtchns, pairing
+         * with the spin_barrier() and BUG_ON() in evtchn_destroy().
+         */
+        smp_rmb();
+        ASSERT(ld->is_dying);
+        return;
+    }
+
     lchn = evtchn_from_port(ld, lport);
 
     spin_lock_irqsave(&lchn->lock, flags);
@@ -1380,8 +1399,7 @@ int evtchn_init(struct domain *d, unsign
     return 0;
 }
 
-
-void evtchn_destroy(struct domain *d)
+int evtchn_destroy(struct domain *d)
 {
     unsigned int i;
 
@@ -1390,14 +1408,29 @@ void evtchn_destroy(struct domain *d)
     spin_barrier(&d->event_lock);
 
     /* Close all existing event channels. */
-    for ( i = 0; port_is_valid(d, i); i++ )
+    for ( i = d->valid_evtchns; --i; )
+    {
         evtchn_close(d, i, 0);
 
+        /*
+         * Avoid preempting when called from domain_create()'s error path,
+         * and don't check too often (choice of frequency is arbitrary).
+         */
+        if ( i && !(i & 0x3f) && d->is_dying != DOMDYING_dead &&
+             hypercall_preempt_check() )
+        {
+            write_atomic(&d->valid_evtchns, i);
+            return -ERESTART;
+        }
+    }
+
     ASSERT(!d->active_evtchns);
 
     clear_global_virq_handlers(d);
 
     evtchn_fifo_destroy(d);
+
+    return 0;
 }
 
 
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -136,7 +136,7 @@ struct evtchn
 } __attribute__((aligned(64)));
 
 int  evtchn_init(struct domain *d, unsigned int max_port);
-void evtchn_destroy(struct domain *d); /* from domain_kill */
+int  evtchn_destroy(struct domain *d); /* from domain_kill */
 void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */
 
 struct waitqueue_vcpu;

[-- Attachment #12: xsa344/xsa344-4.13-2.patch --]
[-- Type: application/octet-stream, Size: 6643 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: evtchn: arrange for preemption in evtchn_reset()

Like for evtchn_destroy() looping over all possible event channels to
close them can take a significant amount of time. Unlike done there, we
can't alter domain properties (i.e. d->valid_evtchns) here. Borrow, in a
lightweight form, the paging domctl continuation concept, redirecting
the continuations to different sub-ops. Just like there this is to be
able to allow for predictable overall results of the involved sub-ops:
Racing requests should either complete or be refused.

Note that a domain can't interfere with an already started (by a remote
domain) reset, due to being paused. It can prevent a remote reset from
happening by leaving a reset unfinished, but that's only going to affect
itself.

This is part of XSA-344.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1214,7 +1214,7 @@ void domain_unpause_except_self(struct d
         domain_unpause(d);
 }
 
-int domain_soft_reset(struct domain *d)
+int domain_soft_reset(struct domain *d, bool resuming)
 {
     struct vcpu *v;
     int rc;
@@ -1228,7 +1228,7 @@ int domain_soft_reset(struct domain *d)
         }
     spin_unlock(&d->shutdown_lock);
 
-    rc = evtchn_reset(d);
+    rc = evtchn_reset(d, resuming);
     if ( rc )
         return rc;
 
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -572,12 +572,22 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xe
     }
 
     case XEN_DOMCTL_soft_reset:
+    case XEN_DOMCTL_soft_reset_cont:
         if ( d == current->domain ) /* no domain_pause() */
         {
             ret = -EINVAL;
             break;
         }
-        ret = domain_soft_reset(d);
+        ret = domain_soft_reset(d, op->cmd == XEN_DOMCTL_soft_reset_cont);
+        if ( ret == -ERESTART )
+        {
+            op->cmd = XEN_DOMCTL_soft_reset_cont;
+            if ( !__copy_field_to_guest(u_domctl, op, cmd) )
+                ret = hypercall_create_continuation(__HYPERVISOR_domctl,
+                                                    "h", u_domctl);
+            else
+                ret = -EFAULT;
+        }
         break;
 
     case XEN_DOMCTL_destroydomain:
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1057,7 +1057,7 @@ int evtchn_unmask(unsigned int port)
     return 0;
 }
 
-int evtchn_reset(struct domain *d)
+int evtchn_reset(struct domain *d, bool resuming)
 {
     unsigned int i;
     int rc = 0;
@@ -1065,11 +1065,40 @@ int evtchn_reset(struct domain *d)
     if ( d != current->domain && !d->controller_pause_count )
         return -EINVAL;
 
-    for ( i = 0; port_is_valid(d, i); i++ )
+    spin_lock(&d->event_lock);
+
+    /*
+     * If we are resuming, then start where we stopped. Otherwise, check
+     * that a reset operation is not already in progress, and if none is,
+     * record that this is now the case.
+     */
+    i = resuming ? d->next_evtchn : !d->next_evtchn;
+    if ( i > d->next_evtchn )
+        d->next_evtchn = i;
+
+    spin_unlock(&d->event_lock);
+
+    if ( !i )
+        return -EBUSY;
+
+    for ( ; port_is_valid(d, i); i++ )
+    {
         evtchn_close(d, i, 1);
 
+        /* NB: Choice of frequency is arbitrary. */
+        if ( !(i & 0x3f) && hypercall_preempt_check() )
+        {
+            spin_lock(&d->event_lock);
+            d->next_evtchn = i;
+            spin_unlock(&d->event_lock);
+            return -ERESTART;
+        }
+    }
+
     spin_lock(&d->event_lock);
 
+    d->next_evtchn = 0;
+
     if ( d->active_evtchns > d->xen_evtchns )
         rc = -EAGAIN;
     else if ( d->evtchn_fifo )
@@ -1204,7 +1233,8 @@ long do_event_channel_op(int cmd, XEN_GU
         break;
     }
 
-    case EVTCHNOP_reset: {
+    case EVTCHNOP_reset:
+    case EVTCHNOP_reset_cont: {
         struct evtchn_reset reset;
         struct domain *d;
 
@@ -1217,9 +1247,13 @@ long do_event_channel_op(int cmd, XEN_GU
 
         rc = xsm_evtchn_reset(XSM_TARGET, current->domain, d);
         if ( !rc )
-            rc = evtchn_reset(d);
+            rc = evtchn_reset(d, cmd == EVTCHNOP_reset_cont);
 
         rcu_unlock_domain(d);
+
+        if ( rc == -ERESTART )
+            rc = hypercall_create_continuation(__HYPERVISOR_event_channel_op,
+                                               "ih", EVTCHNOP_reset_cont, arg);
         break;
     }
 
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1152,7 +1152,10 @@ struct xen_domctl {
 #define XEN_DOMCTL_iomem_permission              20
 #define XEN_DOMCTL_ioport_permission             21
 #define XEN_DOMCTL_hypercall_init                22
-#define XEN_DOMCTL_arch_setup                    23 /* Obsolete IA64 only */
+#ifdef __XEN__
+/* #define XEN_DOMCTL_arch_setup                 23 Obsolete IA64 only */
+#define XEN_DOMCTL_soft_reset_cont               23
+#endif
 #define XEN_DOMCTL_settimeoffset                 24
 #define XEN_DOMCTL_getvcpuaffinity               25
 #define XEN_DOMCTL_real_mode_area                26 /* Obsolete PPC only */
--- a/xen/include/public/event_channel.h
+++ b/xen/include/public/event_channel.h
@@ -74,6 +74,9 @@
 #define EVTCHNOP_init_control    11
 #define EVTCHNOP_expand_array    12
 #define EVTCHNOP_set_priority    13
+#ifdef __XEN__
+#define EVTCHNOP_reset_cont      14
+#endif
 /* ` } */
 
 typedef uint32_t evtchn_port_t;
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -171,7 +171,7 @@ void evtchn_check_pollers(struct domain
 void evtchn_2l_init(struct domain *d);
 
 /* Close all event channels and reset to 2-level ABI. */
-int evtchn_reset(struct domain *d);
+int evtchn_reset(struct domain *d, bool resuming);
 
 /*
  * Low-level event channel port ops.
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -394,6 +394,8 @@ struct domain
      * EVTCHNOP_reset).  Read/write access like for active_evtchns.
      */
     unsigned int     xen_evtchns;
+    /* Port to resume from in evtchn_reset(), when in a continuation. */
+    unsigned int     next_evtchn;
     spinlock_t       event_lock;
     const struct evtchn_port_ops *evtchn_port_ops;
     struct evtchn_fifo_domain *evtchn_fifo;
@@ -663,7 +665,7 @@ int domain_shutdown(struct domain *d, u8
 void domain_resume(struct domain *d);
 void domain_pause_for_debugger(void);
 
-int domain_soft_reset(struct domain *d);
+int domain_soft_reset(struct domain *d, bool resuming);
 
 int vcpu_start_shutdown_deferral(struct vcpu *v);
 void vcpu_end_shutdown_deferral(struct vcpu *v);

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2020-09-22 13:41 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-22 13:37 Xen Security Advisory 344 v4 (CVE-2020-25601) - lack of preemption in evtchn_reset() / evtchn_destroy() Xen.org security team

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).