* [PATCH 0/4] xen: domain-tracked allocations, and fault injection
@ 2020-12-23 16:34 Andrew Cooper
  2020-12-23 16:34 ` [PATCH 1/4] xen/dmalloc: Introduce dmalloc() APIs Andrew Cooper
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Andrew Cooper @ 2020-12-23 16:34 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Tamas K Lengyel

This was not the christmas hacking project that I was planning to do, but it
has had some exciting results.

After some discussion on an earlier thread, Tamas has successfully got fuzzing
of Xen working via kfx, and this series is a prototype for providing better
testing infrastructure.

And to prove a point, this series has already found a memory leak in ARM's
dom0less smoke test.

From https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/929518792

  (XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
  (XEN) Freed 328kB init memory.
  (XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER4
  (XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER8
  (XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER12
  (XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER16
  (XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER20
  (XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER24
  (XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER28
  (XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER32
  (XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER0
  (XEN) physdev.c:16:d0v0 PHYSDEVOP cmd=25: not implemented
  (XEN) physdev.c:16:d0v0 PHYSDEVOP cmd=15: not implemented
  (XEN) physdev.c:16:d0v0 PHYSDEVOP cmd=15: not implemented
  (XEN)
  (XEN) ****************************************
  (XEN) Panic on CPU 0:
  (XEN) d1 has 2 outstanding heap allocations
  (XEN) ****************************************
  (XEN)
  (XEN) Reboot in five seconds...

For some reason, neither of the evtchn default memory allocations is freed,
but it's not clear why d1 shut down to begin with.  Stefano - any ideas?

Andrew Cooper (4):
  xen/dmalloc: Introduce dmalloc() APIs
  xen/evtchn: Switch to dmalloc
  xen/domctl: Introduce fault_ttl
  tools/misc: Test for fault injection

 tools/misc/.gitignore       |  1 +
 tools/misc/Makefile         |  5 ++++
 tools/misc/xen-fault-ttl.c  | 56 +++++++++++++++++++++++++++++++++++++++++++++
 xen/common/Makefile         |  1 +
 xen/common/dmalloc.c        | 25 ++++++++++++++++++++
 xen/common/domain.c         | 14 ++++++++++--
 xen/common/event_channel.c  | 14 ++++++------
 xen/include/public/domctl.h |  1 +
 xen/include/xen/dmalloc.h   | 29 +++++++++++++++++++++++
 xen/include/xen/sched.h     |  3 +++
 10 files changed, 140 insertions(+), 9 deletions(-)
 create mode 100644 tools/misc/xen-fault-ttl.c
 create mode 100644 xen/common/dmalloc.c
 create mode 100644 xen/include/xen/dmalloc.h

-- 
2.11.0




* [PATCH 1/4] xen/dmalloc: Introduce dmalloc() APIs
  2020-12-23 16:34 [PATCH 0/4] xen: domain-tracked allocations, and fault injection Andrew Cooper
@ 2020-12-23 16:34 ` Andrew Cooper
  2021-01-05 15:56   ` Jan Beulich
                     ` (2 more replies)
  2020-12-23 16:34 ` [PATCH 2/4] xen/evtchn: Switch to dmalloc Andrew Cooper
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 17+ messages in thread
From: Andrew Cooper @ 2020-12-23 16:34 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Tamas K Lengyel

Wrappers for xmalloc() and friends, which track allocations tied to a specific
domain.

Check for any leaked memory at domain destruction time.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Tamas K Lengyel <tamas@tklengyel.com>

RFC:
 * This probably wants to be less fatal in release builds
 * In an ideal world, we'd also want to count the total number of bytes
   allocated from the xmalloc heap, which would be interesting to print in the
   'q' debugkey.  However, that data is fairly invasive to obtain.
 * More complicated logic could track the origins of each allocation, and be
   able to identify which one(s) leaked.
---
 xen/common/Makefile       |  1 +
 xen/common/dmalloc.c      | 19 +++++++++++++++++++
 xen/common/domain.c       |  6 ++++++
 xen/include/xen/dmalloc.h | 29 +++++++++++++++++++++++++++++
 xen/include/xen/sched.h   |  2 ++
 5 files changed, 57 insertions(+)
 create mode 100644 xen/common/dmalloc.c
 create mode 100644 xen/include/xen/dmalloc.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 7a4e652b57..c5d9c23fd1 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_CORE_PARKING) += core_parking.o
 obj-y += cpu.o
 obj-$(CONFIG_DEBUG_TRACE) += debugtrace.o
 obj-$(CONFIG_HAS_DEVICE_TREE) += device_tree.o
+obj-y += dmalloc.o
 obj-y += domain.o
 obj-y += event_2l.o
 obj-y += event_channel.o
diff --git a/xen/common/dmalloc.c b/xen/common/dmalloc.c
new file mode 100644
index 0000000000..e3a0e546c2
--- /dev/null
+++ b/xen/common/dmalloc.c
@@ -0,0 +1,19 @@
+#include <xen/dmalloc.h>
+#include <xen/sched.h>
+#include <xen/xmalloc.h>
+
+void dfree(struct domain *d, void *ptr)
+{
+    atomic_dec(&d->dalloc_heap);
+    xfree(ptr);
+}
+
+void *_dzalloc(struct domain *d, size_t size, size_t align)
+{
+    void *ptr = _xmalloc(size, align);
+
+    if ( ptr )
+        atomic_inc(&d->dalloc_heap);
+
+    return ptr;
+}
diff --git a/xen/common/domain.c b/xen/common/domain.c
index d151be3f36..1db1c0e70a 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -332,6 +332,8 @@ static int domain_teardown(struct domain *d)
  */
 static void _domain_destroy(struct domain *d)
 {
+    int outstanding;
+
     BUG_ON(!d->is_dying);
     BUG_ON(atomic_read(&d->refcnt) != DOMAIN_DESTROYED);
 
@@ -347,6 +349,10 @@ static void _domain_destroy(struct domain *d)
 
     lock_profile_deregister_struct(LOCKPROF_TYPE_PERDOM, d);
 
+    outstanding = atomic_read(&d->dalloc_heap);
+    if ( outstanding )
+        panic("%pd has %d outstanding heap allocations\n", d, outstanding);
+
     free_domain_struct(d);
 }
 
diff --git a/xen/include/xen/dmalloc.h b/xen/include/xen/dmalloc.h
new file mode 100644
index 0000000000..a90cf0259c
--- /dev/null
+++ b/xen/include/xen/dmalloc.h
@@ -0,0 +1,29 @@
+#ifndef XEN_DMALLOC_H
+#define XEN_DMALLOC_H
+
+#include <xen/types.h>
+
+struct domain;
+
+#define dzalloc_array(d, _type, _num)                                   \
+    ((_type *)_dzalloc_array(d, sizeof(_type), __alignof__(_type), _num))
+
+
+void dfree(struct domain *d, void *ptr);
+
+#define DFREE(d, p)                             \
+    do {                                        \
+        dfree(d, p);                            \
+        (p) = NULL;                             \
+    } while ( 0 )
+
+
+void *_dzalloc(struct domain *d, size_t size, size_t align);
+
+static inline void *_dzalloc_array(struct domain *d, size_t size,
+                                   size_t align, size_t num)
+{
+    return _dzalloc(d, size * num, align);
+}
+
+#endif /* XEN_DMALLOC_H */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 3e46384a3c..8ed8b55a1e 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -349,6 +349,8 @@ struct domain
     atomic_t         shr_pages;         /* shared pages */
     atomic_t         paged_pages;       /* paged-out pages */
 
+    atomic_t         dalloc_heap;       /* Number of xmalloc-like allocations. */
+
     /* Scheduling. */
     void            *sched_priv;    /* scheduler-specific data */
     struct sched_unit *sched_unit_list;
-- 
2.11.0




* [PATCH 2/4] xen/evtchn: Switch to dmalloc
  2020-12-23 16:34 [PATCH 0/4] xen: domain-tracked allocations, and fault injection Andrew Cooper
  2020-12-23 16:34 ` [PATCH 1/4] xen/dmalloc: Introduce dmalloc() APIs Andrew Cooper
@ 2020-12-23 16:34 ` Andrew Cooper
  2021-01-05 16:09   ` Jan Beulich
  2020-12-23 16:34 ` [PATCH 3/4] xen/domctl: Introduce fault_ttl Andrew Cooper
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Andrew Cooper @ 2020-12-23 16:34 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Tamas K Lengyel

This causes memory allocations to be tied to the domain in question.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Tamas K Lengyel <tamas@tklengyel.com>

RFC: Likely more to convert.  This is just a minimal attempt.
---
 xen/common/event_channel.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 4a48094356..f0ca0933e3 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -16,6 +16,7 @@
 
 #include "event_channel.h"
 
+#include <xen/dmalloc.h>
 #include <xen/init.h>
 #include <xen/lib.h>
 #include <xen/errno.h>
@@ -153,7 +154,7 @@ static struct evtchn *alloc_evtchn_bucket(struct domain *d, unsigned int port)
     struct evtchn *chn;
     unsigned int i;
 
-    chn = xzalloc_array(struct evtchn, EVTCHNS_PER_BUCKET);
+    chn = dzalloc_array(d, struct evtchn, EVTCHNS_PER_BUCKET);
     if ( !chn )
         return NULL;
 
@@ -182,7 +183,7 @@ static void free_evtchn_bucket(struct domain *d, struct evtchn *bucket)
     for ( i = 0; i < EVTCHNS_PER_BUCKET; i++ )
         xsm_free_security_evtchn(bucket + i);
 
-    xfree(bucket);
+    dfree(d, bucket);
 }
 
 int evtchn_allocate_port(struct domain *d, evtchn_port_t port)
@@ -204,7 +205,7 @@ int evtchn_allocate_port(struct domain *d, evtchn_port_t port)
 
         if ( !group_from_port(d, port) )
         {
-            grp = xzalloc_array(struct evtchn *, BUCKETS_PER_GROUP);
+            grp = dzalloc_array(d, struct evtchn *, BUCKETS_PER_GROUP);
             if ( !grp )
                 return -ENOMEM;
             group_from_port(d, port) = grp;
@@ -1488,7 +1489,7 @@ int evtchn_init(struct domain *d, unsigned int max_port)
     write_atomic(&d->active_evtchns, 0);
 
 #if MAX_VIRT_CPUS > BITS_PER_LONG
-    d->poll_mask = xzalloc_array(unsigned long, BITS_TO_LONGS(d->max_vcpus));
+    d->poll_mask = dzalloc_array(d, unsigned long, BITS_TO_LONGS(d->max_vcpus));
     if ( !d->poll_mask )
     {
         free_evtchn_bucket(d, d->evtchn);
@@ -1545,13 +1546,12 @@ void evtchn_destroy_final(struct domain *d)
             continue;
         for ( j = 0; j < BUCKETS_PER_GROUP; j++ )
             free_evtchn_bucket(d, d->evtchn_group[i][j]);
-        xfree(d->evtchn_group[i]);
+        dfree(d, d->evtchn_group[i]);
     }
     free_evtchn_bucket(d, d->evtchn);
 
 #if MAX_VIRT_CPUS > BITS_PER_LONG
-    xfree(d->poll_mask);
-    d->poll_mask = NULL;
+    DFREE(d, d->poll_mask);
 #endif
 }
 
-- 
2.11.0




* [PATCH 3/4] xen/domctl: Introduce fault_ttl
  2020-12-23 16:34 [PATCH 0/4] xen: domain-tracked allocations, and fault injection Andrew Cooper
  2020-12-23 16:34 ` [PATCH 1/4] xen/dmalloc: Introduce dmalloc() APIs Andrew Cooper
  2020-12-23 16:34 ` [PATCH 2/4] xen/evtchn: Switch to dmalloc Andrew Cooper
@ 2020-12-23 16:34 ` Andrew Cooper
  2021-01-05 16:39   ` Jan Beulich
  2020-12-23 16:34 ` [PATCH 4/4] tools/misc: Test for fault injection Andrew Cooper
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Andrew Cooper @ 2020-12-23 16:34 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Tamas K Lengyel

To inject a simulated resource failure, for testing purposes.

Given a specific set of hypercall parameters, the failure is in a repeatable
position, for the currently booted Xen.  The exact position of failures is
highly dependent on the build of Xen, and hardware support.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Tamas K Lengyel <tamas@tklengyel.com>

RFC:
 * Probably wants to be Kconfig'd
 * I'm thinking of dropping handle from xen_domctl_createdomain because it's a
   waste of valuable space.
---
 xen/common/dmalloc.c        | 8 +++++++-
 xen/common/domain.c         | 8 ++++++--
 xen/include/public/domctl.h | 1 +
 xen/include/xen/sched.h     | 1 +
 4 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/xen/common/dmalloc.c b/xen/common/dmalloc.c
index e3a0e546c2..1f5d0f5627 100644
--- a/xen/common/dmalloc.c
+++ b/xen/common/dmalloc.c
@@ -10,7 +10,13 @@ void dfree(struct domain *d, void *ptr)
 
 void *_dzalloc(struct domain *d, size_t size, size_t align)
 {
-    void *ptr = _xmalloc(size, align);
+    void *ptr;
+
+    if ( atomic_read(&d->fault_ttl) &&
+         atomic_dec_and_test(&d->fault_ttl) )
+        return NULL;
+
+    ptr = _xmalloc(size, align);
 
     if ( ptr )
         atomic_inc(&d->dalloc_heap);
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 1db1c0e70a..cd73321980 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -427,14 +427,18 @@ struct domain *domain_create(domid_t domid,
     if ( (d = alloc_domain_struct()) == NULL )
         return ERR_PTR(-ENOMEM);
 
-    d->options = config ? config->flags : 0;
-
     /* Sort out our idea of is_system_domain(). */
     d->domain_id = domid;
 
     /* Debug sanity. */
     ASSERT(is_system_domain(d) ? config == NULL : config != NULL);
 
+    if ( config )
+    {
+        d->options = config->flags;
+        atomic_set(&d->fault_ttl, config->fault_ttl);
+    }
+
     /* Sort out our idea of is_control_domain(). */
     d->is_privileged = is_priv;
 
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 666aeb71bf..aaa3d66616 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -48,6 +48,7 @@
 /* XEN_DOMCTL_createdomain */
 struct xen_domctl_createdomain {
     /* IN parameters */
+    uint32_t fault_ttl;
     uint32_t ssidref;
     xen_domain_handle_t handle;
  /* Is this an HVM guest (as opposed to a PV guest)? */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 8ed8b55a1e..620a9f20e5 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -349,6 +349,7 @@ struct domain
     atomic_t         shr_pages;         /* shared pages */
     atomic_t         paged_pages;       /* paged-out pages */
 
+    atomic_t         fault_ttl;         /* Time until a simulated resource failure. */
     atomic_t         dalloc_heap;       /* Number of xmalloc-like allocations. */
 
     /* Scheduling. */
-- 
2.11.0




* [PATCH 4/4] tools/misc: Test for fault injection
  2020-12-23 16:34 [PATCH 0/4] xen: domain-tracked allocations, and fault injection Andrew Cooper
                   ` (2 preceding siblings ...)
  2020-12-23 16:34 ` [PATCH 3/4] xen/domctl: Introduce fault_ttl Andrew Cooper
@ 2020-12-23 16:34 ` Andrew Cooper
  2020-12-23 16:41   ` Jan Beulich
  2021-01-08  1:49 ` [PATCH 0/4] xen: domain-tracked allocations, and " Stefano Stabellini
  2022-12-11 17:21 ` Julien Grall
  5 siblings, 1 reply; 17+ messages in thread
From: Andrew Cooper @ 2020-12-23 16:34 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Tamas K Lengyel

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Tamas K Lengyel <tamas@tklengyel.com>

RFC: This wants expanding to a few more "default" configurations, and then
some thought needs putting towards automating it.
---
 tools/misc/.gitignore      |  1 +
 tools/misc/Makefile        |  5 +++++
 tools/misc/xen-fault-ttl.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 62 insertions(+)
 create mode 100644 tools/misc/xen-fault-ttl.c

diff --git a/tools/misc/.gitignore b/tools/misc/.gitignore
index c5fe2cfccd..8d117c3b7d 100644
--- a/tools/misc/.gitignore
+++ b/tools/misc/.gitignore
@@ -1 +1,2 @@
+xen-fault-ttl
 xen-ucode
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index 7d37f297a9..5c1ed9a284 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -9,6 +9,7 @@ CFLAGS += $(CFLAGS_libxenctrl)
 CFLAGS += $(CFLAGS_libxenguest)
 CFLAGS += $(CFLAGS_xeninclude)
 CFLAGS += $(CFLAGS_libxenstore)
+CFLAGS += -Wno-declaration-after-statement
 
 # Everything to be installed in regular bin/
 INSTALL_BIN-$(CONFIG_X86)      += xen-cpuid
@@ -25,6 +26,7 @@ INSTALL_SBIN-$(CONFIG_X86)     += xen-lowmemd
 INSTALL_SBIN-$(CONFIG_X86)     += xen-mfndump
 INSTALL_SBIN-$(CONFIG_X86)     += xen-ucode
 INSTALL_SBIN                   += xencov
+INSTALL_SBIN                   += xen-fault-ttl
 INSTALL_SBIN                   += xenhypfs
 INSTALL_SBIN                   += xenlockprof
 INSTALL_SBIN                   += xenperf
@@ -76,6 +78,9 @@ distclean: clean
 xen-cpuid: xen-cpuid.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(APPEND_LDFLAGS)
 
+xen-fault-ttl: xen-fault-ttl.o
+	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(APPEND_LDFLAGS)
+
 xen-hvmctx: xen-hvmctx.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
diff --git a/tools/misc/xen-fault-ttl.c b/tools/misc/xen-fault-ttl.c
new file mode 100644
index 0000000000..e7953443da
--- /dev/null
+++ b/tools/misc/xen-fault-ttl.c
@@ -0,0 +1,56 @@
+#include <stdio.h>
+#include <err.h>
+#include <string.h>
+#include <errno.h>
+
+#include <xenctrl.h>
+
+int main(int argc, char **argv)
+{
+    int rc;
+    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
+
+    if ( !xch )
+        err(1, "xc_interface_open");
+
+    struct xen_domctl_createdomain create = {
+        .fault_ttl = 1,
+        .flags = (XEN_DOMCTL_CDF_hvm |
+                  XEN_DOMCTL_CDF_hap),
+        .max_vcpus = 1,
+        .max_evtchn_port = -1,
+        .max_grant_frames = 64,
+        .max_maptrack_frames = 1024,
+        .arch = {
+            .emulation_flags = XEN_X86_EMU_LAPIC,
+        },
+    };
+    uint32_t domid = 0;
+
+    for ( rc = 1; rc && errno == ENOMEM; create.fault_ttl++ )
+        rc = xc_domain_create(xch, &domid, &create);
+
+    if ( rc == 0 )
+    {
+        printf("Created d%u with fault_ttl of %u\n", domid, create.fault_ttl);
+
+        xc_domain_destroy(xch, domid);
+    }
+    else
+        printf("Domain creation failed: %d: %s\n",
+               -errno, strerror(errno));
+
+    xc_interface_close(xch);
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0




* Re: [PATCH 4/4] tools/misc: Test for fault injection
  2020-12-23 16:34 ` [PATCH 4/4] tools/misc: Test for fault injection Andrew Cooper
@ 2020-12-23 16:41   ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2020-12-23 16:41 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Tamas K Lengyel, Xen-devel

On 23.12.2020 17:34, Andrew Cooper wrote:
> --- /dev/null
> +++ b/tools/misc/xen-fault-ttl.c
> @@ -0,0 +1,56 @@
> +#include <stdio.h>
> +#include <err.h>
> +#include <string.h>
> +#include <errno.h>
> +
> +#include <xenctrl.h>
> +
> +int main(int argc, char **argv)
> +{
> +    int rc;
> +    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
> +
> +    if ( !xch )
> +        err(1, "xc_interface_open");
> +
> +    struct xen_domctl_createdomain create = {
> +        .fault_ttl = 1,
> +        .flags = (XEN_DOMCTL_CDF_hvm |
> +                  XEN_DOMCTL_CDF_hap),
> +        .max_vcpus = 1,
> +        .max_evtchn_port = -1,
> +        .max_grant_frames = 64,
> +        .max_maptrack_frames = 1024,
> +        .arch = {
> +            .emulation_flags = XEN_X86_EMU_LAPIC,
> +        },
> +    };
> +    uint32_t domid = 0;
> +
> +    for ( rc = 1; rc && errno == ENOMEM; create.fault_ttl++ )
> +        rc = xc_domain_create(xch, &domid, &create);

The very first time you get here, how does errno end up being
set to ENOMEM? To me

    do {
        create.fault_ttl++;
        rc = xc_domain_create(xch, &domid, &create);
    } while ( rc && errno == ENOMEM );

would seem more natural (and also avoid the somewhat odd
"rc = 1").

Jan



* Re: [PATCH 1/4] xen/dmalloc: Introduce dmalloc() APIs
  2020-12-23 16:34 ` [PATCH 1/4] xen/dmalloc: Introduce dmalloc() APIs Andrew Cooper
@ 2021-01-05 15:56   ` Jan Beulich
  2021-01-13 23:16     ` Andrew Cooper
  2021-01-05 16:01   ` Jan Beulich
  2022-12-11 17:24   ` Julien Grall
  2 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2021-01-05 15:56 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Tamas K Lengyel, Xen-devel

On 23.12.2020 17:34, Andrew Cooper wrote:
> RFC:
>  * This probably wants to be less fatal in release builds

I'm not even convinced this wants to be a panic() in debug builds.

>  * In an ideal world, we'd also want to count the total number of bytes
>    allocated from the xmalloc heap, which would be interesting to print in the
>    'q' debugkey.  However, that data is fairly invasive to obtain.

Unless we used an xmem_pool rather than the new interface being
a thin layer around xmalloc(). (This would then also provide
better locality of the allocations, i.e. different domains
wouldn't share allocations from the same page.) And even without
doing so, adding a function to retrieve the actual size
shouldn't be all that difficult - internally xmalloc_tlsf.c
knows the size, after all, for e.g. xrealloc() to work right.

> --- /dev/null
> +++ b/xen/include/xen/dmalloc.h
> @@ -0,0 +1,29 @@
> +#ifndef XEN_DMALLOC_H
> +#define XEN_DMALLOC_H
> +
> +#include <xen/types.h>
> +
> +struct domain;
> +
> +#define dzalloc_array(d, _type, _num)                                   \

While I realize I'll get bashed again, the inconsistency of using
(or not) leading underscores is too odd to not comment upon. I
don't see what use they are here, irrespective of my general view
on the topic.

> +    ((_type *)_dzalloc_array(d, sizeof(_type), __alignof__(_type), _num))
> +
> +
> +void dfree(struct domain *d, void *ptr);

May I ask to avoid double blank lines?

> +#define DFREE(d, p)                             \
> +    do {                                        \
> +        dfree(d, p);                            \
> +        (p) = NULL;                             \
> +    } while ( 0 )
> +
> +
> +void *_dzalloc(struct domain *d, size_t size, size_t align);
> +
> +static inline void *_dzalloc_array(struct domain *d, size_t size,
> +                                   size_t align, size_t num)
> +{
> +    return _dzalloc(d, size * num, align);

No protection at all against the multiplication overflowing?

Jan



* Re: [PATCH 1/4] xen/dmalloc: Introduce dmalloc() APIs
  2020-12-23 16:34 ` [PATCH 1/4] xen/dmalloc: Introduce dmalloc() APIs Andrew Cooper
  2021-01-05 15:56   ` Jan Beulich
@ 2021-01-05 16:01   ` Jan Beulich
  2022-12-11 17:24   ` Julien Grall
  2 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2021-01-05 16:01 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Tamas K Lengyel, Xen-devel

On 23.12.2020 17:34, Andrew Cooper wrote:
> --- /dev/null
> +++ b/xen/common/dmalloc.c
> @@ -0,0 +1,19 @@
> +#include <xen/dmalloc.h>
> +#include <xen/sched.h>
> +#include <xen/xmalloc.h>
> +
> +void dfree(struct domain *d, void *ptr)
> +{
> +    atomic_dec(&d->dalloc_heap);
> +    xfree(ptr);
> +}
> +
> +void *_dzalloc(struct domain *d, size_t size, size_t align)
> +{
> +    void *ptr = _xmalloc(size, align);
> +
> +    if ( ptr )
> +        atomic_inc(&d->dalloc_heap);

While this is properly conditional, the freeing side also needs to
tolerate NULL coming in (noticed only while looking at patch 2).
I also wonder whether ZERO_BLOCK_PTR wants special casing.
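
For illustration, a minimal sketch of what tolerating both could look
like, assuming ZERO_BLOCK_PTR remains the sentinel xmalloc.h uses today:

    void dfree(struct domain *d, void *ptr)
    {
        /* Mirror xfree(): nothing was accounted for either value. */
        if ( ptr == NULL || ptr == ZERO_BLOCK_PTR )
            return;

        atomic_dec(&d->dalloc_heap);
        xfree(ptr);
    }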

Jan



* Re: [PATCH 2/4] xen/evtchn: Switch to dmalloc
  2020-12-23 16:34 ` [PATCH 2/4] xen/evtchn: Switch to dmalloc Andrew Cooper
@ 2021-01-05 16:09   ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2021-01-05 16:09 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Tamas K Lengyel, Xen-devel

On 23.12.2020 17:34, Andrew Cooper wrote:
> RFC: Likely more to convert.  This is just a minimal attempt.

Looks complete as far as this one file goes. More can be found
in evtchn_fifo.c of course.

> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -16,6 +16,7 @@
>  
>  #include "event_channel.h"
>  
> +#include <xen/dmalloc.h>
>  #include <xen/init.h>
>  #include <xen/lib.h>
>  #include <xen/errno.h>
> @@ -153,7 +154,7 @@ static struct evtchn *alloc_evtchn_bucket(struct domain *d, unsigned int port)
>      struct evtchn *chn;
>      unsigned int i;
>  
> -    chn = xzalloc_array(struct evtchn, EVTCHNS_PER_BUCKET);
> +    chn = dzalloc_array(d, struct evtchn, EVTCHNS_PER_BUCKET);
>      if ( !chn )
>          return NULL;

There's an error path down from here which also needs converting.

Jan



* Re: [PATCH 3/4] xen/domctl: Introduce fault_ttl
  2020-12-23 16:34 ` [PATCH 3/4] xen/domctl: Introduce fault_ttl Andrew Cooper
@ 2021-01-05 16:39   ` Jan Beulich
  2021-01-13 23:58     ` Andrew Cooper
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2021-01-05 16:39 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Tamas K Lengyel, Xen-devel

On 23.12.2020 17:34, Andrew Cooper wrote:
> To inject a simulated resource failure, for testing purposes.
> 
> Given a specific set of hypercall parameters, the failure is in a repeatable
> position, for the currently booted Xen.  The exact position of failures is
> highly dependent on the build of Xen, and hardware support.

What about other kinds of resources, or ones only indirectly
related to memory allocations (e.g. where we don't mean to
associate them with the domain)?

> RFC:
>  * Probably wants to be Kconfig'd

Yes.

>  * I'm thinking of dropping handle from xen_domctl_createdomain because it's a
>    waste of valuable space.

Looks entirely unrelated, but yes - as long as Xen itself has no
consumer of the field. The more that there already is
XEN_DOMCTL_setdomainhandle.

> --- a/xen/common/dmalloc.c
> +++ b/xen/common/dmalloc.c
> @@ -10,7 +10,13 @@ void dfree(struct domain *d, void *ptr)
>  
>  void *_dzalloc(struct domain *d, size_t size, size_t align)
>  {
> -    void *ptr = _xmalloc(size, align);
> +    void *ptr;
> +
> +    if ( atomic_read(&d->fault_ttl) &&
> +         atomic_dec_and_test(&d->fault_ttl) )
> +        return NULL;

Perhaps we want to introduce Linux's atomic_add_unless()?

Jan



* Re: [PATCH 0/4] xen: domain-tracked allocations, and fault injection
  2020-12-23 16:34 [PATCH 0/4] xen: domain-tracked allocations, and fault injection Andrew Cooper
                   ` (3 preceding siblings ...)
  2020-12-23 16:34 ` [PATCH 4/4] tools/misc: Test for fault injection Andrew Cooper
@ 2021-01-08  1:49 ` Stefano Stabellini
  2022-12-11 17:21 ` Julien Grall
  5 siblings, 0 replies; 17+ messages in thread
From: Stefano Stabellini @ 2021-01-08  1:49 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Xen-devel, Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Tamas K Lengyel

On Wed, 23 Dec 2020, Andrew Cooper wrote:
> This was not the christmas hacking project that I was planning to do, but it
> has had some exciting results.
> 
> After some discussion on an earlier thread, Tamas has successfully got fuzzing
> of Xen working via kfx, and this series is a prototype for providing better
> testing infrastructure.
> 
> And to prove a point, this series has already found a memory leak in ARM's
> dom0less smoke test.
> 
> From https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/929518792
> 
>   (XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
>   (XEN) Freed 328kB init memory.
>   (XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER4
>   (XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER8
>   (XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER12
>   (XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER16
>   (XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER20
>   (XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER24
>   (XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER28
>   (XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER32
>   (XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER0
>   (XEN) physdev.c:16:d0v0 PHYSDEVOP cmd=25: not implemented
>   (XEN) physdev.c:16:d0v0 PHYSDEVOP cmd=15: not implemented
>   (XEN) physdev.c:16:d0v0 PHYSDEVOP cmd=15: not implemented
>   (XEN)
>   (XEN) ****************************************
>   (XEN) Panic on CPU 0:
>   (XEN) d1 has 2 outstanding heap allocations
>   (XEN) ****************************************
>   (XEN)
>   (XEN) Reboot in five seconds...
> 
> For some reason, neither of the evtchn default memory allocations is freed,
> but it's not clear why d1 shut down to begin with.  Stefano - any ideas?

Right, this is confusing. It is not hard to believe that memory leaks
exist on the dom0less shutdown path because dom0less domains are not
really shut down today. I imagine there could be issues.

But I don't understand why _domain_destroy gets called in the first
place for d1. Maybe a domain_create failure leads to goto fail and
_domain_destroy(). I wanted to run a test but ran out of time.



* Re: [PATCH 1/4] xen/dmalloc: Introduce dmalloc() APIs
  2021-01-05 15:56   ` Jan Beulich
@ 2021-01-13 23:16     ` Andrew Cooper
  2021-01-14 10:14       ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Cooper @ 2021-01-13 23:16 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Tamas K Lengyel, Xen-devel

On 05/01/2021 15:56, Jan Beulich wrote:
> On 23.12.2020 17:34, Andrew Cooper wrote:
>> RFC:
>>  * This probably wants to be less fatal in release builds
> I'm not even convinced this wants to be a panic() in debug builds.

Any memory leak spotted by this is an XSA, except in the narrow case of
being due exclusively to a weird and non-default order of hypercalls.

It absolutely needs to be something fatal in debug builds, to avoid it
going unnoticed by testing.  Patch 4 is my proposed solution for this,
which will hopefully prevent bugs from escaping staging.

For release builds, a real memory leak is less bad behaviour than taking
the host down, but it certainly shouldn't go unnoticed.

>>  * In an ideal world, we'd also want to count the total number of bytes
>>    allocated from the xmalloc heap, which would be interesting to print in the
>>    'q' debugkey.  However, that data is fairly invasive to obtain.
> Unless we used an xmem_pool rather than the new interface being
> a thin layer around xmalloc(). (This would then also provide
> better locality of the allocations, i.e. different domains
> wouldn't share allocations from the same page.)

I'm not sure if using a separate memory pool is a clear cut improvement.

For an attacker which has a corruption primitive, a single shared pool
will reduce the chance of the exploit being stable across different
systems.  Improved locality of allocations is an advantage from the
attackers point of view, but the proper way to combat that is with a
real randomised heap allocator.

Sharing within the same page isn't an issue, so long as we respect a
cache line minimum allocation size.

More importantly however, until we know exactly how much memory we're
talking about here, switching to a per-domain pool might be a massive
waste.  The xmalloc() allocations are in the noise compared to RAM, and
might only be a small multiple of the pool metadata to begin with.

> And even without doing so, adding a function to retrieve the actual size
> shouldn't be all that difficult - internally xmalloc_tlsf.c
> knows the size, after all, for e.g. xrealloc() to work right.

Yeah - the internals of the pool can calculate this.  I was considering
doing just this, but wasn't sure how exposing an API for this would go down.

For maximum flexibility, it would be my preferred way forward.
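
For concreteness, a rough sketch of how the byte accounting might then
look.  Both xmalloc_size() and dalloc_bytes are hypothetical names here,
not existing interfaces:

    /* Hypothetical query: usable size of a live xmalloc() allocation. */
    size_t xmalloc_size(const void *ptr);

    void *_dzalloc(struct domain *d, size_t size, size_t align)
    {
        void *ptr = _xmalloc(size, align);

        if ( ptr )
        {
            atomic_inc(&d->dalloc_heap);
            /* Hypothetical per-domain byte counter; may want a wider type. */
            atomic_add(xmalloc_size(ptr), &d->dalloc_bytes);
        }

        return ptr;
    }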

>> --- /dev/null
>> +++ b/xen/include/xen/dmalloc.h
>> @@ -0,0 +1,29 @@
>> +#ifndef XEN_DMALLOC_H
>> +#define XEN_DMALLOC_H
>> +
>> +#include <xen/types.h>
>> +
>> +struct domain;
>> +
>> +#define dzalloc_array(d, _type, _num)                                   \
> While I realize I'll get bashed again, the inconsistency of using
> (or not) leading underscores is too odd to not comment upon. I
> don't see what use they are here, irrespective of my general view
> on the topic.
>
>> +    ((_type *)_dzalloc_array(d, sizeof(_type), __alignof__(_type), _num))
>> +
>> +
>> +void dfree(struct domain *d, void *ptr);
> May I ask to avoid double blank lines?

Both of these were just copied from xmalloc.h without any real thought. 
What I did forget was to plaster this series with RFC.

>> +#define DFREE(d, p)                             \
>> +    do {                                        \
>> +        dfree(d, p);                            \
>> +        (p) = NULL;                             \
>> +    } while ( 0 )
>> +
>> +
>> +void *_dzalloc(struct domain *d, size_t size, size_t align);
>> +
>> +static inline void *_dzalloc_array(struct domain *d, size_t size,
>> +                                   size_t align, size_t num)
>> +{
>> +    return _dzalloc(d, size * num, align);
> No protection at all against the multiplication overflowing?

Well - xmalloc doesn't have any either.  But yes - both want some
compiler-aided overflow detection, rather than messing around with
arbitrary limits pretending to be an overflow check.
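
As a sketch of what "compiler-aided" could mean here, assuming every
supported compiler provides __builtin_mul_overflow() (GCC 5+ and Clang do):

    static inline void *_dzalloc_array(struct domain *d, size_t size,
                                       size_t align, size_t num)
    {
        size_t bytes;

        /* Fail the allocation rather than allocating a truncated size. */
        if ( __builtin_mul_overflow(size, num, &bytes) )
            return NULL;

        return _dzalloc(d, bytes, align);
    }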

~Andrew



* Re: [PATCH 3/4] xen/domctl: Introduce fault_ttl
  2021-01-05 16:39   ` Jan Beulich
@ 2021-01-13 23:58     ` Andrew Cooper
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Cooper @ 2021-01-13 23:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Tamas K Lengyel, Xen-devel

On 05/01/2021 16:39, Jan Beulich wrote:
> On 23.12.2020 17:34, Andrew Cooper wrote:
>> To inject a simulated resource failure, for testing purposes.
>>
>> Given a specific set of hypercall parameters, the failure is in a repeatable
>> position, for the currently booted Xen.  The exact position of failures is
>> highly dependent on the build of Xen, and hardware support.
> What about other kinds of resources, or ones only indirectly
> related to memory allocations (e.g. where we don't mean to
> associate them with the domain)?

I intend this for "any fail-able operation", not just plain memory
allocation.

It's merely that memory allocation is the simple first resource to go
after.  And after all, these 4 patches alone have already identified a
real bug in ARM's dom0less logic.

Perhaps what we actually want is a general "bool simulate_failure(d)"
which dalloc() and others can use.
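
Something like the following, simply hoisting the fault_ttl check out of
_dzalloc() (name and placement illustrative only):

    bool simulate_failure(struct domain *d)
    {
        return atomic_read(&d->fault_ttl) &&
               atomic_dec_and_test(&d->fault_ttl);
    }

_dzalloc() and any other fail-able path would then start with:

    if ( simulate_failure(d) )
        return NULL; /* or the operation's natural error value */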

>>  * I'm thinking of dropping handle from xen_domctl_createdomain because it's a
>>    waste of valuable space.
> Looks entirely unrelated, but yes - as long as Xen itself has no
> consumer of the field. The more that there already is
> XEN_DOMCTL_setdomainhandle.

It's purely for pretty printing in 'q' and handing back to userspace for
xentop, et al.  Nothing in Xen acts meaningfully upon the value.

>> --- a/xen/common/dmalloc.c
>> +++ b/xen/common/dmalloc.c
>> @@ -10,7 +10,13 @@ void dfree(struct domain *d, void *ptr)
>>  
>>  void *_dzalloc(struct domain *d, size_t size, size_t align)
>>  {
>> -    void *ptr = _xmalloc(size, align);
>> +    void *ptr;
>> +
>> +    if ( atomic_read(&d->fault_ttl) &&
>> +         atomic_dec_and_test(&d->fault_ttl) )
>> +        return NULL;
> Perhaps we want to introduce Linux's atomic_add_unless()?

Possibly.  I definitely got caught out by the semantics of
atomic_dec_and_test() seeing as it has an implicit "!= 0".
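
A minimal sketch of such a helper on top of atomic_cmpxchg(), returning
the old value so the caller can tell whether it consumed the final tick
(name and exact semantics illustrative, not Linux's verbatim API):

    /* Add @a to @v unless it equals @u; return the value seen beforehand. */
    static inline int atomic_add_unless_return(atomic_t *v, int a, int u)
    {
        int c = atomic_read(v);

        while ( c != u )
        {
            int old = atomic_cmpxchg(v, c, c + a);

            if ( old == c )
                break;
            c = old;
        }

        return c;
    }

which would turn the check in _dzalloc() into:

    if ( atomic_add_unless_return(&d->fault_ttl, -1, 0) == 1 )
        return NULL;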

~Andrew



* Re: [PATCH 1/4] xen/dmalloc: Introduce dmalloc() APIs
  2021-01-13 23:16     ` Andrew Cooper
@ 2021-01-14 10:14       ` Jan Beulich
  2021-01-14 15:30         ` Andrew Cooper
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2021-01-14 10:14 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Tamas K Lengyel, Xen-devel

On 14.01.2021 00:16, Andrew Cooper wrote:
> On 05/01/2021 15:56, Jan Beulich wrote:
>> On 23.12.2020 17:34, Andrew Cooper wrote:
>>> RFC:
>>>  * This probably wants to be less fatal in release builds
>> I'm not even convinced this wants to be a panic() in debug builds.
> 
> Any memory leak spotted by this is an XSA, except in the narrow case of
> being due exclusively to a weird and non-default order of hypercalls.
> 
> It absolutely needs to be something fatal in debug builds, to avoid it
> going unnoticed by testing.

This is a perfectly valid position to take, but I'm afraid it once
again isn't the only possible one. Since I do routinely look at
logs, I'm personally in favor of avoiding crashing the host
even for debug builds. The more that I've had pretty bad fallout
from crashes in the past, due to (afaict) bugs in a file system
driver in Linux that persisted over a longer period of time.

>  Patch 4 is my proposed solution for this,
> which will hopefully prevent bugs from escaping staging.
> 
> For release builds, a real memory leak is less bad behaviour than taking
> the host down, but it certainly shouldn't go unnoticed.

Of course - it absolutely needs logging.

>>>  * In an ideal world, we'd also want to count the total number of bytes
>>>    allocated from the xmalloc heap, which would be interesting to print in the
>>>    'q' debugkey.  However, that data is fairly invasive to obtain.
>> Unless we used an xmem_pool rather than the new interface being
>> a thin layer around xmalloc(). (This would then also provide
>> better locality of the allocations, i.e. different domains
>> wouldn't share allocations from the same page.)
> 
> I'm not sure if using a separate memory pool is a clear cut improvement.

Neither am I, but it's an option to at least consider.

> For an attacker which has a corruption primitive, a single shared pool
> will reduce the chance of the exploit being stable across different
> systems.  Improved locality of allocations is an advantage from the
> attackers point of view, but the proper way to combat that is with a
> real randomised heap allocator.
> 
> Sharing within the same page isn't an issue, so long as we respect a
> cache line minimum allocation size.
> 
> More importantly however, until we know exactly how much memory we're
> talking about here, switching to a per-domain pool might be a massive
> waste.  The xmalloc() allocations are in the noise compared to RAM, and
> might only be a small multiple of the pool metadata to begin with.
> 
>> And even without doing so, adding a function to retrieve the actual size
>> shouldn't be all that difficult - internally xmalloc_tlsf.c
>> knows the size, after all, for e.g. xrealloc() to work right.
> 
> Yeah - the internals of the pool can calculate this.  I was considering
> doing just this, but wasn't sure how exposing an API for this would go down.
> 
> For maximum flexibility, it would be my preferred way forward.

I don't see an issue with exposing such an API, so long as it's
made clear what purposes we want to permit it to be used for.

>>> +#define DFREE(d, p)                             \
>>> +    do {                                        \
>>> +        dfree(d, p);                            \
>>> +        (p) = NULL;                             \
>>> +    } while ( 0 )
>>> +
>>> +
>>> +void *_dzalloc(struct domain *d, size_t size, size_t align);
>>> +
>>> +static inline void *_dzalloc_array(struct domain *d, size_t size,
>>> +                                   size_t align, size_t num)
>>> +{
>>> +    return _dzalloc(d, size * num, align);
>> No protection at all against the multiplication overflowing?
> 
> Well - xmalloc doesn't have any either.  But yes - both want some
> compiler-aided overflow detection, rather than messing around with
> arbitrary limits pretending to be an overflow check.

You did suggest this previously, for xmalloc(). And I did look
into doing so, but either ran into some issue or simply didn't
see the point, considering the code we already have. Yes, the
use of UINT_MAX may seem somewhat arbitrary. We could make this
less arbitrary by deriving from MAX_ORDER instead, or by adding
a suitable BUILD_BUG_ON(UINT_MAX < ...). After all there's no
point even just trying allocations exceeding MAX_ORDER.

The reason for my comment was that the functions here are
specifically _not_ a simple copy of their xmalloc()
counterparts.

Jan



* Re: [PATCH 1/4] xen/dmalloc: Introduce dmalloc() APIs
  2021-01-14 10:14       ` Jan Beulich
@ 2021-01-14 15:30         ` Andrew Cooper
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Cooper @ 2021-01-14 15:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Tamas K Lengyel, Xen-devel

On 14/01/2021 10:14, Jan Beulich wrote:
> On 14.01.2021 00:16, Andrew Cooper wrote:
>> On 05/01/2021 15:56, Jan Beulich wrote:
>>> On 23.12.2020 17:34, Andrew Cooper wrote:
>>>> RFC:
>>>>  * This probably wants to be less fatal in release builds
>>> I'm not even convinced this wants to be a panic() in debug builds.
>> Any memory leak spotted by this is an XSA, except in the narrow case of
>> being due exclusively to a weird and non-default order of hypercalls.
>>
>> It absolutely needs to be something fatal in debug builds, to avoid it
>> going unnoticed by testing.
> This is a perfectly valid position to take, but I'm afraid it once
> again isn't the only possible one. Since I do routinely look at
> logs, I'm personally in favor of avoiding crashing the host
> even for debug builds. The more that I've had pretty bad fallout
> from crashes in the past, due to (afaict) bugs in a file system
> driver in Linux that persisted over a longer period of time.

By that logic, we should replace most assertions with printk()s.  The
only reason I didn't use assert in this patch is because the
backtrace/etc is totally useless (we're in an RCU callback).

No-one, not even you, reviews all the logs from OSSTest.  It's sometimes
hard enough to get anyone to look at the logs even when there are emails
saying "$X looks broken".

The point at which we can spot the resource leak is too late to leave a
zombie domain around (we've already removed the domain from the
domlist), and a printk() is not an acceptably robust way of signalling
the problem.  Any change in the wording will render detection useless,
and some testing scenarios (stress in particular) will manage to wrap
the console ring in short order which risks hiding all traces of the
problem.

We're talking about something which will only occur on staging for
ill-reviewed patches, along with a (hopefully) reliable test developers
can use themselves, *and* in due course, this test being run pre-push
when we sort out the Gitlab CI.

The bottom line is that making this fatal is the only way we have of
getting people to pay attention to the severity of the issue, and I
don't think it reasonable to hamper automated testing's ability to spot
these issues, simply because you believe you're better than average at
reading your log files.

~Andrew



* Re: [PATCH 0/4] xen: domain-tracked allocations, and fault injection
  2020-12-23 16:34 [PATCH 0/4] xen: domain-tracked allocations, and fault injection Andrew Cooper
                   ` (4 preceding siblings ...)
  2021-01-08  1:49 ` [PATCH 0/4] xen: domain-tracked allocations, and " Stefano Stabellini
@ 2022-12-11 17:21 ` Julien Grall
  5 siblings, 0 replies; 17+ messages in thread
From: Julien Grall @ 2022-12-11 17:21 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Volodymyr Babchuk, Tamas K Lengyel

Hi Andrew,

On 23/12/2020 16:34, Andrew Cooper wrote:
> This was not the christmas hacking project that I was planning to do, but it
> has had some exciting results.
> 
> After some discussion on an earlier thread, Tamas has successfully got fuzzing
> of Xen working via kfx, and this series is a prototype for providing better
> testing infrastructure.
> 
> And to prove a point, this series has already found a memory leak in ARM's
> dom0less smoke test.

You mentioned this series recently on the ML, so I decided to give it a
try and managed to reproduce your "memory leak".

I put it in quotes because the problem is not in the Arm code but in
your code. If you look at the implementation of _dzalloc(), you are
using _xmalloc(), so the memory is not guaranteed to be zeroed after
allocation.

This breaks the callers' expectations. What you want is "_xzalloc()".

Cheers,

-- 
Julien Grall



* Re: [PATCH 1/4] xen/dmalloc: Introduce dmalloc() APIs
  2020-12-23 16:34 ` [PATCH 1/4] xen/dmalloc: Introduce dmalloc() APIs Andrew Cooper
  2021-01-05 15:56   ` Jan Beulich
  2021-01-05 16:01   ` Jan Beulich
@ 2022-12-11 17:24   ` Julien Grall
  2 siblings, 0 replies; 17+ messages in thread
From: Julien Grall @ 2022-12-11 17:24 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Volodymyr Babchuk, Tamas K Lengyel

Hi Andrew,

On 23/12/2020 16:34, Andrew Cooper wrote:
> Wrappers for xmalloc() and friends, which track allocations tied to a specific
> domain.
> 
> Check for any leaked memory at domain destruction time.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wl@xen.org>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien@xen.org>
> CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> CC: Tamas K Lengyel <tamas@tklengyel.com>
> 
> RFC:
>   * This probably wants to be less fatal in release builds
>   * In an ideal world, we'd also want to count the total number of bytes
>     allocated from the xmalloc heap, which would be interesting to print in the
>     'q' debugkey.  However, that data is fairly invasive to obtain.
>   * More complicated logic could track the origins of each allocation, and be
>     able to identify which one(s) leaked.
> ---
>   xen/common/Makefile       |  1 +
>   xen/common/dmalloc.c      | 19 +++++++++++++++++++
>   xen/common/domain.c       |  6 ++++++
>   xen/include/xen/dmalloc.h | 29 +++++++++++++++++++++++++++++
>   xen/include/xen/sched.h   |  2 ++
>   5 files changed, 57 insertions(+)
>   create mode 100644 xen/common/dmalloc.c
>   create mode 100644 xen/include/xen/dmalloc.h
> 
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index 7a4e652b57..c5d9c23fd1 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -5,6 +5,7 @@ obj-$(CONFIG_CORE_PARKING) += core_parking.o
>   obj-y += cpu.o
>   obj-$(CONFIG_DEBUG_TRACE) += debugtrace.o
>   obj-$(CONFIG_HAS_DEVICE_TREE) += device_tree.o
> +obj-y += dmalloc.o
>   obj-y += domain.o
>   obj-y += event_2l.o
>   obj-y += event_channel.o
> diff --git a/xen/common/dmalloc.c b/xen/common/dmalloc.c
> new file mode 100644
> index 0000000000..e3a0e546c2
> --- /dev/null
> +++ b/xen/common/dmalloc.c
> @@ -0,0 +1,19 @@
> +#include <xen/dmalloc.h>
> +#include <xen/sched.h>
> +#include <xen/xmalloc.h>
> +
> +void dfree(struct domain *d, void *ptr)
> +{
> +    atomic_dec(&d->dalloc_heap);
> +    xfree(ptr);
> +}
> +
> +void *_dzalloc(struct domain *d, size_t size, size_t align)
> +{
> +    void *ptr = _xmalloc(size, align);

The 'z' in _dzalloc() implies the memory will be zeroed, but here you
are using _xmalloc().
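
For reference, a minimal sketch of the corrected allocator, assuming
_xzalloc() keeps the same signature as _xmalloc():

    void *_dzalloc(struct domain *d, size_t size, size_t align)
    {
        /* Zero the allocation, as the 'z' in the name promises. */
        void *ptr = _xzalloc(size, align);

        if ( ptr )
            atomic_inc(&d->dalloc_heap);

        return ptr;
    }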

This also explains the "memory leak" you reported. Switching to
"_xzalloc()" makes the problem disappear (at least on my setup).

Cheers,

-- 
Julien Grall



Thread overview: 17+ messages
2020-12-23 16:34 [PATCH 0/4] xen: domain-tracked allocations, and fault injection Andrew Cooper
2020-12-23 16:34 ` [PATCH 1/4] xen/dmalloc: Introduce dmalloc() APIs Andrew Cooper
2021-01-05 15:56   ` Jan Beulich
2021-01-13 23:16     ` Andrew Cooper
2021-01-14 10:14       ` Jan Beulich
2021-01-14 15:30         ` Andrew Cooper
2021-01-05 16:01   ` Jan Beulich
2022-12-11 17:24   ` Julien Grall
2020-12-23 16:34 ` [PATCH 2/4] xen/evtchn: Switch to dmalloc Andrew Cooper
2021-01-05 16:09   ` Jan Beulich
2020-12-23 16:34 ` [PATCH 3/4] xen/domctl: Introduce fault_ttl Andrew Cooper
2021-01-05 16:39   ` Jan Beulich
2021-01-13 23:58     ` Andrew Cooper
2020-12-23 16:34 ` [PATCH 4/4] tools/misc: Test for fault injection Andrew Cooper
2020-12-23 16:41   ` Jan Beulich
2021-01-08  1:49 ` [PATCH 0/4] xen: domain-tracked allocations, and " Stefano Stabellini
2022-12-11 17:21 ` Julien Grall
