All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/7] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()
@ 2021-10-04 12:02 David Hildenbrand
  2021-10-04 12:02 ` [PATCH v4 1/7] util/oslib-posix: Let touch_all_pages() return an error David Hildenbrand
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: David Hildenbrand @ 2021-10-04 12:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Daniel P . Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Igor Mammedov, Paolo Bonzini

#1 is a preparation for improved error reporting, #2 adds support for
MADV_POPULATE_WRITE, #3 cleans up the code to avoid global variables and
prepare for concurrency, #4 and #5 optimize thread handling, #6 makes
os_mem_prealloc() safe to be called from multiple threads concurrently and
#7 makes the SIGBUS handler coexist cleanly with the MCE SIGBUS handler
under Linux.

Details regarding MADV_POPULATE_WRITE can be found in introducing upstream
Linux commits 4ca9b3859dac ("mm/madvise: introduce
MADV_POPULATE_(READ|WRITE) to prefault page tables") and eb2faa513c24
("mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ|WRITE)"),
and in the man page update [1].

v3 -> v4:
- Added ACKs
- "util/oslib-posix: Support concurrent os_mem_prealloc() invocation"
-- Remove stale comment from patch description

v2 -> v3:
- "util/oslib-posix: Let touch_all_pages() return an error"
-- Added
- Added ACKs/RBs
- "util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()"
-- Set error code accordingly

v1 -> v2:
- "util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()"
-- Handle thread with no data to initialize
-- Always set use_madv_populate_write properly
-- Add comment regarding future fallocate() optimization
- "util/oslib-posix: Don't create too many threads with small memory or
   little pages"
-- Added
- "util/oslib-posix: Avoid creating a single thread with
   MADV_POPULATE_WRITE"
-- Added
- "util/oslib-posix: Support concurrent os_mem_prealloc() invocation"
-- Add missing g_once_init_leave()
-- Move g_once_init_enter() to the place where it is actually needed
- "util/oslib-posix: Forward SIGBUS to MCE handler under Linux"
-- Added

[1] https://lkml.kernel.org/r/20210823120645.8223-1-david@redhat.com

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Daniel P. Berrangé <berrange@redhat.com>

David Hildenbrand (7):
  util/oslib-posix: Let touch_all_pages() return an error
  util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()
  util/oslib-posix: Introduce and use MemsetContext for
    touch_all_pages()
  util/oslib-posix: Don't create too many threads with small memory or
    little pages
  util/oslib-posix: Avoid creating a single thread with
    MADV_POPULATE_WRITE
  util/oslib-posix: Support concurrent os_mem_prealloc() invocation
  util/oslib-posix: Forward SIGBUS to MCE handler under Linux

 include/qemu/osdep.h |   7 ++
 softmmu/cpus.c       |   4 +
 util/oslib-posix.c   | 231 ++++++++++++++++++++++++++++++++-----------
 3 files changed, 187 insertions(+), 55 deletions(-)

-- 
2.31.1



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v4 1/7] util/oslib-posix: Let touch_all_pages() return an error
  2021-10-04 12:02 [PATCH v4 0/7] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc() David Hildenbrand
@ 2021-10-04 12:02 ` David Hildenbrand
  2021-10-04 12:02 ` [PATCH v4 2/7] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc() David Hildenbrand
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2021-10-04 12:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Daniel P . Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Igor Mammedov, Paolo Bonzini

Let's prepare touch_all_pages() for returning differing errors. Return
an error from the thread and report the last processed error.

Translate SIGBUS to -EFAULT, as a SIGBUS can mean all different kind of
things (memory error, read error, out of memory). When allocating memory
fails via the current SIGBUS-based mechanism, we'll get:
    os_mem_prealloc: preallocating memory failed: Bad address

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/oslib-posix.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index e8bdb02e1d..b146beef78 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -84,7 +84,6 @@ typedef struct MemsetThread MemsetThread;
 
 static MemsetThread *memset_thread;
 static int memset_num_threads;
-static bool memset_thread_failed;
 
 static QemuMutex page_mutex;
 static QemuCond page_cond;
@@ -452,6 +451,7 @@ static void *do_touch_pages(void *arg)
 {
     MemsetThread *memset_args = (MemsetThread *)arg;
     sigset_t set, oldset;
+    int ret = 0;
 
     /*
      * On Linux, the page faults from the loop below can cause mmap_sem
@@ -470,7 +470,7 @@ static void *do_touch_pages(void *arg)
     pthread_sigmask(SIG_UNBLOCK, &set, &oldset);
 
     if (sigsetjmp(memset_args->env, 1)) {
-        memset_thread_failed = true;
+        ret = -EFAULT;
     } else {
         char *addr = memset_args->addr;
         size_t numpages = memset_args->numpages;
@@ -494,7 +494,7 @@ static void *do_touch_pages(void *arg)
         }
     }
     pthread_sigmask(SIG_SETMASK, &oldset, NULL);
-    return NULL;
+    return (void *)(uintptr_t)ret;
 }
 
 static inline int get_memset_num_threads(int smp_cpus)
@@ -509,13 +509,13 @@ static inline int get_memset_num_threads(int smp_cpus)
     return ret;
 }
 
-static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
-                            int smp_cpus)
+static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
+                           int smp_cpus)
 {
     static gsize initialized = 0;
     size_t numpages_per_thread, leftover;
+    int ret = 0, i = 0;
     char *addr = area;
-    int i = 0;
 
     if (g_once_init_enter(&initialized)) {
         qemu_mutex_init(&page_mutex);
@@ -523,7 +523,6 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
         g_once_init_leave(&initialized, 1);
     }
 
-    memset_thread_failed = false;
     threads_created_flag = false;
     memset_num_threads = get_memset_num_threads(smp_cpus);
     memset_thread = g_new0(MemsetThread, memset_num_threads);
@@ -545,12 +544,16 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
     qemu_mutex_unlock(&page_mutex);
 
     for (i = 0; i < memset_num_threads; i++) {
-        qemu_thread_join(&memset_thread[i].pgthread);
+        int tmp = (uintptr_t)qemu_thread_join(&memset_thread[i].pgthread);
+
+        if (tmp) {
+            ret = tmp;
+        }
     }
     g_free(memset_thread);
     memset_thread = NULL;
 
-    return memset_thread_failed;
+    return ret;
 }
 
 void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
@@ -573,9 +576,10 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
     }
 
     /* touch pages simultaneously */
-    if (touch_all_pages(area, hpagesize, numpages, smp_cpus)) {
-        error_setg(errp, "os_mem_prealloc: Insufficient free host memory "
-            "pages available to allocate guest RAM");
+    ret = touch_all_pages(area, hpagesize, numpages, smp_cpus);
+    if (ret) {
+        error_setg_errno(errp, -ret,
+                         "os_mem_prealloc: preallocating memory failed");
     }
 
     ret = sigaction(SIGBUS, &oldact, NULL);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 2/7] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()
  2021-10-04 12:02 [PATCH v4 0/7] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc() David Hildenbrand
  2021-10-04 12:02 ` [PATCH v4 1/7] util/oslib-posix: Let touch_all_pages() return an error David Hildenbrand
@ 2021-10-04 12:02 ` David Hildenbrand
  2021-10-04 12:02 ` [PATCH v4 3/7] util/oslib-posix: Introduce and use MemsetContext for touch_all_pages() David Hildenbrand
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2021-10-04 12:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Daniel P . Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Pankaj Gupta, Igor Mammedov,
	Paolo Bonzini

Let's sense support and use it for preallocation. MADV_POPULATE_WRITE
does not require a SIGBUS handler, doesn't actually touch page content,
and avoids context switches; it is, therefore, faster and easier to handle
than our current approach.

While MADV_POPULATE_WRITE is, in general, faster than manual
prefaulting, and especially faster with 4k pages, there is still value in
prefaulting using multiple threads to speed up preallocation.

More details on MADV_POPULATE_WRITE can be found in the Linux commits
4ca9b3859dac ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault
page tables") and eb2faa513c24 ("mm/madvise: report SIGBUS as -EFAULT for
MADV_POPULATE_(READ|WRITE)"), and in the man page proposal [1].

This resolves the TODO in do_touch_pages().

In the future, we might want to look into using fallocate(), eventually
combined with MADV_POPULATE_READ, when dealing with shared file/fd
mappings and not caring about memory bindings.

[1] https://lkml.kernel.org/r/20210816081922.5155-1-david@redhat.com

Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/qemu/osdep.h |  7 ++++
 util/oslib-posix.c   | 83 +++++++++++++++++++++++++++++++++-----------
 2 files changed, 69 insertions(+), 21 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 60718fc342..d1660d67fa 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -471,6 +471,11 @@ static inline void qemu_cleanup_generic_vfree(void *p)
 #else
 #define QEMU_MADV_REMOVE QEMU_MADV_DONTNEED
 #endif
+#ifdef MADV_POPULATE_WRITE
+#define QEMU_MADV_POPULATE_WRITE MADV_POPULATE_WRITE
+#else
+#define QEMU_MADV_POPULATE_WRITE QEMU_MADV_INVALID
+#endif
 
 #elif defined(CONFIG_POSIX_MADVISE)
 
@@ -484,6 +489,7 @@ static inline void qemu_cleanup_generic_vfree(void *p)
 #define QEMU_MADV_HUGEPAGE  QEMU_MADV_INVALID
 #define QEMU_MADV_NOHUGEPAGE  QEMU_MADV_INVALID
 #define QEMU_MADV_REMOVE QEMU_MADV_DONTNEED
+#define QEMU_MADV_POPULATE_WRITE QEMU_MADV_INVALID
 
 #else /* no-op */
 
@@ -497,6 +503,7 @@ static inline void qemu_cleanup_generic_vfree(void *p)
 #define QEMU_MADV_HUGEPAGE  QEMU_MADV_INVALID
 #define QEMU_MADV_NOHUGEPAGE  QEMU_MADV_INVALID
 #define QEMU_MADV_REMOVE QEMU_MADV_INVALID
+#define QEMU_MADV_POPULATE_WRITE QEMU_MADV_INVALID
 
 #endif
 
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index b146beef78..cb89e07770 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -484,10 +484,6 @@ static void *do_touch_pages(void *arg)
              *
              * 'volatile' to stop compiler optimizing this away
              * to a no-op
-             *
-             * TODO: get a better solution from kernel so we
-             * don't need to write at all so we don't cause
-             * wear on the storage backing the region...
              */
             *(volatile char *)addr = *addr;
             addr += hpagesize;
@@ -497,6 +493,26 @@ static void *do_touch_pages(void *arg)
     return (void *)(uintptr_t)ret;
 }
 
+static void *do_madv_populate_write_pages(void *arg)
+{
+    MemsetThread *memset_args = (MemsetThread *)arg;
+    const size_t size = memset_args->numpages * memset_args->hpagesize;
+    char * const addr = memset_args->addr;
+    int ret = 0;
+
+    /* See do_touch_pages(). */
+    qemu_mutex_lock(&page_mutex);
+    while (!threads_created_flag) {
+        qemu_cond_wait(&page_cond, &page_mutex);
+    }
+    qemu_mutex_unlock(&page_mutex);
+
+    if (size && qemu_madvise(addr, size, QEMU_MADV_POPULATE_WRITE)) {
+        ret = -errno;
+    }
+    return (void *)(uintptr_t)ret;
+}
+
 static inline int get_memset_num_threads(int smp_cpus)
 {
     long host_procs = sysconf(_SC_NPROCESSORS_ONLN);
@@ -510,10 +526,11 @@ static inline int get_memset_num_threads(int smp_cpus)
 }
 
 static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
-                           int smp_cpus)
+                           int smp_cpus, bool use_madv_populate_write)
 {
     static gsize initialized = 0;
     size_t numpages_per_thread, leftover;
+    void *(*touch_fn)(void *);
     int ret = 0, i = 0;
     char *addr = area;
 
@@ -523,6 +540,12 @@ static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
         g_once_init_leave(&initialized, 1);
     }
 
+    if (use_madv_populate_write) {
+        touch_fn = do_madv_populate_write_pages;
+    } else {
+        touch_fn = do_touch_pages;
+    }
+
     threads_created_flag = false;
     memset_num_threads = get_memset_num_threads(smp_cpus);
     memset_thread = g_new0(MemsetThread, memset_num_threads);
@@ -533,7 +556,7 @@ static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
         memset_thread[i].numpages = numpages_per_thread + (i < leftover);
         memset_thread[i].hpagesize = hpagesize;
         qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
-                           do_touch_pages, &memset_thread[i],
+                           touch_fn, &memset_thread[i],
                            QEMU_THREAD_JOINABLE);
         addr += memset_thread[i].numpages * hpagesize;
     }
@@ -556,6 +579,12 @@ static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
     return ret;
 }
 
+static bool madv_populate_write_possible(char *area, size_t pagesize)
+{
+    return !qemu_madvise(area, pagesize, QEMU_MADV_POPULATE_WRITE) ||
+           errno != EINVAL;
+}
+
 void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
                      Error **errp)
 {
@@ -563,30 +592,42 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
     struct sigaction act, oldact;
     size_t hpagesize = qemu_fd_getpagesize(fd);
     size_t numpages = DIV_ROUND_UP(memory, hpagesize);
+    bool use_madv_populate_write;
 
-    memset(&act, 0, sizeof(act));
-    act.sa_handler = &sigbus_handler;
-    act.sa_flags = 0;
-
-    ret = sigaction(SIGBUS, &act, &oldact);
-    if (ret) {
-        error_setg_errno(errp, errno,
-            "os_mem_prealloc: failed to install signal handler");
-        return;
+    /*
+     * Sense on every invocation, as MADV_POPULATE_WRITE cannot be used for
+     * some special mappings, such as mapping /dev/mem.
+     */
+    use_madv_populate_write = madv_populate_write_possible(area, hpagesize);
+
+    if (!use_madv_populate_write) {
+        memset(&act, 0, sizeof(act));
+        act.sa_handler = &sigbus_handler;
+        act.sa_flags = 0;
+
+        ret = sigaction(SIGBUS, &act, &oldact);
+        if (ret) {
+            error_setg_errno(errp, errno,
+                "os_mem_prealloc: failed to install signal handler");
+            return;
+        }
     }
 
     /* touch pages simultaneously */
-    ret = touch_all_pages(area, hpagesize, numpages, smp_cpus);
+    ret = touch_all_pages(area, hpagesize, numpages, smp_cpus,
+                          use_madv_populate_write);
     if (ret) {
         error_setg_errno(errp, -ret,
                          "os_mem_prealloc: preallocating memory failed");
     }
 
-    ret = sigaction(SIGBUS, &oldact, NULL);
-    if (ret) {
-        /* Terminate QEMU since it can't recover from error */
-        perror("os_mem_prealloc: failed to reinstall signal handler");
-        exit(1);
+    if (!use_madv_populate_write) {
+        ret = sigaction(SIGBUS, &oldact, NULL);
+        if (ret) {
+            /* Terminate QEMU since it can't recover from error */
+            perror("os_mem_prealloc: failed to reinstall signal handler");
+            exit(1);
+        }
     }
 }
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 3/7] util/oslib-posix: Introduce and use MemsetContext for touch_all_pages()
  2021-10-04 12:02 [PATCH v4 0/7] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc() David Hildenbrand
  2021-10-04 12:02 ` [PATCH v4 1/7] util/oslib-posix: Let touch_all_pages() return an error David Hildenbrand
  2021-10-04 12:02 ` [PATCH v4 2/7] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc() David Hildenbrand
@ 2021-10-04 12:02 ` David Hildenbrand
  2021-10-07 10:05   ` Dr. David Alan Gilbert
  2021-10-04 12:02 ` [PATCH v4 4/7] util/oslib-posix: Don't create too many threads with small memory or little pages David Hildenbrand
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2021-10-04 12:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Daniel P . Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Igor Mammedov, Paolo Bonzini

Let's minimize the number of global variables to prepare for
os_mem_prealloc() getting called concurrently and make the code a bit
easier to read.

The only consumer that really needs a global variable is the sigbus
handler, which will require protection via a mutex in the future either way
as we cannot concurrently mess with the SIGBUS handler.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/oslib-posix.c | 73 +++++++++++++++++++++++++++++-----------------
 1 file changed, 47 insertions(+), 26 deletions(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index cb89e07770..cf2ead54ad 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -73,21 +73,30 @@
 
 #define MAX_MEM_PREALLOC_THREAD_COUNT 16
 
+struct MemsetThread;
+
+typedef struct MemsetContext {
+    bool all_threads_created;
+    bool any_thread_failed;
+    struct MemsetThread *threads;
+    int num_threads;
+} MemsetContext;
+
 struct MemsetThread {
     char *addr;
     size_t numpages;
     size_t hpagesize;
     QemuThread pgthread;
     sigjmp_buf env;
+    MemsetContext *context;
 };
 typedef struct MemsetThread MemsetThread;
 
-static MemsetThread *memset_thread;
-static int memset_num_threads;
+/* used by sigbus_handler() */
+static MemsetContext *sigbus_memset_context;
 
 static QemuMutex page_mutex;
 static QemuCond page_cond;
-static bool threads_created_flag;
 
 int qemu_get_thread_id(void)
 {
@@ -438,10 +447,13 @@ const char *qemu_get_exec_dir(void)
 static void sigbus_handler(int signal)
 {
     int i;
-    if (memset_thread) {
-        for (i = 0; i < memset_num_threads; i++) {
-            if (qemu_thread_is_self(&memset_thread[i].pgthread)) {
-                siglongjmp(memset_thread[i].env, 1);
+
+    if (sigbus_memset_context) {
+        for (i = 0; i < sigbus_memset_context->num_threads; i++) {
+            MemsetThread *thread = &sigbus_memset_context->threads[i];
+
+            if (qemu_thread_is_self(&thread->pgthread)) {
+                siglongjmp(thread->env, 1);
             }
         }
     }
@@ -459,7 +471,7 @@ static void *do_touch_pages(void *arg)
      * clearing until all threads have been created.
      */
     qemu_mutex_lock(&page_mutex);
-    while(!threads_created_flag){
+    while (!memset_args->context->all_threads_created) {
         qemu_cond_wait(&page_cond, &page_mutex);
     }
     qemu_mutex_unlock(&page_mutex);
@@ -502,7 +514,7 @@ static void *do_madv_populate_write_pages(void *arg)
 
     /* See do_touch_pages(). */
     qemu_mutex_lock(&page_mutex);
-    while (!threads_created_flag) {
+    while (!memset_args->context->all_threads_created) {
         qemu_cond_wait(&page_cond, &page_mutex);
     }
     qemu_mutex_unlock(&page_mutex);
@@ -529,6 +541,9 @@ static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
                            int smp_cpus, bool use_madv_populate_write)
 {
     static gsize initialized = 0;
+    MemsetContext context = {
+        .num_threads = get_memset_num_threads(smp_cpus),
+    };
     size_t numpages_per_thread, leftover;
     void *(*touch_fn)(void *);
     int ret = 0, i = 0;
@@ -546,35 +561,41 @@ static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
         touch_fn = do_touch_pages;
     }
 
-    threads_created_flag = false;
-    memset_num_threads = get_memset_num_threads(smp_cpus);
-    memset_thread = g_new0(MemsetThread, memset_num_threads);
-    numpages_per_thread = numpages / memset_num_threads;
-    leftover = numpages % memset_num_threads;
-    for (i = 0; i < memset_num_threads; i++) {
-        memset_thread[i].addr = addr;
-        memset_thread[i].numpages = numpages_per_thread + (i < leftover);
-        memset_thread[i].hpagesize = hpagesize;
-        qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
-                           touch_fn, &memset_thread[i],
+    context.threads = g_new0(MemsetThread, context.num_threads);
+    numpages_per_thread = numpages / context.num_threads;
+    leftover = numpages % context.num_threads;
+    for (i = 0; i < context.num_threads; i++) {
+        context.threads[i].addr = addr;
+        context.threads[i].numpages = numpages_per_thread + (i < leftover);
+        context.threads[i].hpagesize = hpagesize;
+        context.threads[i].context = &context;
+        qemu_thread_create(&context.threads[i].pgthread, "touch_pages",
+                           touch_fn, &context.threads[i],
                            QEMU_THREAD_JOINABLE);
-        addr += memset_thread[i].numpages * hpagesize;
+        addr += context.threads[i].numpages * hpagesize;
+    }
+
+    if (!use_madv_populate_write) {
+        sigbus_memset_context = &context;
     }
 
     qemu_mutex_lock(&page_mutex);
-    threads_created_flag = true;
+    context.all_threads_created = true;
     qemu_cond_broadcast(&page_cond);
     qemu_mutex_unlock(&page_mutex);
 
-    for (i = 0; i < memset_num_threads; i++) {
-        int tmp = (uintptr_t)qemu_thread_join(&memset_thread[i].pgthread);
+    for (i = 0; i < context.num_threads; i++) {
+        int tmp = (uintptr_t)qemu_thread_join(&context.threads[i].pgthread);
 
         if (tmp) {
             ret = tmp;
         }
     }
-    g_free(memset_thread);
-    memset_thread = NULL;
+
+    if (!use_madv_populate_write) {
+        sigbus_memset_context = NULL;
+    }
+    g_free(context.threads);
 
     return ret;
 }
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 4/7] util/oslib-posix: Don't create too many threads with small memory or little pages
  2021-10-04 12:02 [PATCH v4 0/7] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc() David Hildenbrand
                   ` (2 preceding siblings ...)
  2021-10-04 12:02 ` [PATCH v4 3/7] util/oslib-posix: Introduce and use MemsetContext for touch_all_pages() David Hildenbrand
@ 2021-10-04 12:02 ` David Hildenbrand
  2021-10-04 12:02 ` [PATCH v4 5/7] util/oslib-posix: Avoid creating a single thread with MADV_POPULATE_WRITE David Hildenbrand
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2021-10-04 12:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Daniel P . Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Pankaj Gupta, Igor Mammedov,
	Paolo Bonzini

Let's limit the number of threads to something sane, especially that
- We don't have more threads than the number of pages we have
- We don't have threads that initialize small (< 64 MiB) memory

Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/oslib-posix.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index cf2ead54ad..67c08a425e 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -40,6 +40,7 @@
 #include <libgen.h>
 #include "qemu/cutils.h"
 #include "qemu/compiler.h"
+#include "qemu/units.h"
 
 #ifdef CONFIG_LINUX
 #include <sys/syscall.h>
@@ -525,7 +526,8 @@ static void *do_madv_populate_write_pages(void *arg)
     return (void *)(uintptr_t)ret;
 }
 
-static inline int get_memset_num_threads(int smp_cpus)
+static inline int get_memset_num_threads(size_t hpagesize, size_t numpages,
+                                         int smp_cpus)
 {
     long host_procs = sysconf(_SC_NPROCESSORS_ONLN);
     int ret = 1;
@@ -533,6 +535,12 @@ static inline int get_memset_num_threads(int smp_cpus)
     if (host_procs > 0) {
         ret = MIN(MIN(host_procs, MAX_MEM_PREALLOC_THREAD_COUNT), smp_cpus);
     }
+
+    /* Especially with gigantic pages, don't create more threads than pages. */
+    ret = MIN(ret, numpages);
+    /* Don't start threads to prealloc comparatively little memory. */
+    ret = MIN(ret, MAX(1, hpagesize * numpages / (64 * MiB)));
+
     /* In case sysconf() fails, we fall back to single threaded */
     return ret;
 }
@@ -542,7 +550,7 @@ static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
 {
     static gsize initialized = 0;
     MemsetContext context = {
-        .num_threads = get_memset_num_threads(smp_cpus),
+        .num_threads = get_memset_num_threads(hpagesize, numpages, smp_cpus),
     };
     size_t numpages_per_thread, leftover;
     void *(*touch_fn)(void *);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 5/7] util/oslib-posix: Avoid creating a single thread with MADV_POPULATE_WRITE
  2021-10-04 12:02 [PATCH v4 0/7] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc() David Hildenbrand
                   ` (3 preceding siblings ...)
  2021-10-04 12:02 ` [PATCH v4 4/7] util/oslib-posix: Don't create too many threads with small memory or little pages David Hildenbrand
@ 2021-10-04 12:02 ` David Hildenbrand
  2021-10-04 12:02 ` [PATCH v4 6/7] util/oslib-posix: Support concurrent os_mem_prealloc() invocation David Hildenbrand
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2021-10-04 12:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Daniel P . Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Pankaj Gupta, Igor Mammedov,
	Paolo Bonzini

Let's simplify the case when we only want a single thread and don't have
to mess with signal handlers.

Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/oslib-posix.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 67c08a425e..efa4f96d56 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -564,6 +564,14 @@ static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
     }
 
     if (use_madv_populate_write) {
+        /* Avoid creating a single thread for MADV_POPULATE_WRITE */
+        if (context.num_threads == 1) {
+            if (qemu_madvise(area, hpagesize * numpages,
+                             QEMU_MADV_POPULATE_WRITE)) {
+                return -errno;
+            }
+            return 0;
+        }
         touch_fn = do_madv_populate_write_pages;
     } else {
         touch_fn = do_touch_pages;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 6/7] util/oslib-posix: Support concurrent os_mem_prealloc() invocation
  2021-10-04 12:02 [PATCH v4 0/7] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc() David Hildenbrand
                   ` (4 preceding siblings ...)
  2021-10-04 12:02 ` [PATCH v4 5/7] util/oslib-posix: Avoid creating a single thread with MADV_POPULATE_WRITE David Hildenbrand
@ 2021-10-04 12:02 ` David Hildenbrand
  2021-10-04 12:02 ` [PATCH v4 7/7] util/oslib-posix: Forward SIGBUS to MCE handler under Linux David Hildenbrand
  2021-10-11 16:46 ` [PATCH v4 0/7] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc() David Hildenbrand
  7 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2021-10-04 12:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Daniel P . Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Pankaj Gupta, Igor Mammedov,
	Paolo Bonzini

Add a mutex to protect the SIGBUS case, as we cannot mess concurrently
with the sigbus handler and we have to manage the global variable
sigbus_memset_context. The MADV_POPULATE_WRITE path can run
concurrently.

Note that page_mutex and page_cond are shared between concurrent
invocations, which shouldn't be a problem.

This is a preparation for future virtio-mem prealloc code, which will call
os_mem_prealloc() asynchronously from an iothread when handling guest
requests.

Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/oslib-posix.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index efa4f96d56..9829149e4b 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -95,6 +95,7 @@ typedef struct MemsetThread MemsetThread;
 
 /* used by sigbus_handler() */
 static MemsetContext *sigbus_memset_context;
+static QemuMutex sigbus_mutex;
 
 static QemuMutex page_mutex;
 static QemuCond page_cond;
@@ -625,6 +626,7 @@ static bool madv_populate_write_possible(char *area, size_t pagesize)
 void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
                      Error **errp)
 {
+    static gsize initialized;
     int ret;
     struct sigaction act, oldact;
     size_t hpagesize = qemu_fd_getpagesize(fd);
@@ -638,6 +640,12 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
     use_madv_populate_write = madv_populate_write_possible(area, hpagesize);
 
     if (!use_madv_populate_write) {
+        if (g_once_init_enter(&initialized)) {
+            qemu_mutex_init(&sigbus_mutex);
+            g_once_init_leave(&initialized, 1);
+        }
+
+        qemu_mutex_lock(&sigbus_mutex);
         memset(&act, 0, sizeof(act));
         act.sa_handler = &sigbus_handler;
         act.sa_flags = 0;
@@ -665,6 +673,7 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
             perror("os_mem_prealloc: failed to reinstall signal handler");
             exit(1);
         }
+        qemu_mutex_unlock(&sigbus_mutex);
     }
 }
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 7/7] util/oslib-posix: Forward SIGBUS to MCE handler under Linux
  2021-10-04 12:02 [PATCH v4 0/7] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc() David Hildenbrand
                   ` (5 preceding siblings ...)
  2021-10-04 12:02 ` [PATCH v4 6/7] util/oslib-posix: Support concurrent os_mem_prealloc() invocation David Hildenbrand
@ 2021-10-04 12:02 ` David Hildenbrand
  2021-10-11 16:46 ` [PATCH v4 0/7] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc() David Hildenbrand
  7 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2021-10-04 12:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Daniel P . Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Igor Mammedov, Paolo Bonzini

Temporarily modifying the SIGBUS handler is really nasty, as we might be
unlucky and receive an MCE SIGBUS while having our handler registered.
Unfortunately, there is no way around messing with SIGBUS when
MADV_POPULATE_WRITE is not applicable or not around.

Let's forward SIGBUS that don't belong to us to the already registered
handler and document the situation.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 softmmu/cpus.c     |  4 ++++
 util/oslib-posix.c | 36 +++++++++++++++++++++++++++++++++---
 2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 071085f840..23bca46b07 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -352,6 +352,10 @@ static void qemu_init_sigbus(void)
 {
     struct sigaction action;
 
+    /*
+     * ALERT: when modifying this, take care that SIGBUS forwarding in
+     * os_mem_prealloc() will continue working as expected.
+     */
     memset(&action, 0, sizeof(action));
     action.sa_flags = SA_SIGINFO;
     action.sa_sigaction = sigbus_handler;
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 9829149e4b..5c47aa9cb7 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -95,6 +95,7 @@ typedef struct MemsetThread MemsetThread;
 
 /* used by sigbus_handler() */
 static MemsetContext *sigbus_memset_context;
+struct sigaction sigbus_oldact;
 static QemuMutex sigbus_mutex;
 
 static QemuMutex page_mutex;
@@ -446,7 +447,11 @@ const char *qemu_get_exec_dir(void)
     return exec_dir;
 }
 
+#ifdef CONFIG_LINUX
+static void sigbus_handler(int signal, siginfo_t *siginfo, void *ctx)
+#else /* CONFIG_LINUX */
 static void sigbus_handler(int signal)
+#endif /* CONFIG_LINUX */
 {
     int i;
 
@@ -459,6 +464,26 @@ static void sigbus_handler(int signal)
             }
         }
     }
+
+#ifdef CONFIG_LINUX
+    /*
+     * We assume that the MCE SIGBUS handler could have been registered. We
+     * should never receive BUS_MCEERR_AO on any of our threads, but only on
+     * the main thread registered for PR_MCE_KILL_EARLY. Further, we should not
+     * receive BUS_MCEERR_AR triggered by action of other threads on one of
+     * our threads. So, no need to check for unrelated SIGBUS when seeing one
+     * for our threads.
+     *
+     * We will forward to the MCE handler, which will either handle the SIGBUS
+     * or reinstall the default SIGBUS handler and reraise the SIGBUS. The
+     * default SIGBUS handler will crash the process, so we don't care.
+     */
+    if (sigbus_oldact.sa_flags & SA_SIGINFO) {
+        sigbus_oldact.sa_sigaction(signal, siginfo, ctx);
+        return;
+    }
+#endif /* CONFIG_LINUX */
+    warn_report("os_mem_prealloc: unrelated SIGBUS detected and ignored");
 }
 
 static void *do_touch_pages(void *arg)
@@ -628,10 +653,10 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
 {
     static gsize initialized;
     int ret;
-    struct sigaction act, oldact;
     size_t hpagesize = qemu_fd_getpagesize(fd);
     size_t numpages = DIV_ROUND_UP(memory, hpagesize);
     bool use_madv_populate_write;
+    struct sigaction act;
 
     /*
      * Sense on every invocation, as MADV_POPULATE_WRITE cannot be used for
@@ -647,10 +672,15 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
 
         qemu_mutex_lock(&sigbus_mutex);
         memset(&act, 0, sizeof(act));
+#ifdef CONFIG_LINUX
+        act.sa_sigaction = &sigbus_handler;
+        act.sa_flags = SA_SIGINFO;
+#else /* CONFIG_LINUX */
         act.sa_handler = &sigbus_handler;
         act.sa_flags = 0;
+#endif /* CONFIG_LINUX */
 
-        ret = sigaction(SIGBUS, &act, &oldact);
+        ret = sigaction(SIGBUS, &act, &sigbus_oldact);
         if (ret) {
             error_setg_errno(errp, errno,
                 "os_mem_prealloc: failed to install signal handler");
@@ -667,7 +697,7 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
     }
 
     if (!use_madv_populate_write) {
-        ret = sigaction(SIGBUS, &oldact, NULL);
+        ret = sigaction(SIGBUS, &sigbus_oldact, NULL);
         if (ret) {
             /* Terminate QEMU since it can't recover from error */
             perror("os_mem_prealloc: failed to reinstall signal handler");
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 3/7] util/oslib-posix: Introduce and use MemsetContext for touch_all_pages()
  2021-10-04 12:02 ` [PATCH v4 3/7] util/oslib-posix: Introduce and use MemsetContext for touch_all_pages() David Hildenbrand
@ 2021-10-07 10:05   ` Dr. David Alan Gilbert
  2021-10-07 10:12     ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Dr. David Alan Gilbert @ 2021-10-07 10:05 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Pankaj Gupta, Daniel P . Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Igor Mammedov,
	Paolo Bonzini

* David Hildenbrand (david@redhat.com) wrote:
> Let's minimize the number of global variables to prepare for
> os_mem_prealloc() getting called concurrently and make the code a bit
> easier to read.
> 
> The only consumer that really needs a global variable is the sigbus
> handler, which will require protection via a mutex in the future either way
> as we cannot concurrently mess with the SIGBUS handler.
> 
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  util/oslib-posix.c | 73 +++++++++++++++++++++++++++++-----------------
>  1 file changed, 47 insertions(+), 26 deletions(-)
> 
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index cb89e07770..cf2ead54ad 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -73,21 +73,30 @@
>  
>  #define MAX_MEM_PREALLOC_THREAD_COUNT 16
>  
> +struct MemsetThread;
> +
> +typedef struct MemsetContext {
> +    bool all_threads_created;
> +    bool any_thread_failed;
> +    struct MemsetThread *threads;
> +    int num_threads;
> +} MemsetContext;
> +
>  struct MemsetThread {
>      char *addr;
>      size_t numpages;
>      size_t hpagesize;
>      QemuThread pgthread;
>      sigjmp_buf env;
> +    MemsetContext *context;
>  };
>  typedef struct MemsetThread MemsetThread;
>  
> -static MemsetThread *memset_thread;
> -static int memset_num_threads;
> +/* used by sigbus_handler() */
> +static MemsetContext *sigbus_memset_context;
>  
>  static QemuMutex page_mutex;
>  static QemuCond page_cond;
> -static bool threads_created_flag;

Is there a reason you didn't lift page_mutex and page_cond into the
MemsetContext ?

(You don't need to change it for this series, just a thought;
another thought is that I think we hav ea few threadpools like this
with hooks to check they've all been created and to do something if one
fails).

Dave

>  int qemu_get_thread_id(void)
>  {
> @@ -438,10 +447,13 @@ const char *qemu_get_exec_dir(void)
>  static void sigbus_handler(int signal)
>  {
>      int i;
> -    if (memset_thread) {
> -        for (i = 0; i < memset_num_threads; i++) {
> -            if (qemu_thread_is_self(&memset_thread[i].pgthread)) {
> -                siglongjmp(memset_thread[i].env, 1);
> +
> +    if (sigbus_memset_context) {
> +        for (i = 0; i < sigbus_memset_context->num_threads; i++) {
> +            MemsetThread *thread = &sigbus_memset_context->threads[i];
> +
> +            if (qemu_thread_is_self(&thread->pgthread)) {
> +                siglongjmp(thread->env, 1);
>              }
>          }
>      }
> @@ -459,7 +471,7 @@ static void *do_touch_pages(void *arg)
>       * clearing until all threads have been created.
>       */
>      qemu_mutex_lock(&page_mutex);
> -    while(!threads_created_flag){
> +    while (!memset_args->context->all_threads_created) {
>          qemu_cond_wait(&page_cond, &page_mutex);
>      }
>      qemu_mutex_unlock(&page_mutex);
> @@ -502,7 +514,7 @@ static void *do_madv_populate_write_pages(void *arg)
>  
>      /* See do_touch_pages(). */
>      qemu_mutex_lock(&page_mutex);
> -    while (!threads_created_flag) {
> +    while (!memset_args->context->all_threads_created) {
>          qemu_cond_wait(&page_cond, &page_mutex);
>      }
>      qemu_mutex_unlock(&page_mutex);
> @@ -529,6 +541,9 @@ static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
>                             int smp_cpus, bool use_madv_populate_write)
>  {
>      static gsize initialized = 0;
> +    MemsetContext context = {
> +        .num_threads = get_memset_num_threads(smp_cpus),
> +    };
>      size_t numpages_per_thread, leftover;
>      void *(*touch_fn)(void *);
>      int ret = 0, i = 0;
> @@ -546,35 +561,41 @@ static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
>          touch_fn = do_touch_pages;
>      }
>  
> -    threads_created_flag = false;
> -    memset_num_threads = get_memset_num_threads(smp_cpus);
> -    memset_thread = g_new0(MemsetThread, memset_num_threads);
> -    numpages_per_thread = numpages / memset_num_threads;
> -    leftover = numpages % memset_num_threads;
> -    for (i = 0; i < memset_num_threads; i++) {
> -        memset_thread[i].addr = addr;
> -        memset_thread[i].numpages = numpages_per_thread + (i < leftover);
> -        memset_thread[i].hpagesize = hpagesize;
> -        qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
> -                           touch_fn, &memset_thread[i],
> +    context.threads = g_new0(MemsetThread, context.num_threads);
> +    numpages_per_thread = numpages / context.num_threads;
> +    leftover = numpages % context.num_threads;
> +    for (i = 0; i < context.num_threads; i++) {
> +        context.threads[i].addr = addr;
> +        context.threads[i].numpages = numpages_per_thread + (i < leftover);
> +        context.threads[i].hpagesize = hpagesize;
> +        context.threads[i].context = &context;
> +        qemu_thread_create(&context.threads[i].pgthread, "touch_pages",
> +                           touch_fn, &context.threads[i],
>                             QEMU_THREAD_JOINABLE);
> -        addr += memset_thread[i].numpages * hpagesize;
> +        addr += context.threads[i].numpages * hpagesize;
> +    }
> +
> +    if (!use_madv_populate_write) {
> +        sigbus_memset_context = &context;
>      }
>  
>      qemu_mutex_lock(&page_mutex);
> -    threads_created_flag = true;
> +    context.all_threads_created = true;
>      qemu_cond_broadcast(&page_cond);
>      qemu_mutex_unlock(&page_mutex);
>  
> -    for (i = 0; i < memset_num_threads; i++) {
> -        int tmp = (uintptr_t)qemu_thread_join(&memset_thread[i].pgthread);
> +    for (i = 0; i < context.num_threads; i++) {
> +        int tmp = (uintptr_t)qemu_thread_join(&context.threads[i].pgthread);
>  
>          if (tmp) {
>              ret = tmp;
>          }
>      }
> -    g_free(memset_thread);
> -    memset_thread = NULL;
> +
> +    if (!use_madv_populate_write) {
> +        sigbus_memset_context = NULL;
> +    }
> +    g_free(context.threads);
>  
>      return ret;
>  }
> -- 
> 2.31.1
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 3/7] util/oslib-posix: Introduce and use MemsetContext for touch_all_pages()
  2021-10-07 10:05   ` Dr. David Alan Gilbert
@ 2021-10-07 10:12     ` David Hildenbrand
  0 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2021-10-07 10:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Pankaj Gupta, Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Igor Mammedov,
	Paolo Bonzini

On 07.10.21 12:05, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> Let's minimize the number of global variables to prepare for
>> os_mem_prealloc() getting called concurrently and make the code a bit
>> easier to read.
>>
>> The only consumer that really needs a global variable is the sigbus
>> handler, which will require protection via a mutex in the future either way
>> as we cannot concurrently mess with the SIGBUS handler.
>>
>> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   util/oslib-posix.c | 73 +++++++++++++++++++++++++++++-----------------
>>   1 file changed, 47 insertions(+), 26 deletions(-)
>>
>> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
>> index cb89e07770..cf2ead54ad 100644
>> --- a/util/oslib-posix.c
>> +++ b/util/oslib-posix.c
>> @@ -73,21 +73,30 @@
>>   
>>   #define MAX_MEM_PREALLOC_THREAD_COUNT 16
>>   
>> +struct MemsetThread;
>> +
>> +typedef struct MemsetContext {
>> +    bool all_threads_created;
>> +    bool any_thread_failed;
>> +    struct MemsetThread *threads;
>> +    int num_threads;
>> +} MemsetContext;
>> +
>>   struct MemsetThread {
>>       char *addr;
>>       size_t numpages;
>>       size_t hpagesize;
>>       QemuThread pgthread;
>>       sigjmp_buf env;
>> +    MemsetContext *context;
>>   };
>>   typedef struct MemsetThread MemsetThread;
>>   
>> -static MemsetThread *memset_thread;
>> -static int memset_num_threads;
>> +/* used by sigbus_handler() */
>> +static MemsetContext *sigbus_memset_context;
>>   
>>   static QemuMutex page_mutex;
>>   static QemuCond page_cond;
>> -static bool threads_created_flag;
> 
> Is there a reason you didn't lift page_mutex and page_cond into the
> MemsetContext ?

Mostly for simplicity and I didn't think that it will really make a 
difference in practice.

In patch #6 I spelled out that this was done: "Note that page_mutex and 
page_cond are shared between concurrent invocations, which shouldn't be 
a problem."

> 
> (You don't need to change it for this series, just a thought;
> another thought is that I think we hav ea few threadpools like this
> with hooks to check they've all been created and to do something if one
> fails).

I can look into that as an add-on series once I have some spare cycles.

Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 0/7] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()
  2021-10-04 12:02 [PATCH v4 0/7] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc() David Hildenbrand
                   ` (6 preceding siblings ...)
  2021-10-04 12:02 ` [PATCH v4 7/7] util/oslib-posix: Forward SIGBUS to MCE handler under Linux David Hildenbrand
@ 2021-10-11 16:46 ` David Hildenbrand
  7 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2021-10-11 16:46 UTC (permalink / raw)
  To: qemu-devel, Paolo Bonzini
  Cc: Pankaj Gupta, Daniel P . Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Dr . David Alan Gilbert,
	Igor Mammedov

On 04.10.21 14:02, David Hildenbrand wrote:
> #1 is a preparation for improved error reporting, #2 adds support for
> MADV_POPULATE_WRITE, #3 cleans up the code to avoid global variables and
> prepare for concurrency, #4 and #5 optimize thread handling, #6 makes
> os_mem_prealloc() safe to be called from multiple threads concurrently and
> #7 makes the SIGBUS handler coexist cleanly with the MCE SIGBUS handler
> under Linux.
> 
> Details regarding MADV_POPULATE_WRITE can be found in introducing upstream
> Linux commits 4ca9b3859dac ("mm/madvise: introduce
> MADV_POPULATE_(READ|WRITE) to prefault page tables") and eb2faa513c24
> ("mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ|WRITE)"),
> and in the man page update [1].

Paolo, are you planning on taking this via your tree (POSIX)? Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-10-11 16:48 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-04 12:02 [PATCH v4 0/7] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc() David Hildenbrand
2021-10-04 12:02 ` [PATCH v4 1/7] util/oslib-posix: Let touch_all_pages() return an error David Hildenbrand
2021-10-04 12:02 ` [PATCH v4 2/7] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc() David Hildenbrand
2021-10-04 12:02 ` [PATCH v4 3/7] util/oslib-posix: Introduce and use MemsetContext for touch_all_pages() David Hildenbrand
2021-10-07 10:05   ` Dr. David Alan Gilbert
2021-10-07 10:12     ` David Hildenbrand
2021-10-04 12:02 ` [PATCH v4 4/7] util/oslib-posix: Don't create too many threads with small memory or little pages David Hildenbrand
2021-10-04 12:02 ` [PATCH v4 5/7] util/oslib-posix: Avoid creating a single thread with MADV_POPULATE_WRITE David Hildenbrand
2021-10-04 12:02 ` [PATCH v4 6/7] util/oslib-posix: Support concurrent os_mem_prealloc() invocation David Hildenbrand
2021-10-04 12:02 ` [PATCH v4 7/7] util/oslib-posix: Forward SIGBUS to MCE handler under Linux David Hildenbrand
2021-10-11 16:46 ` [PATCH v4 0/7] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc() David Hildenbrand

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.