qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] util/async: print leaked BH name when AioContext finalizes
@ 2021-04-14 20:02 Stefan Hajnoczi
  2021-04-14 20:02 ` [PATCH 1/2] util/async: add a human-readable name to BHs for debugging Stefan Hajnoczi
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2021-04-14 20:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, eric.g.ernst, Stefan Hajnoczi,
	Paolo Bonzini, Max Reitz

Eric Ernst and I debugged a BH leak and it was more involved than it should be.
The problem is that BHs don't have a human-readable identifier, so low-level
debugging techniques and inferences about the code are required to figure out
which BH was leaked in production environments without easy debug access.

The leak ended up already being fixed upstream but let's improve diagnostics
for leaked BHs so that this becomes quick and easy in the future.

Stefan Hajnoczi (2):
  util/async: add a human-readable name to BHs for debugging
  util/async: print leaked BH name when AioContext finalizes

 include/block/aio.h            | 31 ++++++++++++++++++++++++++++---
 include/qemu/main-loop.h       |  4 +++-
 tests/unit/ptimer-test-stubs.c |  2 +-
 util/async.c                   | 25 +++++++++++++++++++++----
 util/main-loop.c               |  4 ++--
 5 files changed, 55 insertions(+), 11 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] util/async: add a human-readable name to BHs for debugging
  2021-04-14 20:02 [PATCH 0/2] util/async: print leaked BH name when AioContext finalizes Stefan Hajnoczi
@ 2021-04-14 20:02 ` Stefan Hajnoczi
  2021-04-15  9:08   ` Philippe Mathieu-Daudé
  2021-04-14 20:02 ` [PATCH 2/2] util/async: print leaked BH name when AioContext finalizes Stefan Hajnoczi
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2021-04-14 20:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, eric.g.ernst, Stefan Hajnoczi,
	Paolo Bonzini, Max Reitz

It can be difficult to debug issues with BHs in production environments.
Although BHs can usually be identified by looking up their ->cb()
function pointer, this requires debug information for the program. It is
also not possible to print human-readable diagnostics about BHs because
they have no identifier.

This patch adds a name to each BH. The name is not unique per instance
but differentiates between cb() functions, which is usually enough. It's
done by changing aio_bh_new() and friends to macros that stringify cb.

The next patch will use the name field when reporting leaked BHs.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/aio.h            | 31 ++++++++++++++++++++++++++++---
 include/qemu/main-loop.h       |  4 +++-
 tests/unit/ptimer-test-stubs.c |  2 +-
 util/async.c                   |  9 +++++++--
 util/main-loop.c               |  4 ++--
 5 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 5f342267d5..499668fef5 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -291,20 +291,45 @@ void aio_context_acquire(AioContext *ctx);
 /* Relinquish ownership of the AioContext. */
 void aio_context_release(AioContext *ctx);
 
+/**
+ * aio_bh_schedule_oneshot_full: Allocate a new bottom half structure that will
+ * run only once and as soon as possible.
+ *
+ * @name: A human-readable identifier for debugging purposes.
+ */
+void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
+                                  const char *name);
+
 /**
  * aio_bh_schedule_oneshot: Allocate a new bottom half structure that will run
  * only once and as soon as possible.
+ *
+ * A convenience wrapper for aio_bh_schedule_oneshot_full() that uses cb as the
+ * name string.
  */
-void aio_bh_schedule_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque);
+#define aio_bh_schedule_oneshot(ctx, cb, opaque) \
+    aio_bh_schedule_oneshot_full((ctx), (cb), (opaque), (stringify(cb)))
 
 /**
- * aio_bh_new: Allocate a new bottom half structure.
+ * aio_bh_new_full: Allocate a new bottom half structure.
  *
  * Bottom halves are lightweight callbacks whose invocation is guaranteed
  * to be wait-free, thread-safe and signal-safe.  The #QEMUBH structure
  * is opaque and must be allocated prior to its use.
+ *
+ * @name: A human-readable identifier for debugging purposes.
  */
-QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque);
+QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
+                        const char *name);
+
+/**
+ * aio_bh_new: Allocate a new bottom half structure
+ *
+ * A convenience wrapper for aio_bh_new_full() that uses the cb as the name
+ * string.
+ */
+#define aio_bh_new(ctx, cb, opaque) \
+    aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)))
 
 /**
  * aio_notify: Force processing of pending events.
diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index d6892fd208..c7e8a21b5d 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -312,7 +312,9 @@ void qemu_cond_timedwait_iothread(QemuCond *cond, int ms);
 
 void qemu_fd_register(int fd);
 
-QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque);
+#define qemu_bh_new(cb, opaque) \
+    qemu_bh_new_full((cb), (opaque), (stringify(cb)))
+QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name);
 void qemu_bh_schedule_idle(QEMUBH *bh);
 
 enum {
diff --git a/tests/unit/ptimer-test-stubs.c b/tests/unit/ptimer-test-stubs.c
index 7f801a4d09..2a3ef58799 100644
--- a/tests/unit/ptimer-test-stubs.c
+++ b/tests/unit/ptimer-test-stubs.c
@@ -108,7 +108,7 @@ int64_t qemu_clock_deadline_ns_all(QEMUClockType type, int attr_mask)
     return deadline;
 }
 
-QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque)
+QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name)
 {
     QEMUBH *bh = g_new(QEMUBH, 1);
 
diff --git a/util/async.c b/util/async.c
index 674dbefb7c..b6acb86520 100644
--- a/util/async.c
+++ b/util/async.c
@@ -57,6 +57,7 @@ enum {
 
 struct QEMUBH {
     AioContext *ctx;
+    const char *name;
     QEMUBHFunc *cb;
     void *opaque;
     QSLIST_ENTRY(QEMUBH) next;
@@ -107,7 +108,8 @@ static QEMUBH *aio_bh_dequeue(BHList *head, unsigned *flags)
     return bh;
 }
 
-void aio_bh_schedule_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
+void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb,
+                                  void *opaque, const char *name)
 {
     QEMUBH *bh;
     bh = g_new(QEMUBH, 1);
@@ -115,11 +117,13 @@ void aio_bh_schedule_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
         .ctx = ctx,
         .cb = cb,
         .opaque = opaque,
+        .name = name,
     };
     aio_bh_enqueue(bh, BH_SCHEDULED | BH_ONESHOT);
 }
 
-QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
+QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
+                        const char *name)
 {
     QEMUBH *bh;
     bh = g_new(QEMUBH, 1);
@@ -127,6 +131,7 @@ QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
         .ctx = ctx,
         .cb = cb,
         .opaque = opaque,
+        .name = name,
     };
     return bh;
 }
diff --git a/util/main-loop.c b/util/main-loop.c
index 5188ff6540..2c6a9a9d87 100644
--- a/util/main-loop.c
+++ b/util/main-loop.c
@@ -543,9 +543,9 @@ void main_loop_wait(int nonblocking)
 
 /* Functions to operate on the main QEMU AioContext.  */
 
-QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque)
+QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name)
 {
-    return aio_bh_new(qemu_aio_context, cb, opaque);
+    return aio_bh_new_full(qemu_aio_context, cb, opaque, name);
 }
 
 /*
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] util/async: print leaked BH name when AioContext finalizes
  2021-04-14 20:02 [PATCH 0/2] util/async: print leaked BH name when AioContext finalizes Stefan Hajnoczi
  2021-04-14 20:02 ` [PATCH 1/2] util/async: add a human-readable name to BHs for debugging Stefan Hajnoczi
@ 2021-04-14 20:02 ` Stefan Hajnoczi
  2021-04-15  8:09 ` [PATCH 0/2] " Fam Zheng
  2021-07-05 10:55 ` Stefan Hajnoczi
  3 siblings, 0 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2021-04-14 20:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, eric.g.ernst, Stefan Hajnoczi,
	Paolo Bonzini, Max Reitz

BHs must be deleted before the AioContext is finalized. If not, it's a
bug and probably indicates that some part of the program still expects
the BH to run in the future. That can lead to memory leaks, inconsistent
state, or just hangs.

Unfortunately the assert(flags & BH_DELETED) call in aio_ctx_finalize()
is difficult to debug because the assertion failure contains no
information about the BH!

Use the QEMUBH name field added in the previous patch to show a useful
error when a leaked BH is detected.

Suggested-by: Eric Ernst <eric.g.ernst@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/async.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/util/async.c b/util/async.c
index b6acb86520..2584fca249 100644
--- a/util/async.c
+++ b/util/async.c
@@ -344,8 +344,20 @@ aio_ctx_finalize(GSource     *source)
     assert(QSIMPLEQ_EMPTY(&ctx->bh_slice_list));
 
     while ((bh = aio_bh_dequeue(&ctx->bh_list, &flags))) {
-        /* qemu_bh_delete() must have been called on BHs in this AioContext */
-        assert(flags & BH_DELETED);
+        /*
+         * qemu_bh_delete() must have been called on BHs in this AioContext. In
+         * many cases memory leaks, hangs, or inconsistent state occur when a
+         * BH is leaked because something still expects it to run.
+         *
+         * If you hit this, fix the lifecycle of the BH so that
+         * qemu_bh_delete() and any associated cleanup is called before the
+         * AioContext is finalized.
+         */
+        if (unlikely(!(flags & BH_DELETED))) {
+            fprintf(stderr, "%s: BH '%s' leaked, aborting...\n",
+                    __func__, bh->name);
+            abort();
+        }
 
         g_free(bh);
     }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/2] util/async: print leaked BH name when AioContext finalizes
  2021-04-14 20:02 [PATCH 0/2] util/async: print leaked BH name when AioContext finalizes Stefan Hajnoczi
  2021-04-14 20:02 ` [PATCH 1/2] util/async: add a human-readable name to BHs for debugging Stefan Hajnoczi
  2021-04-14 20:02 ` [PATCH 2/2] util/async: print leaked BH name when AioContext finalizes Stefan Hajnoczi
@ 2021-04-15  8:09 ` Fam Zheng
  2021-07-05 10:55 ` Stefan Hajnoczi
  3 siblings, 0 replies; 6+ messages in thread
From: Fam Zheng @ 2021-04-15  8:09 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, qemu-block, qemu-devel, Max Reitz, Paolo Bonzini,
	eric.g.ernst

On 2021-04-14 21:02, Stefan Hajnoczi wrote:
> Eric Ernst and I debugged a BH leak and it was more involved than it should be.
> The problem is that BHs don't have a human-readable identifier, so low-level
> debugging techniques and inferences about the code are required to figure out
> which BH was leaked in production environments without easy debug access.
> 
> The leak ended up already being fixed upstream but let's improve diagnostics
> for leaked BHs so that this becomes quick and easy in the future.
> 
> Stefan Hajnoczi (2):
>   util/async: add a human-readable name to BHs for debugging
>   util/async: print leaked BH name when AioContext finalizes
> 
>  include/block/aio.h            | 31 ++++++++++++++++++++++++++++---
>  include/qemu/main-loop.h       |  4 +++-
>  tests/unit/ptimer-test-stubs.c |  2 +-
>  util/async.c                   | 25 +++++++++++++++++++++----
>  util/main-loop.c               |  4 ++--
>  5 files changed, 55 insertions(+), 11 deletions(-)
> 
> -- 
> 2.30.2
> 

Reviewed-by: Fam Zheng <fam@euphon.net>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] util/async: add a human-readable name to BHs for debugging
  2021-04-14 20:02 ` [PATCH 1/2] util/async: add a human-readable name to BHs for debugging Stefan Hajnoczi
@ 2021-04-15  9:08   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 6+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-04-15  9:08 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Paolo Bonzini,
	eric.g.ernst

On 4/14/21 10:02 PM, Stefan Hajnoczi wrote:
> It can be difficult to debug issues with BHs in production environments.
> Although BHs can usually be identified by looking up their ->cb()
> function pointer, this requires debug information for the program. It is
> also not possible to print human-readable diagnostics about BHs because
> they have no identifier.
> 
> This patch adds a name to each BH. The name is not unique per instance
> but differentiates between cb() functions, which is usually enough. It's
> done by changing aio_bh_new() and friends to macros that stringify cb.
> 
> The next patch will use the name field when reporting leaked BHs.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/block/aio.h            | 31 ++++++++++++++++++++++++++++---
>  include/qemu/main-loop.h       |  4 +++-
>  tests/unit/ptimer-test-stubs.c |  2 +-
>  util/async.c                   |  9 +++++++--
>  util/main-loop.c               |  4 ++--
>  5 files changed, 41 insertions(+), 9 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/2] util/async: print leaked BH name when AioContext finalizes
  2021-04-14 20:02 [PATCH 0/2] util/async: print leaked BH name when AioContext finalizes Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2021-04-15  8:09 ` [PATCH 0/2] " Fam Zheng
@ 2021-07-05 10:55 ` Stefan Hajnoczi
  3 siblings, 0 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2021-07-05 10:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, eric.g.ernst, Paolo Bonzini,
	Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1146 bytes --]

On Wed, Apr 14, 2021 at 09:02:45PM +0100, Stefan Hajnoczi wrote:
> Eric Ernst and I debugged a BH leak and it was more involved than it should be.
> The problem is that BHs don't have a human-readable identifier, so low-level
> debugging techniques and inferences about the code are required to figure out
> which BH was leaked in production environments without easy debug access.
> 
> The leak ended up already being fixed upstream but let's improve diagnostics
> for leaked BHs so that this becomes quick and easy in the future.
> 
> Stefan Hajnoczi (2):
>   util/async: add a human-readable name to BHs for debugging
>   util/async: print leaked BH name when AioContext finalizes
> 
>  include/block/aio.h            | 31 ++++++++++++++++++++++++++++---
>  include/qemu/main-loop.h       |  4 +++-
>  tests/unit/ptimer-test-stubs.c |  2 +-
>  util/async.c                   | 25 +++++++++++++++++++++----
>  util/main-loop.c               |  4 ++--
>  5 files changed, 55 insertions(+), 11 deletions(-)
> 
> -- 
> 2.30.2
> 

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-07-05 10:56 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-14 20:02 [PATCH 0/2] util/async: print leaked BH name when AioContext finalizes Stefan Hajnoczi
2021-04-14 20:02 ` [PATCH 1/2] util/async: add a human-readable name to BHs for debugging Stefan Hajnoczi
2021-04-15  9:08   ` Philippe Mathieu-Daudé
2021-04-14 20:02 ` [PATCH 2/2] util/async: print leaked BH name when AioContext finalizes Stefan Hajnoczi
2021-04-15  8:09 ` [PATCH 0/2] " Fam Zheng
2021-07-05 10:55 ` Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).