* [PATCH 0/9] Trace2 stopwatch timers and global counters
@ 2021-12-20 15:01 Jeff Hostetler via GitGitGadget
  2021-12-20 15:01 ` [PATCH 1/9] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
                   ` (10 more replies)
  0 siblings, 11 replies; 55+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2021-12-20 15:01 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler

Extend Trace2 to provide multiple "stopwatch timers" and "global counters".

 1. Stopwatch Timers

A stopwatch timer is a thread-safe timer that may be repeatedly started and
stopped to measure intervals of time spent in spans of code. A single
summary "timer" event record is written to the Trace2 event stream at the
end of the program. Timers are accumulated in thread-local storage (TLS), so
per-thread interval times can also be reported when desired.

Timer events are automatically written during the Trace2 "atexit" handler,
so various subsystems don't need to worry about that.

New timers may be defined by adding a new enum trace2_timer_id value and a
row to the trace2/tr2_tmr.c:tr2tmr_def_block[] global table.

Timer events include the number of intervals (start+stop calls), the total
elapsed time, and min/max intervals.

Two test timers are predefined and used by t/helper/test-trace2.c and the
t/t0211 and t/t0212 tests.
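
To make the bookkeeping above concrete, here is a minimal sketch of the
statistics a stopwatch timer accumulates: interval count, total elapsed time,
and min/max interval. All of the names (sketch_timer, sketch_timer_start(),
sketch_timer_stop()) are hypothetical and are not the API added by this
series. The timestamp is passed in by the caller as an assumption, to keep the
sketch deterministic; a real timer would read a clock instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical structure; not the one defined in trace2/tr2_tmr.h. */
struct sketch_timer {
	uint64_t start_ns;       /* timestamp of the pending start */
	uint64_t total_ns;       /* sum of all completed intervals */
	uint64_t min_ns;         /* shortest completed interval */
	uint64_t max_ns;         /* longest completed interval */
	uint64_t interval_count; /* number of completed start+stop pairs */
	int running;             /* non-zero between start and stop */
};

static void sketch_timer_start(struct sketch_timer *t, uint64_t now_ns)
{
	assert(!t->running);
	t->start_ns = now_ns;
	t->running = 1;
}

static void sketch_timer_stop(struct sketch_timer *t, uint64_t now_ns)
{
	uint64_t elapsed;

	assert(t->running);
	elapsed = now_ns - t->start_ns;
	t->running = 0;

	/* The first interval seeds the minimum. */
	if (!t->interval_count || elapsed < t->min_ns)
		t->min_ns = elapsed;
	if (elapsed > t->max_ns)
		t->max_ns = elapsed;

	t->total_ns += elapsed;
	t->interval_count++;
}
```

If each thread owns its own instance of such a structure, start and stop need
no locking; only the final summation has to look at every thread's copy.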

 2. Global Counters

A global counter is a lighter weight version of the above that just
accumulates integer values, but without the timing and min/max statistics.

Counter events are written during the Trace2 "atexit" handler automatically,
so subsystems that use these counters don't need to create their own atexit
handling.

New counters may be defined by adding a new enum trace2_counter_id value and
a row to the trace2/tr2_ctr.c:tr2ctr_def_block[] global table.
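
The "enum value plus table row" pattern described above can be sketched as
follows. This is an illustration only: the identifiers (sketch_counter_id,
sketch_def_block, sketch_counter_add(), sketch_counter_merge()) and the field
layout are assumptions made for the example, not the definitions in
trace2/tr2_ctr.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter ids; adding a counter means adding a value here... */
enum sketch_counter_id {
	SKETCH_COUNTER_TEST1 = 0,
	SKETCH_COUNTER_TEST2,

	SKETCH_NUMBER_OF_COUNTERS /* must be last */
};

struct sketch_counter_def {
	const char *category;
	const char *name;
};

/* ...and a matching row here, indexed by the enum value. */
static const struct sketch_counter_def
sketch_def_block[SKETCH_NUMBER_OF_COUNTERS] = {
	[SKETCH_COUNTER_TEST1] = { "test", "test1" },
	[SKETCH_COUNTER_TEST2] = { "test", "test2" },
};

/* One block per thread, so increments need no lock. */
struct sketch_counter_block {
	uint64_t value[SKETCH_NUMBER_OF_COUNTERS];
};

static void sketch_counter_add(struct sketch_counter_block *block,
			       enum sketch_counter_id id, uint64_t amount)
{
	assert(id < SKETCH_NUMBER_OF_COUNTERS);
	block->value[id] += amount;
}

/* Fold one thread's block into a running total at program exit. */
static void sketch_counter_merge(struct sketch_counter_block *total,
				 const struct sketch_counter_block *src)
{
	int k;

	for (k = 0; k < SKETCH_NUMBER_OF_COUNTERS; k++)
		total->value[k] += src->value[k];
}
```

Keeping the definitions in a table indexed by the enum means the code that
writes the events can loop over all counters without knowing any of them by
name.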

 3. Rationale

Timers and counters are an alternative to the existing "region" and "data"
events. The latter are intended to trace the major flow (or phases) of the
program and possibly capture the amount of work performed within a loop, for
example. The former are offered as a way to measure activity that is not
localized, such as the time spent in zlib or lstat, which may be called from
many different parts of the program.

There are currently several places in the Git code where we want to measure
such activity -- changed-path Bloom filter stats, topo-walk commit counts,
and tree-walk counts and max-depths. A conversation in [1] suggested that we
should investigate a more general mechanism to collect stats so that each
instance doesn't need to recreate its own atexit handling mechanism.

This is an attempt to address that and let us easily explore other areas in
the future.

This patch series does not attempt to refactor those three instances to use
the new timers and counters. That should be a separate effort -- in part
because we may want to retool them rather than just translate them. For
example, rather than just translating the existing four Bloom filter counts
(in revision.c) into Trace2 counters, we may instead want to have a "happy
path timer" and a "sad path timer" if that would provide more insight.

 4. Notes

The first two commits in this series attend to some cleanup that was
discussed in [2] and [3]. The first (using size_t rather than int) is
harmless and could be done in a separate series if desired. The second
(using a char* rather than a strbuf for the thread-name) is a nice cleanup
before I change how I use the thread-name in a later commit in the series.

[1] https://lore.kernel.org/git/cbc17f1b-57fc-497f-f1ab-baa8cc84620d@gmail.com/
[2] https://lore.kernel.org/all/YULF3hoaDxA9ENdO@nand.local/
[3] https://lore.kernel.org/all/xmqqa6kdwo24.fsf@gitster.g/

Jeff Hostetler (9):
  trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
  trace2: convert tr2tls_thread_ctx.thread_name from strbuf to char*
  trace2: defer free of TLS CTX until program exit.
  trace2: add thread-name override to event target
  trace2: add thread-name override to perf target
  trace2: add timer events to perf and event target formats
  trace2: add stopwatch timers
  trace2: add counter events to perf and event target formats
  trace2: add global counters

 Documentation/technical/api-trace2.txt | 159 ++++++++++++++++++++-
 Makefile                               |   2 +
 t/helper/test-trace2.c                 | 184 +++++++++++++++++++++++++
 t/t0211-trace2-perf.sh                 |  49 +++++++
 t/t0212-trace2-event.sh                |  69 ++++++++++
 trace2.c                               | 114 +++++++++++++++
 trace2.h                               |  75 ++++++++++
 trace2/tr2_ctr.c                       |  65 +++++++++
 trace2/tr2_ctr.h                       |  75 ++++++++++
 trace2/tr2_tgt.h                       |  39 ++++++
 trace2/tr2_tgt_event.c                 | 122 ++++++++++++----
 trace2/tr2_tgt_normal.c                |   2 +
 trace2/tr2_tgt_perf.c                  | 112 +++++++++++----
 trace2/tr2_tls.c                       | 110 ++++++++++++---
 trace2/tr2_tls.h                       |  42 +++++-
 trace2/tr2_tmr.c                       | 126 +++++++++++++++++
 trace2/tr2_tmr.h                       | 120 ++++++++++++++++
 17 files changed, 1386 insertions(+), 79 deletions(-)
 create mode 100644 trace2/tr2_ctr.c
 create mode 100644 trace2/tr2_ctr.h
 create mode 100644 trace2/tr2_tmr.c
 create mode 100644 trace2/tr2_tmr.h


base-commit: e773545c7fe7eca21b134847f4fc2cbc9547fa14
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1099%2Fjeffhostetler%2Ftrace2-stopwatch-v2-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1099/jeffhostetler/trace2-stopwatch-v2-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1099
-- 
gitgitgadget


* [PATCH 1/9] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
  2021-12-20 15:01 [PATCH 0/9] Trace2 stopwatch timers and global counters Jeff Hostetler via GitGitGadget
@ 2021-12-20 15:01 ` Jeff Hostetler via GitGitGadget
  2021-12-20 15:01 ` [PATCH 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to char* Jeff Hostetler via GitGitGadget
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2021-12-20 15:01 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Use "size_t" rather than "int" for the "alloc" and "nr_open_regions"
fields in the "tr2tls_thread_ctx".  These are used by ALLOC_GROW().

This was discussed in: https://lore.kernel.org/all/YULF3hoaDxA9ENdO@nand.local/

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2/tr2_tls.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index b1e327a928e..a90bd639d48 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -11,8 +11,8 @@
 struct tr2tls_thread_ctx {
 	struct strbuf thread_name;
 	uint64_t *array_us_start;
-	int alloc;
-	int nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
+	size_t alloc;
+	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
 	int thread_id;
 };
 
-- 
gitgitgadget



* [PATCH 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to char*
  2021-12-20 15:01 [PATCH 0/9] Trace2 stopwatch timers and global counters Jeff Hostetler via GitGitGadget
  2021-12-20 15:01 ` [PATCH 1/9] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
@ 2021-12-20 15:01 ` Jeff Hostetler via GitGitGadget
  2021-12-20 16:31   ` Ævar Arnfjörð Bjarmason
  2021-12-21  7:22   ` Junio C Hamano
  2021-12-20 15:01 ` [PATCH 3/9] trace2: defer free of TLS CTX until program exit Jeff Hostetler via GitGitGadget
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 55+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2021-12-20 15:01 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Use a 'char *' to hold the thread name rather than a 'struct strbuf'.
The thread name is set when the thread is created and should not be
modified afterwards.  Replace the strbuf with an allocated pointer
to make that clearer.

This was discussed in: https://lore.kernel.org/all/xmqqa6kdwo24.fsf@gitster.g/

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2/tr2_tgt_event.c |  2 +-
 trace2/tr2_tgt_perf.c  |  2 +-
 trace2/tr2_tls.c       | 16 +++++++++-------
 trace2/tr2_tls.h       |  2 +-
 4 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 3a0014417cc..ca48d00aebc 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -88,7 +88,7 @@ static void event_fmt_prepare(const char *event_name, const char *file,
 
 	jw_object_string(jw, "event", event_name);
 	jw_object_string(jw, "sid", tr2_sid_get());
-	jw_object_string(jw, "thread", ctx->thread_name.buf);
+	jw_object_string(jw, "thread", ctx->thread_name);
 
 	/*
 	 * In brief mode, only emit <time> on these 2 event types.
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index e4acca13d64..c3e57fcb3c0 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -106,7 +106,7 @@ static void perf_fmt_prepare(const char *event_name,
 
 	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
 	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
-		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
+		    ctx->thread_name, TR2FMT_PERF_MAX_EVENT_NAME,
 		    event_name);
 
 	len = buf->len + TR2FMT_PERF_REPO_WIDTH;
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 7da94aba522..cd8b9f2f0a0 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -35,6 +35,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 					     uint64_t us_thread_start)
 {
 	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
+	struct strbuf buf_name = STRBUF_INIT;
 
 	/*
 	 * Implicitly "tr2tls_push_self()" to capture the thread's start
@@ -47,12 +48,13 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 
 	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
 
-	strbuf_init(&ctx->thread_name, 0);
 	if (ctx->thread_id)
-		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
-	strbuf_addstr(&ctx->thread_name, thread_name);
-	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
-		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
+		strbuf_addf(&buf_name, "th%02d:", ctx->thread_id);
+	strbuf_addstr(&buf_name, thread_name);
+	if (buf_name.len > TR2_MAX_THREAD_NAME)
+		strbuf_setlen(&buf_name, TR2_MAX_THREAD_NAME);
+
+	ctx->thread_name = strbuf_detach(&buf_name, NULL);
 
 	pthread_setspecific(tr2tls_key, ctx);
 
@@ -95,7 +97,7 @@ void tr2tls_unset_self(void)
 
 	pthread_setspecific(tr2tls_key, NULL);
 
-	strbuf_release(&ctx->thread_name);
+	free(ctx->thread_name);
 	free(ctx->array_us_start);
 	free(ctx);
 }
@@ -113,7 +115,7 @@ void tr2tls_pop_self(void)
 	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
 
 	if (!ctx->nr_open_regions)
-		BUG("no open regions in thread '%s'", ctx->thread_name.buf);
+		BUG("no open regions in thread '%s'", ctx->thread_name);
 
 	ctx->nr_open_regions--;
 }
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index a90bd639d48..d968da6a679 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -9,7 +9,7 @@
 #define TR2_MAX_THREAD_NAME (24)
 
 struct tr2tls_thread_ctx {
-	struct strbuf thread_name;
+	char *thread_name;
 	uint64_t *array_us_start;
 	size_t alloc;
 	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
-- 
gitgitgadget



* [PATCH 3/9] trace2: defer free of TLS CTX until program exit.
  2021-12-20 15:01 [PATCH 0/9] Trace2 stopwatch timers and global counters Jeff Hostetler via GitGitGadget
  2021-12-20 15:01 ` [PATCH 1/9] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
  2021-12-20 15:01 ` [PATCH 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to char* Jeff Hostetler via GitGitGadget
@ 2021-12-20 15:01 ` Jeff Hostetler via GitGitGadget
  2021-12-21  7:30   ` Junio C Hamano
  2021-12-20 15:01 ` [PATCH 4/9] trace2: add thread-name override to event target Jeff Hostetler via GitGitGadget
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2021-12-20 15:01 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Defer freeing of the Trace2 thread CTX data until program exit.
Create a global list of thread CTX data to own the storage.

TLS CTX data is allocated when a thread is created and associated
with that thread.  Previously, that storage was deleted when the
thread exited.  Now we simply disassociate the CTX data from the
thread when it exits and let the global CTX list manage the cleanup.

This will be used by a later commit when we add "counters" and
stopwatch-style "timers" to the Trace2 API.  We will add those
fields to the CTX block and allow threads to efficiently (without
locks) accumulate counter and timer data using TLS.  At program
exit, the main thread can run through the global list and compute
totals before it frees them.
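
As a rough illustration of that last step, the following sketch keeps every
per-thread CTX on a global list and walks the list once to compute a total and
free the storage. The names (sketch_ctx, sketch_ctx_create(),
sketch_ctx_release_all()) and the single "counter" field are hypothetical, and
the sketch assumes that all other threads have finished before the final walk.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread block; not struct tr2tls_thread_ctx. */
struct sketch_ctx {
	struct sketch_ctx *next;
	int thread_id;
	uint64_t counter; /* written only by the owning thread */
};

static pthread_mutex_t sketch_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct sketch_ctx *sketch_ctx_list; /* modify under lock */

/* Allocate a CTX and push it onto the global list, which owns it. */
static struct sketch_ctx *sketch_ctx_create(void)
{
	struct sketch_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		abort();

	pthread_mutex_lock(&sketch_mutex);
	if (sketch_ctx_list)
		ctx->thread_id = sketch_ctx_list->thread_id + 1;
	ctx->next = sketch_ctx_list;
	sketch_ctx_list = ctx;
	pthread_mutex_unlock(&sketch_mutex);

	return ctx;
}

/*
 * Sum the per-thread values and free every CTX.  Assumes it is called
 * once, by the main thread, after all other threads have exited.
 */
static uint64_t sketch_ctx_release_all(void)
{
	struct sketch_ctx *ctx = sketch_ctx_list;
	uint64_t total = 0;

	sketch_ctx_list = NULL;
	while (ctx) {
		struct sketch_ctx *next = ctx->next;

		total += ctx->counter;
		free(ctx);
		ctx = next;
	}
	return total;
}
```

Only list insertion takes the lock; each thread updates its own "counter"
field without synchronization, which is safe here only because no other thread
reads it until the owning thread is done.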

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2/tr2_tls.c | 38 ++++++++++++++++++++++++++++----------
 trace2/tr2_tls.h |  3 ++-
 2 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index cd8b9f2f0a0..b68d297bf51 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -15,7 +15,16 @@ static uint64_t tr2tls_us_start_process;
 static pthread_mutex_t tr2tls_mutex;
 static pthread_key_t tr2tls_key;
 
-static int tr2_next_thread_id; /* modify under lock */
+/*
+ * This list owns all of the thread-specific CTX data.
+ *
+ * While a thread is alive it is associated with a CTX (owned by this
+ * list) and that CTX is installed in the thread's TLS data area.
+ *
+ * Similarly, `tr2tls_thread_main` points to a CTX contained within
+ * this list.
+ */
+static struct tr2tls_thread_ctx *tr2tls_ctx_list; /* modify under lock */
 
 void tr2tls_start_process_clock(void)
 {
@@ -46,7 +55,12 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 	ctx->array_us_start = (uint64_t *)xcalloc(ctx->alloc, sizeof(uint64_t));
 	ctx->array_us_start[ctx->nr_open_regions++] = us_thread_start;
 
-	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
+	pthread_mutex_lock(&tr2tls_mutex);
+	if (tr2tls_ctx_list)
+		ctx->thread_id = tr2tls_ctx_list->thread_id + 1;
+	ctx->next_ctx = tr2tls_ctx_list;
+	tr2tls_ctx_list = ctx;
+	pthread_mutex_unlock(&tr2tls_mutex);
 
 	if (ctx->thread_id)
 		strbuf_addf(&buf_name, "th%02d:", ctx->thread_id);
@@ -91,15 +105,7 @@ int tr2tls_is_main_thread(void)
 
 void tr2tls_unset_self(void)
 {
-	struct tr2tls_thread_ctx *ctx;
-
-	ctx = tr2tls_get_self();
-
 	pthread_setspecific(tr2tls_key, NULL);
-
-	free(ctx->thread_name);
-	free(ctx->array_us_start);
-	free(ctx);
 }
 
 void tr2tls_push_self(uint64_t us_now)
@@ -163,11 +169,23 @@ void tr2tls_init(void)
 
 void tr2tls_release(void)
 {
+	struct tr2tls_thread_ctx *ctx = tr2tls_ctx_list;
+
 	tr2tls_unset_self();
 	tr2tls_thread_main = NULL;
 
 	pthread_mutex_destroy(&tr2tls_mutex);
 	pthread_key_delete(tr2tls_key);
+
+	while (ctx) {
+		struct tr2tls_thread_ctx *next = ctx->next_ctx;
+
+		free(ctx->thread_name);
+		free(ctx->array_us_start);
+		free(ctx);
+
+		ctx = next;
+	}
 }
 
 int tr2tls_locked_increment(int *p)
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index d968da6a679..c6b6c69b25a 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -9,6 +9,7 @@
 #define TR2_MAX_THREAD_NAME (24)
 
 struct tr2tls_thread_ctx {
+	struct tr2tls_thread_ctx *next_ctx;
 	char *thread_name;
 	uint64_t *array_us_start;
 	size_t alloc;
@@ -45,7 +46,7 @@ struct tr2tls_thread_ctx *tr2tls_get_self(void);
 int tr2tls_is_main_thread(void);
 
 /*
- * Free our TLS data.
+ * Disassociate thread's TLS CTX data from the thread.
  */
 void tr2tls_unset_self(void);
 
-- 
gitgitgadget



* [PATCH 4/9] trace2: add thread-name override to event target
  2021-12-20 15:01 [PATCH 0/9] Trace2 stopwatch timers and global counters Jeff Hostetler via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-12-20 15:01 ` [PATCH 3/9] trace2: defer free of TLS CTX until program exit Jeff Hostetler via GitGitGadget
@ 2021-12-20 15:01 ` Jeff Hostetler via GitGitGadget
  2021-12-20 15:01 ` [PATCH 5/9] trace2: add thread-name override to perf target Jeff Hostetler via GitGitGadget
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2021-12-20 15:01 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach the Trace2 event target to allow the thread-name field to
be specified rather than always inherited from the TLS CTX.

This will be used in a future commit for global events that should
not be tied to a particular thread, such as a global stopwatch timer
that aggregates data from all threads.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2/tr2_tgt_event.c | 59 ++++++++++++++++++++++--------------------
 1 file changed, 31 insertions(+), 28 deletions(-)

diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index ca48d00aebc..4ce50944298 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -81,14 +81,17 @@ static void fn_term(void)
  */
 static void event_fmt_prepare(const char *event_name, const char *file,
 			      int line, const struct repository *repo,
-			      struct json_writer *jw)
+			      struct json_writer *jw,
+			      const char *thread_name_override)
 {
-	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
 	struct tr2_tbuf tb_now;
 
 	jw_object_string(jw, "event", event_name);
 	jw_object_string(jw, "sid", tr2_sid_get());
-	jw_object_string(jw, "thread", ctx->thread_name);
+	jw_object_string(jw, "thread",
+			 ((thread_name_override && *thread_name_override)
+			  ? thread_name_override
+			  : tr2tls_get_self()->thread_name));
 
 	/*
 	 * In brief mode, only emit <time> on these 2 event types.
@@ -114,7 +117,7 @@ static void fn_too_many_files_fl(const char *file, int line)
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_end(&jw);
 
 	tr2_dst_write_line(&tr2dst_event, &jw.json);
@@ -127,7 +130,7 @@ static void fn_version_fl(const char *file, int line)
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_string(&jw, "evt", TR2_EVENT_VERSION);
 	jw_object_string(&jw, "exe", git_version_string);
 	jw_end(&jw);
@@ -147,7 +150,7 @@ static void fn_start_fl(const char *file, int line,
 	double t_abs = (double)us_elapsed_absolute / 1000000.0;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_double(&jw, "t_abs", 6, t_abs);
 	jw_object_inline_begin_array(&jw, "argv");
 	jw_array_argv(&jw, argv);
@@ -166,7 +169,7 @@ static void fn_exit_fl(const char *file, int line, uint64_t us_elapsed_absolute,
 	double t_abs = (double)us_elapsed_absolute / 1000000.0;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_double(&jw, "t_abs", 6, t_abs);
 	jw_object_intmax(&jw, "code", code);
 	jw_end(&jw);
@@ -182,7 +185,7 @@ static void fn_signal(uint64_t us_elapsed_absolute, int signo)
 	double t_abs = (double)us_elapsed_absolute / 1000000.0;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw);
+	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, NULL);
 	jw_object_double(&jw, "t_abs", 6, t_abs);
 	jw_object_intmax(&jw, "signo", signo);
 	jw_end(&jw);
@@ -198,7 +201,7 @@ static void fn_atexit(uint64_t us_elapsed_absolute, int code)
 	double t_abs = (double)us_elapsed_absolute / 1000000.0;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw);
+	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, NULL);
 	jw_object_double(&jw, "t_abs", 6, t_abs);
 	jw_object_intmax(&jw, "code", code);
 	jw_end(&jw);
@@ -231,7 +234,7 @@ static void fn_error_va_fl(const char *file, int line, const char *fmt,
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	maybe_add_string_va(&jw, "msg", fmt, ap);
 	/*
 	 * Also emit the format string as a field in case
@@ -253,7 +256,7 @@ static void fn_command_path_fl(const char *file, int line, const char *pathname)
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_string(&jw, "path", pathname);
 	jw_end(&jw);
 
@@ -268,7 +271,7 @@ static void fn_command_ancestry_fl(const char *file, int line, const char **pare
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_inline_begin_array(&jw, "ancestry");
 
 	while ((parent_name = *parent_names++))
@@ -288,7 +291,7 @@ static void fn_command_name_fl(const char *file, int line, const char *name,
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_string(&jw, "name", name);
 	if (hierarchy && *hierarchy)
 		jw_object_string(&jw, "hierarchy", hierarchy);
@@ -304,7 +307,7 @@ static void fn_command_mode_fl(const char *file, int line, const char *mode)
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_string(&jw, "name", mode);
 	jw_end(&jw);
 
@@ -319,7 +322,7 @@ static void fn_alias_fl(const char *file, int line, const char *alias,
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_string(&jw, "alias", alias);
 	jw_object_inline_begin_array(&jw, "argv");
 	jw_array_argv(&jw, argv);
@@ -338,7 +341,7 @@ static void fn_child_start_fl(const char *file, int line,
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_intmax(&jw, "child_id", cmd->trace2_child_id);
 	if (cmd->trace2_hook_name) {
 		jw_object_string(&jw, "child_class", "hook");
@@ -371,7 +374,7 @@ static void fn_child_exit_fl(const char *file, int line,
 	double t_rel = (double)us_elapsed_child / 1000000.0;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_intmax(&jw, "child_id", cid);
 	jw_object_intmax(&jw, "pid", pid);
 	jw_object_intmax(&jw, "code", code);
@@ -392,7 +395,7 @@ static void fn_child_ready_fl(const char *file, int line,
 	double t_rel = (double)us_elapsed_child / 1000000.0;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_intmax(&jw, "child_id", cid);
 	jw_object_intmax(&jw, "pid", pid);
 	jw_object_string(&jw, "ready", ready);
@@ -411,7 +414,7 @@ static void fn_thread_start_fl(const char *file, int line,
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_end(&jw);
 
 	tr2_dst_write_line(&tr2dst_event, &jw.json);
@@ -427,7 +430,7 @@ static void fn_thread_exit_fl(const char *file, int line,
 	double t_rel = (double)us_elapsed_thread / 1000000.0;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_double(&jw, "t_rel", 6, t_rel);
 	jw_end(&jw);
 
@@ -442,7 +445,7 @@ static void fn_exec_fl(const char *file, int line, uint64_t us_elapsed_absolute,
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_intmax(&jw, "exec_id", exec_id);
 	if (exe)
 		jw_object_string(&jw, "exe", exe);
@@ -463,7 +466,7 @@ static void fn_exec_result_fl(const char *file, int line,
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_intmax(&jw, "exec_id", exec_id);
 	jw_object_intmax(&jw, "code", code);
 	jw_end(&jw);
@@ -479,7 +482,7 @@ static void fn_param_fl(const char *file, int line, const char *param,
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_string(&jw, "param", param);
 	jw_object_string(&jw, "value", value);
 	jw_end(&jw);
@@ -495,7 +498,7 @@ static void fn_repo_fl(const char *file, int line,
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, repo, &jw);
+	event_fmt_prepare(event_name, file, line, repo, &jw, NULL);
 	jw_object_string(&jw, "worktree", repo->worktree);
 	jw_end(&jw);
 
@@ -516,7 +519,7 @@ static void fn_region_enter_printf_va_fl(const char *file, int line,
 		struct json_writer jw = JSON_WRITER_INIT;
 
 		jw_object_begin(&jw, 0);
-		event_fmt_prepare(event_name, file, line, repo, &jw);
+		event_fmt_prepare(event_name, file, line, repo, &jw, NULL);
 		jw_object_intmax(&jw, "nesting", ctx->nr_open_regions);
 		if (category)
 			jw_object_string(&jw, "category", category);
@@ -542,7 +545,7 @@ static void fn_region_leave_printf_va_fl(
 		double t_rel = (double)us_elapsed_region / 1000000.0;
 
 		jw_object_begin(&jw, 0);
-		event_fmt_prepare(event_name, file, line, repo, &jw);
+		event_fmt_prepare(event_name, file, line, repo, &jw, NULL);
 		jw_object_double(&jw, "t_rel", 6, t_rel);
 		jw_object_intmax(&jw, "nesting", ctx->nr_open_regions);
 		if (category)
@@ -570,7 +573,7 @@ static void fn_data_fl(const char *file, int line, uint64_t us_elapsed_absolute,
 		double t_rel = (double)us_elapsed_region / 1000000.0;
 
 		jw_object_begin(&jw, 0);
-		event_fmt_prepare(event_name, file, line, repo, &jw);
+		event_fmt_prepare(event_name, file, line, repo, &jw, NULL);
 		jw_object_double(&jw, "t_abs", 6, t_abs);
 		jw_object_double(&jw, "t_rel", 6, t_rel);
 		jw_object_intmax(&jw, "nesting", ctx->nr_open_regions);
@@ -598,7 +601,7 @@ static void fn_data_json_fl(const char *file, int line,
 		double t_rel = (double)us_elapsed_region / 1000000.0;
 
 		jw_object_begin(&jw, 0);
-		event_fmt_prepare(event_name, file, line, repo, &jw);
+		event_fmt_prepare(event_name, file, line, repo, &jw, NULL);
 		jw_object_double(&jw, "t_abs", 6, t_abs);
 		jw_object_double(&jw, "t_rel", 6, t_rel);
 		jw_object_intmax(&jw, "nesting", ctx->nr_open_regions);
-- 
gitgitgadget



* [PATCH 5/9] trace2: add thread-name override to perf target
  2021-12-20 15:01 [PATCH 0/9] Trace2 stopwatch timers and global counters Jeff Hostetler via GitGitGadget
                   ` (3 preceding siblings ...)
  2021-12-20 15:01 ` [PATCH 4/9] trace2: add thread-name override to event target Jeff Hostetler via GitGitGadget
@ 2021-12-20 15:01 ` Jeff Hostetler via GitGitGadget
  2021-12-20 15:01 ` [PATCH 6/9] trace2: add timer events to perf and event target formats Jeff Hostetler via GitGitGadget
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2021-12-20 15:01 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach the Trace2 perf target to allow the thread-name field to be
specified rather than always inherited from the TLS CTX.

This will be used in a future commit for global events that should
not be tied to a particular thread, such as a global stopwatch timer.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2/tr2_tgt_perf.c | 64 +++++++++++++++++++++++--------------------
 1 file changed, 35 insertions(+), 29 deletions(-)

diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index c3e57fcb3c0..47293e99d4b 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -64,9 +64,14 @@ static void perf_fmt_prepare(const char *event_name,
 			     int line, const struct repository *repo,
 			     uint64_t *p_us_elapsed_absolute,
 			     uint64_t *p_us_elapsed_relative,
-			     const char *category, struct strbuf *buf)
+			     const char *category, struct strbuf *buf,
+			     const char *thread_name_override)
 {
 	int len;
+	const char *thread_name =
+		((thread_name_override && *thread_name_override)
+		 ? thread_name_override
+		 : ctx->thread_name);
 
 	strbuf_setlen(buf, 0);
 
@@ -106,7 +111,7 @@ static void perf_fmt_prepare(const char *event_name,
 
 	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
 	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
-		    ctx->thread_name, TR2FMT_PERF_MAX_EVENT_NAME,
+		    thread_name, TR2FMT_PERF_MAX_EVENT_NAME,
 		    event_name);
 
 	len = buf->len + TR2FMT_PERF_REPO_WIDTH;
@@ -140,14 +145,15 @@ static void perf_io_write_fl(const char *file, int line, const char *event_name,
 			     uint64_t *p_us_elapsed_absolute,
 			     uint64_t *p_us_elapsed_relative,
 			     const char *category,
-			     const struct strbuf *buf_payload)
+			     const struct strbuf *buf_payload,
+			     const char *thread_name_override)
 {
 	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
 	struct strbuf buf_line = STRBUF_INIT;
 
 	perf_fmt_prepare(event_name, ctx, file, line, repo,
 			 p_us_elapsed_absolute, p_us_elapsed_relative, category,
-			 &buf_line);
+			 &buf_line, thread_name_override);
 	strbuf_addbuf(&buf_line, buf_payload);
 	tr2_dst_write_line(&tr2dst_perf, &buf_line);
 	strbuf_release(&buf_line);
@@ -161,7 +167,7 @@ static void fn_version_fl(const char *file, int line)
 	strbuf_addstr(&buf_payload, git_version_string);
 
 	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
+			 &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -174,7 +180,7 @@ static void fn_start_fl(const char *file, int line,
 	sq_append_quote_argv_pretty(&buf_payload, argv);
 
 	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 NULL, NULL, &buf_payload);
+			 NULL, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -187,7 +193,7 @@ static void fn_exit_fl(const char *file, int line, uint64_t us_elapsed_absolute,
 	strbuf_addf(&buf_payload, "code:%d", code);
 
 	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 NULL, NULL, &buf_payload);
+			 NULL, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -199,7 +205,7 @@ static void fn_signal(uint64_t us_elapsed_absolute, int signo)
 	strbuf_addf(&buf_payload, "signo:%d", signo);
 
 	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL,
-			 &us_elapsed_absolute, NULL, NULL, &buf_payload);
+			 &us_elapsed_absolute, NULL, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -211,7 +217,7 @@ static void fn_atexit(uint64_t us_elapsed_absolute, int code)
 	strbuf_addf(&buf_payload, "code:%d", code);
 
 	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL,
-			 &us_elapsed_absolute, NULL, NULL, &buf_payload);
+			 &us_elapsed_absolute, NULL, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -237,7 +243,7 @@ static void fn_error_va_fl(const char *file, int line, const char *fmt,
 	maybe_append_string_va(&buf_payload, fmt, ap);
 
 	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
+			 &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -249,7 +255,7 @@ static void fn_command_path_fl(const char *file, int line, const char *pathname)
 	strbuf_addstr(&buf_payload, pathname);
 
 	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
+			 &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -264,7 +270,7 @@ static void fn_command_ancestry_fl(const char *file, int line, const char **pare
 	strbuf_addch(&buf_payload, ']');
 
 	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
+			 &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -279,7 +285,7 @@ static void fn_command_name_fl(const char *file, int line, const char *name,
 		strbuf_addf(&buf_payload, " (%s)", hierarchy);
 
 	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
+			 &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -291,7 +297,7 @@ static void fn_command_mode_fl(const char *file, int line, const char *mode)
 	strbuf_addstr(&buf_payload, mode);
 
 	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
+			 &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -306,7 +312,7 @@ static void fn_alias_fl(const char *file, int line, const char *alias,
 	strbuf_addch(&buf_payload, ']');
 
 	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
+			 &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -342,7 +348,7 @@ static void fn_child_start_fl(const char *file, int line,
 	strbuf_addch(&buf_payload, ']');
 
 	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 NULL, NULL, &buf_payload);
+			 NULL, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -356,7 +362,7 @@ static void fn_child_exit_fl(const char *file, int line,
 	strbuf_addf(&buf_payload, "[ch%d] pid:%d code:%d", cid, pid, code);
 
 	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 &us_elapsed_child, NULL, &buf_payload);
+			 &us_elapsed_child, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -370,7 +376,7 @@ static void fn_child_ready_fl(const char *file, int line,
 	strbuf_addf(&buf_payload, "[ch%d] pid:%d ready:%s", cid, pid, ready);
 
 	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 &us_elapsed_child, NULL, &buf_payload);
+			 &us_elapsed_child, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -381,7 +387,7 @@ static void fn_thread_start_fl(const char *file, int line,
 	struct strbuf buf_payload = STRBUF_INIT;
 
 	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 NULL, NULL, &buf_payload);
+			 NULL, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -393,7 +399,7 @@ static void fn_thread_exit_fl(const char *file, int line,
 	struct strbuf buf_payload = STRBUF_INIT;
 
 	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 &us_elapsed_thread, NULL, &buf_payload);
+			 &us_elapsed_thread, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -414,7 +420,7 @@ static void fn_exec_fl(const char *file, int line, uint64_t us_elapsed_absolute,
 	strbuf_addch(&buf_payload, ']');
 
 	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 NULL, NULL, &buf_payload);
+			 NULL, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -430,7 +436,7 @@ static void fn_exec_result_fl(const char *file, int line,
 		strbuf_addf(&buf_payload, " err:%s", strerror(code));
 
 	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 NULL, NULL, &buf_payload);
+			 NULL, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -443,7 +449,7 @@ static void fn_param_fl(const char *file, int line, const char *param,
 	strbuf_addf(&buf_payload, "%s:%s", param, value);
 
 	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
+			 &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -457,7 +463,7 @@ static void fn_repo_fl(const char *file, int line,
 	sq_quote_buf_pretty(&buf_payload, repo->worktree);
 
 	perf_io_write_fl(file, line, event_name, repo, NULL, NULL, NULL,
-			 &buf_payload);
+			 &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -479,7 +485,7 @@ static void fn_region_enter_printf_va_fl(const char *file, int line,
 	}
 
 	perf_io_write_fl(file, line, event_name, repo, &us_elapsed_absolute,
-			 NULL, category, &buf_payload);
+			 NULL, category, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -499,7 +505,7 @@ static void fn_region_leave_printf_va_fl(
 	}
 
 	perf_io_write_fl(file, line, event_name, repo, &us_elapsed_absolute,
-			 &us_elapsed_region, category, &buf_payload);
+			 &us_elapsed_region, category, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -514,7 +520,7 @@ static void fn_data_fl(const char *file, int line, uint64_t us_elapsed_absolute,
 	strbuf_addf(&buf_payload, "%s:%s", key, value);
 
 	perf_io_write_fl(file, line, event_name, repo, &us_elapsed_absolute,
-			 &us_elapsed_region, category, &buf_payload);
+			 &us_elapsed_region, category, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -530,7 +536,7 @@ static void fn_data_json_fl(const char *file, int line,
 	strbuf_addf(&buf_payload, "%s:%s", key, value->json.buf);
 
 	perf_io_write_fl(file, line, event_name, repo, &us_elapsed_absolute,
-			 &us_elapsed_region, category, &buf_payload);
+			 &us_elapsed_region, category, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -544,7 +550,7 @@ static void fn_printf_va_fl(const char *file, int line,
 	maybe_append_string_va(&buf_payload, fmt, ap);
 
 	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 NULL, NULL, &buf_payload);
+			 NULL, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
-- 
gitgitgadget


* [PATCH 6/9] trace2: add timer events to perf and event target formats
  2021-12-20 15:01 [PATCH 0/9] Trace2 stopwatch timers and global counters Jeff Hostetler via GitGitGadget
                   ` (4 preceding siblings ...)
  2021-12-20 15:01 ` [PATCH 5/9] trace2: add thread-name override to perf target Jeff Hostetler via GitGitGadget
@ 2021-12-20 15:01 ` Jeff Hostetler via GitGitGadget
  2021-12-20 16:39   ` Ævar Arnfjörð Bjarmason
  2021-12-21 14:20   ` Derrick Stolee
  2021-12-20 15:01 ` [PATCH 7/9] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 55+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2021-12-20 15:01 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach Trace2 "perf" and "event" formats to handle "timer" events for
stopwatch timers.  Update API documentation accordingly.

A future commit will add stopwatch timers to the Trace2 API; those
timers will emit these "timer" events.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
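
Note for reviewers: the following is a minimal, standalone sketch of the
microsecond-to-seconds conversion and the "perf" payload layout that
fn_timer() in trace2/tr2_tgt_perf.c produces.  It is not part of the
patch.  The helper name format_timer_payload() is hypothetical, and it
uses snprintf() in place of Git's strbuf API so that it compiles on its
own.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not in this series): build the same payload
 * text that the perf target writes for a "timer" event.
 * Inputs are in microseconds; output fields are in seconds.
 */
static int format_timer_payload(char *out, size_t outlen,
				const char *timer_name,
				uint64_t interval_count,
				uint64_t us_total_time,
				uint64_t us_min_time,
				uint64_t us_max_time)
{
	double t_total = (double)us_total_time / 1000000.0;
	double t_min   = (double)us_min_time   / 1000000.0;
	double t_max   = (double)us_max_time   / 1000000.0;

	/* Same field order and "%9.6f" width as the patch. */
	return snprintf(out, outlen,
			"name:%s count:%" PRIu64
			" total:%9.6f min:%9.6f max:%9.6f",
			timer_name, interval_count,
			t_total, t_min, t_max);
}
```

Because "%9.6f" pads to nine columns, a value such as 393 microseconds
is rendered as " 0.000393", which is why a space follows each colon in
the numeric fields of the perf output.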
 Documentation/technical/api-trace2.txt | 25 +++++++++++++++-
 trace2/tr2_tgt.h                       | 25 ++++++++++++++++
 trace2/tr2_tgt_event.c                 | 40 +++++++++++++++++++++++++-
 trace2/tr2_tgt_normal.c                |  1 +
 trace2/tr2_tgt_perf.c                  | 29 +++++++++++++++++++
 5 files changed, 118 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index bb13ca3db8b..e6ed94ba814 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -391,7 +391,7 @@ only present on the "start" and "atexit" events.
 {
 	"event":"version",
 	...
-	"evt":"3",		       # EVENT format version
+	"evt":"4",		       # EVENT format version
 	"exe":"2.20.1.155.g426c96fcdb" # git version
 }
 ------------
@@ -815,6 +815,29 @@ The "value" field may be an integer or a string.
 }
 ------------
 
+`"timer"`::
+	This event is generated at the end of the program and contains
+	statistics for a global stopwatch timer.
++
+------------
+{
+	"event":"timer",
+	...
+	"name":"test",      # timer name
+	"count":42,         # number of start+stop intervals
+	"t_total":1.234,    # sum of all intervals (by thread or globally)
+	"t_min":0.1,        # shortest interval
+	"t_max":0.9,        # longest interval
+}
+------------
++
+Stopwatch timer data is independently collected by each thread and then
+aggregated for the whole program, so the total time reported here
+may exceed the "atexit" elapsed time of the program.
++
+Timer events may represent an individual thread or a summation across
+the entire program.  Summation events will have a unique thread name.
+
 == Example Trace2 API Usage
 
 Here is a hypothetical usage of the Trace2 API showing the intended
diff --git a/trace2/tr2_tgt.h b/trace2/tr2_tgt.h
index 65f94e15748..1f548eb4b93 100644
--- a/trace2/tr2_tgt.h
+++ b/trace2/tr2_tgt.h
@@ -96,6 +96,30 @@ typedef void(tr2_tgt_evt_printf_va_fl_t)(const char *file, int line,
 					 uint64_t us_elapsed_absolute,
 					 const char *fmt, va_list ap);
 
+/*
+ * Stopwatch timer event.  This function writes the previously accumulated
+ * stopwatch timer values to the event streams.  Unlike other Trace2 API
+ * events, this is decoupled from the data collection.
+ *
+ * This does not take a (file,line) pair because a timer event reports
+ * the cumulative time spent in the timer over a series of intervals
+ * -- it does not represent a single usage (like region or data events
+ * do).
+ *
+ * The thread name is optional.  If non-null it will override the
+ * value inherited from the caller's TLS CTX.  This allows data
+ * for global timers to be reported without associating it with a
+ * single thread.
+ */
+typedef void(tr2_tgt_evt_timer_t)(uint64_t us_elapsed_absolute,
+				  const char *thread_name,
+				  const char *category,
+				  const char *timer_name,
+				  uint64_t interval_count,
+				  uint64_t us_total_time,
+				  uint64_t us_min_time,
+				  uint64_t us_max_time);
+
 /*
  * "vtable" for a TRACE2 target.  Use NULL if a target does not want
  * to emit that message.
@@ -132,6 +156,7 @@ struct tr2_tgt {
 	tr2_tgt_evt_data_fl_t                   *pfn_data_fl;
 	tr2_tgt_evt_data_json_fl_t              *pfn_data_json_fl;
 	tr2_tgt_evt_printf_va_fl_t              *pfn_printf_va_fl;
+	tr2_tgt_evt_timer_t                     *pfn_timer;
 };
 /* clang-format on */
 
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 4ce50944298..9b3905b920c 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -19,8 +19,13 @@ static struct tr2_dst tr2dst_event = { TR2_SYSENV_EVENT, 0, 0, 0, 0 };
  * interpretation of existing events or fields. Smaller changes, such as adding
  * a new field to an existing event, do not require an increment to the EVENT
  * format version.
+ *
+ * Version 1: original version
+ * Version 2: added "too_many_files" event
+ * Version 3: added "child_ready" event
+ * Version 4: added "timer" event
  */
-#define TR2_EVENT_VERSION "3"
+#define TR2_EVENT_VERSION "4"
 
 /*
  * Region nesting limit for messages written to the event target.
@@ -615,6 +620,38 @@ static void fn_data_json_fl(const char *file, int line,
 	}
 }
 
+static void fn_timer(uint64_t us_elapsed_absolute,
+		     const char *thread_name,
+		     const char *category,
+		     const char *timer_name,
+		     uint64_t interval_count,
+		     uint64_t us_total_time,
+		     uint64_t us_min_time,
+		     uint64_t us_max_time)
+{
+	const char *event_name = "timer";
+	struct json_writer jw = JSON_WRITER_INIT;
+	double t_abs = (double)us_elapsed_absolute / 1000000.0;
+
+	double t_total = (double)us_total_time / 1000000.0;
+	double t_min   = (double)us_min_time   / 1000000.0;
+	double t_max   = (double)us_max_time   / 1000000.0;
+
+	jw_object_begin(&jw, 0);
+	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, thread_name);
+	jw_object_double(&jw, "t_abs", 6, t_abs);
+	jw_object_string(&jw, "name", timer_name);
+	jw_object_intmax(&jw, "count", interval_count);
+	jw_object_double(&jw, "t_total", 6, t_total);
+	jw_object_double(&jw, "t_min", 6, t_min);
+	jw_object_double(&jw, "t_max", 6, t_max);
+
+	jw_end(&jw);
+
+	tr2_dst_write_line(&tr2dst_event, &jw.json);
+	jw_release(&jw);
+}
+
 struct tr2_tgt tr2_tgt_event = {
 	&tr2dst_event,
 
@@ -646,4 +683,5 @@ struct tr2_tgt tr2_tgt_event = {
 	fn_data_fl,
 	fn_data_json_fl,
 	NULL, /* printf */
+	fn_timer,
 };
diff --git a/trace2/tr2_tgt_normal.c b/trace2/tr2_tgt_normal.c
index 58d9e430f05..23a7e78dcaa 100644
--- a/trace2/tr2_tgt_normal.c
+++ b/trace2/tr2_tgt_normal.c
@@ -355,4 +355,5 @@ struct tr2_tgt tr2_tgt_normal = {
 	NULL, /* data */
 	NULL, /* data_json */
 	fn_printf_va_fl,
+	NULL, /* timer */
 };
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index 47293e99d4b..7597cb52ed5 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -554,6 +554,34 @@ static void fn_printf_va_fl(const char *file, int line,
 	strbuf_release(&buf_payload);
 }
 
+static void fn_timer(uint64_t us_elapsed_absolute,
+		     const char *thread_name,
+		     const char *category,
+		     const char *timer_name,
+		     uint64_t interval_count,
+		     uint64_t us_total_time,
+		     uint64_t us_min_time,
+		     uint64_t us_max_time)
+{
+	const char *event_name = "timer";
+	struct strbuf buf_payload = STRBUF_INIT;
+
+	double t_total = (double)us_total_time / 1000000.0;
+	double t_min   = (double)us_min_time   / 1000000.0;
+	double t_max   = (double)us_max_time   / 1000000.0;
+
+	strbuf_addf(&buf_payload, "name:%s", timer_name);
+	strbuf_addf(&buf_payload, " count:%"PRIu64, interval_count);
+	strbuf_addf(&buf_payload, " total:%9.6f", t_total);
+	strbuf_addf(&buf_payload, " min:%9.6f", t_min);
+	strbuf_addf(&buf_payload, " max:%9.6f", t_max);
+
+	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL,
+			 &us_elapsed_absolute, NULL,
+			 category, &buf_payload, thread_name);
+	strbuf_release(&buf_payload);
+}
+
 struct tr2_tgt tr2_tgt_perf = {
 	&tr2dst_perf,
 
@@ -585,4 +613,5 @@ struct tr2_tgt tr2_tgt_perf = {
 	fn_data_fl,
 	fn_data_json_fl,
 	fn_printf_va_fl,
+	fn_timer,
 };
-- 
gitgitgadget



* [PATCH 7/9] trace2: add stopwatch timers
  2021-12-20 15:01 [PATCH 0/9] Trace2 stopwatch timers and global counters Jeff Hostetler via GitGitGadget
                   ` (5 preceding siblings ...)
  2021-12-20 15:01 ` [PATCH 6/9] trace2: add timer events to perf and event target formats Jeff Hostetler via GitGitGadget
@ 2021-12-20 15:01 ` Jeff Hostetler via GitGitGadget
  2021-12-20 16:42   ` Ævar Arnfjörð Bjarmason
  2021-12-21 14:45   ` Derrick Stolee
  2021-12-20 15:01 ` [PATCH 8/9] trace2: add counter events to perf and event target formats Jeff Hostetler via GitGitGadget
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 55+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2021-12-20 15:01 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Add a stopwatch timer mechanism to Git.

Timers are an alternative to regions.  Timers can capture a series of
intervals, such as calls to a library routine or a span of code.  They
are intended for code that is not necessarily associated with a
particular phase of the command.

Timer data is accumulated throughout the command and a timer "summary"
event is logged (one per timer) at program exit.

Optionally, timer data may also be reported by thread for certain
timers.  (See trace2/tr2_tmr.c:tr2tmr_def_block[].)

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
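
Note for reviewers: the following is a minimal, standalone sketch of the
per-thread accumulation that tr2tmr_start() and tr2tmr_stop() perform,
including the handling of nested (recursive) starts.  It is not part of
the patch.  The names sketch_timer, sketch_timer_start() and
sketch_timer_stop() are hypothetical, and the clock value is passed in
as a parameter (rather than read via getnanotime()) so the behavior is
deterministic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-thread timer state in this series. */
struct sketch_timer {
	uint64_t total_us;       /* sum of all completed intervals */
	uint64_t min_us;         /* shortest completed interval */
	uint64_t max_us;         /* longest completed interval */
	uint64_t start_us;       /* start of the outermost open interval */
	uint64_t interval_count; /* number of completed intervals */
	int recursion_count;     /* depth of nested start calls */
};

static void sketch_timer_start(struct sketch_timer *t, uint64_t now_us)
{
	t->recursion_count++;
	if (t->recursion_count > 1)
		return; /* nested start: keep the outermost start time */

	t->start_us = now_us;
}

static void sketch_timer_stop(struct sketch_timer *t, uint64_t now_us)
{
	uint64_t us_interval;

	assert(t->recursion_count > 0);

	t->recursion_count--;
	if (t->recursion_count > 0)
		return; /* still inside a nested start */

	us_interval = now_us - t->start_us;
	t->total_us += us_interval;

	/* The first interval seeds both min and max. */
	if (!t->interval_count || us_interval < t->min_us)
		t->min_us = us_interval;
	if (!t->interval_count || us_interval > t->max_us)
		t->max_us = us_interval;

	t->interval_count++;
}
```

Only the outermost start/stop pair counts as an interval, so a routine
that calls itself while the timer is running is measured once rather
than double-counted.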
 Documentation/technical/api-trace2.txt |  48 ++++++++++
 Makefile                               |   1 +
 t/helper/test-trace2.c                 |  98 +++++++++++++++++++
 t/t0211-trace2-perf.sh                 |  25 +++++
 t/t0212-trace2-event.sh                |  35 +++++++
 trace2.c                               |  62 ++++++++++++
 trace2.h                               |  42 +++++++++
 trace2/tr2_tls.c                       |  29 ++++++
 trace2/tr2_tls.h                       |  17 ++++
 trace2/tr2_tmr.c                       | 126 +++++++++++++++++++++++++
 trace2/tr2_tmr.h                       | 120 +++++++++++++++++++++++
 11 files changed, 603 insertions(+)
 create mode 100644 trace2/tr2_tmr.c
 create mode 100644 trace2/tr2_tmr.h

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index e6ed94ba814..03a61332a2d 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -1230,6 +1230,54 @@ at offset 508.
 This example also shows that thread names are assigned in a racy manner
 as each thread starts and allocates TLS storage.
 
+Timer Events::
+
+	Trace2 also provides global stopwatch timers as an alternative
+	to regions.  These make it possible to measure the time spent
+	in a span of code or a library routine called from many places
+	and not associated with a single phase of the overall command.
++
+At the end of the program, a single summary event is emitted for each
+timer; it aggregates that timer's usage across all threads.  These
+events have "summary" as their thread name.
++
+For some timers, individual (per-thread) timer events are also generated.
+These may be helpful in understanding how work is balanced between threads
+in some circumstances.
++
+Timers are defined in `enum trace2_timer_id` in trace2.h and in
+`trace2/tr2_tmr.c:tr2tmr_def_block[]`.
++
+----------------
+static void *unpack_compressed_entry(struct packed_git *p,
+				    struct pack_window **w_curs,
+				    off_t curpos,
+				    unsigned long size)
+{
+	...
+	trace2_timer_start(TRACE2_TIMER_ID_TEST1);
+	git_inflate_init(&stream);
+	...
+	git_inflate_end(&stream);
+	trace2_timer_stop(TRACE2_TIMER_ID_TEST1);
+	...
+}
+----------------
++
+----------------
+$ export GIT_TRACE2_PERF_BRIEF=1
+$ export GIT_TRACE2_PERF=~/log.perf
+$ git status
+...
+$ cat ~/log.perf
+...
+d0 | summary                  | timer        |     |  0.111026 |           | test         | name:test1 count:4 total: 0.000393 min: 0.000006 max: 0.000302
+d0 | main                     | atexit       |     |  0.111026 |           |              | code:0
+----------------
++
+In this example, the "test1" timer was started 4 times and ran for
+0.000393 seconds.
+
 == Future Work
 
 === Relationship to the Existing Trace Api (api-trace.txt)
diff --git a/Makefile b/Makefile
index ed75ed422b5..8b657f0162a 100644
--- a/Makefile
+++ b/Makefile
@@ -1022,6 +1022,7 @@ LIB_OBJS += trace2/tr2_cfg.o
 LIB_OBJS += trace2/tr2_cmd_name.o
 LIB_OBJS += trace2/tr2_dst.o
 LIB_OBJS += trace2/tr2_sid.o
+LIB_OBJS += trace2/tr2_tmr.o
 LIB_OBJS += trace2/tr2_sysenv.o
 LIB_OBJS += trace2/tr2_tbuf.o
 LIB_OBJS += trace2/tr2_tgt_event.o
diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c
index f93633f895a..e98db5ba4c1 100644
--- a/t/helper/test-trace2.c
+++ b/t/helper/test-trace2.c
@@ -206,6 +206,102 @@ static int ut_007bug(int argc, const char **argv)
 	BUG("the bug message");
 }
 
+/*
+ * Single-threaded timer test.  Create several intervals using the
+ * TEST1 timer.  The test script can verify that an aggregate Trace2
+ * "timer" event is emitted indicating that we started+stopped the
+ * timer the requested number of times.
+ */
+static int ut_008timer(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <count> <ms_delay>";
+
+	int count = 0;
+	int delay = 0;
+	int k;
+
+	if (argc != 2)
+		die("%s", usage_error);
+	if (get_i(&count, argv[0]))
+		die("%s", usage_error);
+	if (get_i(&delay, argv[1]))
+		die("%s", usage_error);
+
+	for (k = 0; k < count; k++) {
+		trace2_timer_start(TRACE2_TIMER_ID_TEST1);
+		sleep_millisec(delay);
+		trace2_timer_stop(TRACE2_TIMER_ID_TEST1);
+	}
+
+	return 0;
+}
+
+struct ut_009_data {
+	int count;
+	int delay;
+};
+
+static void *ut_009timer_thread_proc(void *_ut_009_data)
+{
+	struct ut_009_data *data = _ut_009_data;
+	int k;
+
+	trace2_thread_start("ut_009");
+
+	for (k = 0; k < data->count; k++) {
+		trace2_timer_start(TRACE2_TIMER_ID_TEST2);
+		sleep_millisec(data->delay);
+		trace2_timer_stop(TRACE2_TIMER_ID_TEST2);
+	}
+
+	trace2_thread_exit();
+	return NULL;
+}
+
+
+/*
+ * Multi-threaded timer test.  Create several threads that each create
+ * several intervals using the TEST2 timer.  The test script can verify
+ * that an individual Trace2 "timer" event for each thread and an
+ * aggregate "timer" event are generated.
+ */
+static int ut_009timer(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <count> <ms_delay> <threads>";
+
+	struct ut_009_data data = { 0, 0 };
+	int nr_threads = 0;
+	int k;
+	pthread_t *pids = NULL;
+
+	if (argc != 3)
+		die("%s", usage_error);
+	if (get_i(&data.count, argv[0]))
+		die("%s", usage_error);
+	if (get_i(&data.delay, argv[1]))
+		die("%s", usage_error);
+	if (get_i(&nr_threads, argv[2]))
+		die("%s", usage_error);
+
+	CALLOC_ARRAY(pids, nr_threads);
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_create(&pids[k], NULL, ut_009timer_thread_proc, &data))
+			die("failed to create thread[%d]", k);
+	}
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_join(pids[k], NULL))
+			die("failed to join thread[%d]", k);
+	}
+
+	free(pids);
+
+	return 0;
+}
+
 /*
  * Usage:
  *     test-tool trace2 <ut_name_1> <ut_usage_1>
@@ -223,6 +319,8 @@ static struct unit_test ut_table[] = {
 	{ ut_005exec,     "005exec",   "<git_command_args>" },
 	{ ut_006data,     "006data",   "[<category> <key> <value>]+" },
 	{ ut_007bug,      "007bug",    "" },
+	{ ut_008timer,    "008timer",  "<count> <ms_delay>" },
+	{ ut_009timer,    "009timer",  "<count> <ms_delay> <threads>" },
 };
 /* clang-format on */
 
diff --git a/t/t0211-trace2-perf.sh b/t/t0211-trace2-perf.sh
index 22d0845544e..5c99d734ea2 100755
--- a/t/t0211-trace2-perf.sh
+++ b/t/t0211-trace2-perf.sh
@@ -173,4 +173,29 @@ test_expect_success 'using global config, perf stream, return code 0' '
 	test_cmp expect actual
 '
 
+# Exercise the stopwatch timer "test" in a loop and confirm that
+# we have as many start/stop intervals as expected.  We cannot really test
+# the (elapsed, min, max) timer values, so we assume they are good.
+#
+test_expect_success 'test stopwatch timers - summary only' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+	test-tool trace2 008timer 5 10 &&
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+	grep "d0|summary|timer||_T_ABS_||test|name:test1 count:5" actual
+'
+
+test_expect_success 'test stopwatch timers - summary and threads' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+	test-tool trace2 009timer 5 10 3 &&
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+	grep "d0|th01:ut_009|timer||_T_ABS_||test|name:test2 count:5" actual &&
+	grep "d0|th02:ut_009|timer||_T_ABS_||test|name:test2 count:5" actual &&
+	grep "d0|th03:ut_009|timer||_T_ABS_||test|name:test2 count:5" actual &&
+	grep "d0|summary|timer||_T_ABS_||test|name:test2 count:15" actual
+'
+
 test_done
diff --git a/t/t0212-trace2-event.sh b/t/t0212-trace2-event.sh
index 6d3374ff773..462c001deca 100755
--- a/t/t0212-trace2-event.sh
+++ b/t/t0212-trace2-event.sh
@@ -323,4 +323,39 @@ test_expect_success 'discard traces when there are too many files' '
 	head -n2 trace_target_dir/git-trace2-discard | tail -n1 | grep \"event\":\"too_many_files\"
 '
 
+# Exercise the stopwatch timer "test" in a loop and confirm that
+# we have as many start/stop intervals as expected.  We cannot really test
+# the (t_total, t_min, t_max) timer values, so we assume they are good.
+#
+
+have_timer_event () {
+	thread=$1
+	name=$2
+	count=$3
+	file=$4
+
+	grep "\"event\":\"timer\".*\"thread\":\"${thread}\".*\"name\":\"${name}\".*\"count\":${count}" "$file"
+
+	return $?
+}
+
+test_expect_success 'test stopwatch timers - global, single-thread' '
+	test_when_finished "rm trace.event" &&
+	test_config_global trace2.eventBrief 1 &&
+	test_config_global trace2.eventTarget "$(pwd)/trace.event" &&
+	test-tool trace2 008timer 5 10 &&
+	have_timer_event "summary" "test1" 5 trace.event
+'
+
+test_expect_success 'test stopwatch timers - global+threads' '
+	test_when_finished "rm trace.event" &&
+	test_config_global trace2.eventBrief 1 &&
+	test_config_global trace2.eventTarget "$(pwd)/trace.event" &&
+	test-tool trace2 009timer 5 10 3 &&
+	have_timer_event "th01:ut_009" "test2" 5 trace.event &&
+	have_timer_event "th02:ut_009" "test2" 5 trace.event &&
+	have_timer_event "th03:ut_009" "test2" 5 trace.event &&
+	have_timer_event "summary" "test2" 15 trace.event
+'
+
 test_done
diff --git a/trace2.c b/trace2.c
index b2d471526fd..c073ffa836f 100644
--- a/trace2.c
+++ b/trace2.c
@@ -13,6 +13,7 @@
 #include "trace2/tr2_sysenv.h"
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
 
 static int trace2_enabled;
 
@@ -83,6 +84,42 @@ static void tr2_tgt_disable_builtins(void)
 		tgt_j->pfn_term();
 }
 
+static void tr2main_emit_summary_timers(uint64_t us_elapsed_absolute)
+{
+	struct tr2_tgt *tgt_j;
+	int j;
+	struct tr2tmr_block merged;
+
+	memset(&merged, 0, sizeof(merged));
+
+	/*
+	 * Sum across all of the per-thread stopwatch timer data into
+	 * a single composite block of timer values.
+	 */
+	tr2tls_aggregate_timer_blocks(&merged);
+
+	/*
+	 * Emit "summary" timer events for each composite timer value
+	 * that had activity.
+	 */
+	for_each_wanted_builtin (j, tgt_j)
+		if (tgt_j->pfn_timer)
+			tr2tmr_emit_block(tgt_j->pfn_timer,
+					  us_elapsed_absolute,
+					  &merged, "summary");
+}
+
+static void tr2main_emit_thread_timers(uint64_t us_elapsed_absolute)
+{
+	struct tr2_tgt *tgt_j;
+	int j;
+
+	for_each_wanted_builtin (j, tgt_j)
+		if (tgt_j->pfn_timer)
+			tr2tls_emit_timer_blocks_by_thread(tgt_j->pfn_timer,
+							   us_elapsed_absolute);
+}
+
 static int tr2main_exit_code;
 
 /*
@@ -110,6 +147,9 @@ static void tr2main_atexit_handler(void)
 	 */
 	tr2tls_pop_unwind_self();
 
+	tr2main_emit_thread_timers(us_elapsed_absolute);
+	tr2main_emit_summary_timers(us_elapsed_absolute);
+
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_atexit)
 			tgt_j->pfn_atexit(us_elapsed_absolute,
@@ -841,3 +881,25 @@ const char *trace2_session_id(void)
 {
 	return tr2_sid_get();
 }
+
+void trace2_timer_start(enum trace2_timer_id tid)
+{
+	if (!trace2_enabled)
+		return;
+
+	if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS)
+		BUG("invalid timer id: %d", tid);
+
+	tr2tmr_start(tid);
+}
+
+void trace2_timer_stop(enum trace2_timer_id tid)
+{
+	if (!trace2_enabled)
+		return;
+
+	if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS)
+		BUG("invalid timer id: %d", tid);
+
+	tr2tmr_stop(tid);
+}
diff --git a/trace2.h b/trace2.h
index 0cc7b5f5312..32e2eaca7c8 100644
--- a/trace2.h
+++ b/trace2.h
@@ -51,6 +51,7 @@ struct json_writer;
  * [] trace2_region*    -- emit region nesting messages.
  * [] trace2_data*      -- emit region/thread/repo data messages.
  * [] trace2_printf*    -- legacy trace[1] messages.
+ * [] trace2_timer*     -- start/stop stopwatch timer (messages are deferred).
  */
 
 /*
@@ -531,4 +532,45 @@ void trace2_collect_process_info(enum trace2_process_info_reason reason);
 
 const char *trace2_session_id(void);
 
+/*
+ * Define the set of stopwatch timers.
+ *
+ * We can add more at any time, but they must be defined at compile
+ * time (to avoid the need to dynamically allocate and synchronize
+ * them between different threads).
+ *
+ * These must start at 0 and be contiguous (because we use them
+ * elsewhere as array indexes).
+ *
+ * Any values added to this enum must also be added to the timer definitions
+ * array.  See `trace2/tr2_tmr.c:tr2tmr_def_block[]`.
+ */
+enum trace2_timer_id {
+	/*
+	 * Define two timers for testing.  See `t/helper/test-trace2.c`.
+	 * These can be used for ad hoc testing, but should not be used
+	 * for permanent analysis code.
+	 */
+	TRACE2_TIMER_ID_TEST1 = 0, /* emits summary event only */
+	TRACE2_TIMER_ID_TEST2,     /* emits summary and thread events */
+
+
+	/* Add additional timer definitions before here. */
+	TRACE2_NUMBER_OF_TIMERS
+};
+
+/*
+ * Start/Stop a stopwatch timer in the current thread.
+ *
+ * The time spent in each start/stop interval will be accumulated and
+ * a "timer" event will be emitted when the program exits.
+ *
+ * Note: Since the stopwatch API routines do not generate individual
+ * events, they do not take (file, line) arguments.  Similarly, the
+ * category and timer name values are defined at compile-time in the
+ * timer definitions array, so they are not needed here in the API.
+ */
+void trace2_timer_start(enum trace2_timer_id tid);
+void trace2_timer_stop(enum trace2_timer_id tid);
+
 #endif /* TRACE2_H */
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index b68d297bf51..068938d334e 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "thread-utils.h"
 #include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
 
 /*
  * Initialize size of the thread stack for nested regions.
@@ -199,3 +200,31 @@ int tr2tls_locked_increment(int *p)
 
 	return current_value;
 }
+
+void tr2tls_aggregate_timer_blocks(struct tr2tmr_block *merged)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_ctx_list;
+
+	while (ctx) {
+		struct tr2tls_thread_ctx *next = ctx->next_ctx;
+
+		tr2tmr_aggregate_timers(merged, &ctx->timers);
+
+		ctx = next;
+	}
+}
+
+void tr2tls_emit_timer_blocks_by_thread(tr2_tgt_evt_timer_t *pfn,
+					uint64_t us_elapsed_absolute)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_ctx_list;
+
+	while (ctx) {
+		struct tr2tls_thread_ctx *next = ctx->next_ctx;
+
+		tr2tmr_emit_block(pfn, us_elapsed_absolute, &ctx->timers,
+				  ctx->thread_name);
+
+		ctx = next;
+	}
+}
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index c6b6c69b25a..10669f0d7b9 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -2,6 +2,7 @@
 #define TR2_TLS_H
 
 #include "strbuf.h"
+#include "trace2/tr2_tmr.h"
 
 /*
  * Arbitry limit for thread names for column alignment.
@@ -15,8 +16,24 @@ struct tr2tls_thread_ctx {
 	size_t alloc;
 	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
 	int thread_id;
+
+	struct tr2tmr_block timers;
 };
 
+/*
+ * Iterate over the global list of TLS CTX data and aggregate the timer
+ * data into the given timer block.
+ */
+void tr2tls_aggregate_timer_blocks(struct tr2tmr_block *merged);
+
+/*
+ * Iterate over the global list of TLS CTX data (the complete set of
+ * threads that have used Trace2 resources) and emit "per-thread"
+ * timer data for each.
+ */
+void tr2tls_emit_timer_blocks_by_thread(tr2_tgt_evt_timer_t *pfn,
+					uint64_t us_elapsed_absolute);
+
 /*
  * Create TLS data for the current thread.  This gives us a place to
  * put per-thread data, such as thread start time, function nesting
diff --git a/trace2/tr2_tmr.c b/trace2/tr2_tmr.c
new file mode 100644
index 00000000000..216cbd04cca
--- /dev/null
+++ b/trace2/tr2_tmr.c
@@ -0,0 +1,126 @@
+#include "cache.h"
+#include "thread-utils.h"
+#include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
+
+/*
+ * Define metadata for each stopwatch timer.  This list must match the
+ * set defined in "enum trace2_timer_id".
+ */
+struct tr2tmr_def {
+	const char *category;
+	const char *name;
+
+	unsigned int want_thread_events:1;
+};
+
+static struct tr2tmr_def tr2tmr_def_block[TRACE2_NUMBER_OF_TIMERS] = {
+	[TRACE2_TIMER_ID_TEST1] = { "test", "test1", 0 },
+	[TRACE2_TIMER_ID_TEST2] = { "test", "test2", 1 },
+};
+
+void tr2tmr_start(enum trace2_timer_id tid)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	struct tr2tmr_timer *t = &ctx->timers.timer[tid];
+
+	t->recursion_count++;
+	if (t->recursion_count > 1)
+		return; /* ignore recursive starts */
+
+	t->start_us = getnanotime() / 1000;
+}
+
+void tr2tmr_stop(enum trace2_timer_id tid)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	struct tr2tmr_timer *t = &ctx->timers.timer[tid];
+	uint64_t us_now;
+	uint64_t us_interval;
+
+	assert(t->recursion_count > 0);
+
+	t->recursion_count--;
+	if (t->recursion_count > 0)
+		return; /* still in recursive call */
+
+	us_now = getnanotime() / 1000;
+	us_interval = us_now - t->start_us;
+
+	t->total_us += us_interval;
+
+	if (!t->interval_count) {
+		t->min_us = us_interval;
+		t->max_us = us_interval;
+	} else {
+		if (us_interval < t->min_us)
+			t->min_us = us_interval;
+		if (us_interval > t->max_us)
+			t->max_us = us_interval;
+	}
+
+	t->interval_count++;
+}
+
+void tr2tmr_aggregate_timers(struct tr2tmr_block *merged,
+			     const struct tr2tmr_block *src)
+{
+	enum trace2_timer_id tid;
+
+	for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++) {
+		struct tr2tmr_timer *t_merged = &merged->timer[tid];
+		const struct tr2tmr_timer *t = &src->timer[tid];
+
+		t_merged->is_aggregate = 1;
+
+		if (t->recursion_count > 0) {
+			/*
+			 * A thread exited with a stopwatch running.
+			 *
+			 * NEEDSWORK: should we assert or throw a warning
+			 * for the open interval.  I'm going to ignore it
+			 * and keep going because we may have valid data
+			 * for previously closed intervals on this timer.
+			 */
+		}
+
+		if (!t->interval_count)
+			continue; /* this timer was not used by this thread. */
+
+		t_merged->total_us += t->total_us;
+
+		if (!t_merged->interval_count) {
+			t_merged->min_us = t->min_us;
+			t_merged->max_us = t->max_us;
+		} else {
+			if (t->min_us < t_merged->min_us)
+				t_merged->min_us = t->min_us;
+			if (t->max_us > t_merged->max_us)
+				t_merged->max_us = t->max_us;
+		}
+
+		t_merged->interval_count += t->interval_count;
+	}
+
+	merged->is_aggregate = 1;
+}
+
+void tr2tmr_emit_block(tr2_tgt_evt_timer_t *pfn, uint64_t us_elapsed_absolute,
+		       const struct tr2tmr_block *blk, const char *thread_name)
+{
+	enum trace2_timer_id tid;
+
+	for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++) {
+		const struct tr2tmr_timer *t = &blk->timer[tid];
+		const struct tr2tmr_def *d = &tr2tmr_def_block[tid];
+
+		if (!t->interval_count)
+			continue; /* timer was not used */
+
+		if (!d->want_thread_events && !t->is_aggregate)
+			continue; /* per-thread events not wanted */
+
+		pfn(us_elapsed_absolute, thread_name, d->category, d->name,
+		    t->interval_count, t->total_us, t->min_us, t->max_us);
+	}
+}
diff --git a/trace2/tr2_tmr.h b/trace2/tr2_tmr.h
new file mode 100644
index 00000000000..72f34f36d5f
--- /dev/null
+++ b/trace2/tr2_tmr.h
@@ -0,0 +1,120 @@
+#ifndef TR2_TM_H
+#define TR2_TM_H
+
+#include "trace2.h"
+#include "trace2/tr2_tgt.h"
+
+/*
+ * Define a mechanism to allow "stopwatch" timers.
+ *
+ * Timers can be used to measure "interesting" activity that does not
+ * fit the "region" model, such as code called from many different
+ * regions (like zlib) and/or where data for individual calls are not
+ * interesting or are too numerous to be efficiently logged.
+ *
+ * Timer values are accumulated during program execution and emitted
+ * to the Trace2 logs at program exit.
+ *
+ * To make this model efficient, we define a compile-time fixed set of
+ * timers and timer ids.  This lets us avoid the complexities of
+ * dynamically allocating a timer on demand and sharing that
+ * definition with other threads.
+ *
+ * Timer values are stored in a fixed size "timer block" inside the
+ * TLS CTX.  This allows data to be collected on a thread-by-thread
+ * basis without locking.
+ *
+ * We define (at compile time) a set of "timer ids" to access the
+ * various timers inside the fixed size "timer block".
+ *
+ * Timer definitions include the Trace2 "category" and similar fields.
+ * This eliminates the need to include those args on the various timer
+ * APIs.
+ *
+ * Timer results are summarized and emitted by the main thread at
+ * program exit by iterating over the global list of CTX data.
+ */
+
+/*
+ * The definition of an individual timer as used by an individual
+ * thread.
+ */
+struct tr2tmr_timer {
+	/*
+	 * Total elapsed time for this timer in this thread.
+	 */
+	uint64_t total_us;
+
+	/*
+	 * The maximum and minimum interval values observed for this
+	 * timer in this thread.
+	 */
+	uint64_t min_us;
+	uint64_t max_us;
+
+	/*
+	 * The value of the clock when this timer was started in this
+	 * thread.  (Undefined when the timer is not active in this
+	 * thread.)
+	 */
+	uint64_t start_us;
+
+	/*
+	 * Number of times that this timer has been started and stopped
+	 * in this thread.  (Recursive starts are ignored.)
+	 */
+	size_t interval_count;
+
+	/*
+	 * Number of nested starts on the stack in this thread.  (We
+	 * ignore recursive starts and use this to track the recursive
+	 * calls.)
+	 */
+	unsigned int recursion_count;
+
+	/*
+	 * Has data from multiple threads been combined into this object?
+	 */
+	unsigned int is_aggregate:1;
+};
+
+/*
+ * A compile-time fixed-size block of timers to insert into the TLS CTX.
+ *
+ * We use this simple wrapper around the array of timer instances to
+ * avoid C syntax quirks and the need to pass around an additional size_t
+ * argument.
+ */
+struct tr2tmr_block {
+	struct tr2tmr_timer timer[TRACE2_NUMBER_OF_TIMERS];
+
+	/*
+	 * Has data from multiple threads been combined into this block?
+	 */
+	unsigned int is_aggregate:1;
+};
+
+/*
+ * Private routines used by trace2.c to actually start/stop an individual
+ * timer in the current thread.
+ */
+void tr2tmr_start(enum trace2_timer_id tid);
+void tr2tmr_stop(enum trace2_timer_id tid);
+
+/*
+ * Accumulate timer data from source block into the merged block.
+ */
+void tr2tmr_aggregate_timers(struct tr2tmr_block *merged,
+			     const struct tr2tmr_block *src);
+
+/*
+ * Send stopwatch data for all of the timers in this block to the
+ * target.
+ *
+ * This will generate an event record for each timer that had activity
+ * during the program's execution.
+ */
+void tr2tmr_emit_block(tr2_tgt_evt_timer_t *pfn, uint64_t us_elapsed_absolute,
+		       const struct tr2tmr_block *blk, const char *thread_name);
+
+#endif /* TR2_TM_H */
-- 
gitgitgadget
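
The stopwatch bookkeeping in tr2tmr_start() and tr2tmr_stop() above can
be sketched as a small standalone example. This is only an illustration
under stated assumptions, not code from this series: `struct stopwatch`,
stopwatch_start() and stopwatch_stop() are hypothetical names, and the
clock value is passed in explicitly (instead of calling getnanotime())
so that the behavior is deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-thread timer state. */
struct stopwatch {
	uint64_t total_us;
	uint64_t min_us;
	uint64_t max_us;
	uint64_t start_us;
	size_t interval_count;
	unsigned int recursion_count;
};

/* Start the stopwatch; nested (recursive) starts are only counted. */
static void stopwatch_start(struct stopwatch *t, uint64_t now_us)
{
	if (t->recursion_count++ > 0)
		return; /* already running, keep the outermost start time */

	t->start_us = now_us;
}

/* Stop the stopwatch; only the outermost stop closes the interval. */
static void stopwatch_stop(struct stopwatch *t, uint64_t now_us)
{
	uint64_t us_interval;

	assert(t->recursion_count > 0);

	if (--t->recursion_count > 0)
		return; /* still inside a nested start */

	us_interval = now_us - t->start_us;
	t->total_us += us_interval;

	/* The first closed interval seeds both min and max. */
	if (!t->interval_count || us_interval < t->min_us)
		t->min_us = us_interval;
	if (!t->interval_count || us_interval > t->max_us)
		t->max_us = us_interval;

	t->interval_count++;
}
```

A nested start/stop pair inside an open interval does not create a
second interval; only the outermost pair contributes to the count, the
total, and the min/max values.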



* [PATCH 8/9] trace2: add counter events to perf and event target formats
  2021-12-20 15:01 [PATCH 0/9] Trace2 stopwatch timers and global counters Jeff Hostetler via GitGitGadget
                   ` (6 preceding siblings ...)
  2021-12-20 15:01 ` [PATCH 7/9] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
@ 2021-12-20 15:01 ` Jeff Hostetler via GitGitGadget
  2021-12-20 16:51   ` Ævar Arnfjörð Bjarmason
  2021-12-20 15:01 ` [PATCH 9/9] trace2: add global counters Jeff Hostetler via GitGitGadget
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2021-12-20 15:01 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach Trace2 "perf" and "event" formats to handle "counter" events
for global counters.  Update the API documentation accordingly.

In a future commit, global counters will be added to the Trace2 API
and it will emit these "counter" events at program exit.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
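A minimal standalone sketch of the payload column that the "perf"
target's fn_counter() builds, for readers who want to see the resulting
text without building Git. It assumes plain snprintf() in place of
strbuf_addf(); format_counter_payload() is a hypothetical helper and is
not part of this series.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the payload of a "perf" counter event
 * the same way the two strbuf_addf() calls in fn_counter() do.
 *
 * Returns the number of characters snprintf() wanted to write
 * (excluding the terminating NUL).
 */
static int format_counter_payload(char *buf, size_t len,
				  const char *counter_name, uint64_t value)
{
	return snprintf(buf, len, "name:%s value:%" PRIu64,
			counter_name, value);
}
```

Formatting the "test2" counter with a value of 45 gives
"name:test2 value:45", which is the payload that the t0211 tests in the
next patch grep for.
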
 Documentation/technical/api-trace2.txt | 19 +++++++++++++++++++
 trace2/tr2_tgt.h                       | 14 ++++++++++++++
 trace2/tr2_tgt_event.c                 | 25 ++++++++++++++++++++++++-
 trace2/tr2_tgt_normal.c                |  1 +
 trace2/tr2_tgt_perf.c                  | 19 +++++++++++++++++++
 5 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 03a61332a2d..bb116dc85db 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -838,6 +838,25 @@ may exceed the "atexit" elapsed time of the program.
 Timer events may represent an individual thread or a summation across
 the entire program.  Summation events will have a unique thread name.
 
+`"counter"`::
+	This event is generated at the end of the program and contains
+	the value of a global counter.
++
+------------
+{
+	"event":"counter",
+	...
+	"name":"test",      # counter name
+	"value":42,         # value of the counter
+}
+------------
++
+A global counter can be incremented throughout the execution of the
+program.  It will be reported in a "counter" event just prior to exit.
++
+Counter events may represent an individual thread or a summation across
+the entire program.  Summation events will have a unique thread name.
+
 == Example Trace2 API Usage
 
 Here is a hypothetical usage of the Trace2 API showing the intended
diff --git a/trace2/tr2_tgt.h b/trace2/tr2_tgt.h
index 1f548eb4b93..33a2bb99199 100644
--- a/trace2/tr2_tgt.h
+++ b/trace2/tr2_tgt.h
@@ -120,6 +120,19 @@ typedef void(tr2_tgt_evt_timer_t)(uint64_t us_elapsed_absolute,
 				  uint64_t us_min_time,
 				  uint64_t us_max_time);
 
+/*
+ * Item counter event.
+ *
+ * This also does not take a (file,line) pair.
+ *
+ * The thread name is optional.
+ */
+typedef void(tr2_tgt_evt_counter_t)(uint64_t us_elapsed_absolute,
+				    const char *thread_name,
+				    const char *category,
+				    const char *counter_name,
+				    uint64_t value);
+
 /*
  * "vtable" for a TRACE2 target.  Use NULL if a target does not want
  * to emit that message.
@@ -157,6 +170,7 @@ struct tr2_tgt {
 	tr2_tgt_evt_data_json_fl_t              *pfn_data_json_fl;
 	tr2_tgt_evt_printf_va_fl_t              *pfn_printf_va_fl;
 	tr2_tgt_evt_timer_t                     *pfn_timer;
+	tr2_tgt_evt_counter_t                   *pfn_counter;
 };
 /* clang-format on */
 
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 9b3905b920c..ca36d44dfd7 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -23,7 +23,7 @@ static struct tr2_dst tr2dst_event = { TR2_SYSENV_EVENT, 0, 0, 0, 0 };
  * Verison 1: original version
  * Version 2: added "too_many_files" event
  * Version 3: added "child_ready" event
- * Version 4: added "timer" event
+ * Version 4: added "timer" and "counter" events
  */
 #define TR2_EVENT_VERSION "4"
 
@@ -652,6 +652,28 @@ static void fn_timer(uint64_t us_elapsed_absolute,
 	jw_release(&jw);
 }
 
+static void fn_counter(uint64_t us_elapsed_absolute,
+		       const char *thread_name,
+		       const char *category,
+		       const char *counter_name,
+		       uint64_t value)
+{
+	const char *event_name = "counter";
+	struct json_writer jw = JSON_WRITER_INIT;
+	double t_abs = (double)us_elapsed_absolute / 1000000.0;
+
+	jw_object_begin(&jw, 0);
+	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, thread_name);
+	jw_object_double(&jw, "t_abs", 6, t_abs);
+	jw_object_string(&jw, "name", counter_name);
+	jw_object_intmax(&jw, "value", value);
+
+	jw_end(&jw);
+
+	tr2_dst_write_line(&tr2dst_event, &jw.json);
+	jw_release(&jw);
+}
+
 struct tr2_tgt tr2_tgt_event = {
 	&tr2dst_event,
 
@@ -684,4 +706,5 @@ struct tr2_tgt tr2_tgt_event = {
 	fn_data_json_fl,
 	NULL, /* printf */
 	fn_timer,
+	fn_counter,
 };
diff --git a/trace2/tr2_tgt_normal.c b/trace2/tr2_tgt_normal.c
index 23a7e78dcaa..1778232f6e9 100644
--- a/trace2/tr2_tgt_normal.c
+++ b/trace2/tr2_tgt_normal.c
@@ -356,4 +356,5 @@ struct tr2_tgt tr2_tgt_normal = {
 	NULL, /* data_json */
 	fn_printf_va_fl,
 	NULL, /* timer */
+	NULL, /* counter */
 };
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index 7597cb52ed5..eb4577ec40b 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -582,6 +582,24 @@ static void fn_timer(uint64_t us_elapsed_absolute,
 	strbuf_release(&buf_payload);
 }
 
+static void fn_counter(uint64_t us_elapsed_absolute,
+		       const char *thread_name,
+		       const char *category,
+		       const char *counter_name,
+		       uint64_t value)
+{
+	const char *event_name = "counter";
+	struct strbuf buf_payload = STRBUF_INIT;
+
+	strbuf_addf(&buf_payload, "name:%s", counter_name);
+	strbuf_addf(&buf_payload, " value:%"PRIu64, value);
+
+	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL,
+			 &us_elapsed_absolute, NULL,
+			 category, &buf_payload, thread_name);
+	strbuf_release(&buf_payload);
+}
+
 struct tr2_tgt tr2_tgt_perf = {
 	&tr2dst_perf,
 
@@ -614,4 +632,5 @@ struct tr2_tgt tr2_tgt_perf = {
 	fn_data_json_fl,
 	fn_printf_va_fl,
 	fn_timer,
+	fn_counter,
 };
-- 
gitgitgadget



* [PATCH 9/9] trace2: add global counters
  2021-12-20 15:01 [PATCH 0/9] Trace2 stopwatch timers and global counters Jeff Hostetler via GitGitGadget
                   ` (7 preceding siblings ...)
  2021-12-20 15:01 ` [PATCH 8/9] trace2: add counter events to perf and event target formats Jeff Hostetler via GitGitGadget
@ 2021-12-20 15:01 ` Jeff Hostetler via GitGitGadget
  2021-12-20 17:14   ` Ævar Arnfjörð Bjarmason
  2021-12-21 14:51 ` [PATCH 0/9] Trace2 stopwatch timers and " Derrick Stolee
  2021-12-28 19:36 ` [PATCH v2 " Jeff Hostetler via GitGitGadget
  10 siblings, 1 reply; 55+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2021-12-20 15:01 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Add global counters to Trace2.

Create a mechanism in Trace2 to count an activity and emit a single
"counter" event at the end of the program.  This is an alternative
to the existing "data" events that are emitted immediately.

Create an array of counters (indexed by `enum trace2_counter_id`)
to allow various activities to be tracked as desired.

Preload the array with two counters for testing purposes.

Create unit tests to demonstrate and verify.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
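A minimal standalone sketch of the design, for review purposes only:
each thread adds to its own counter block without locking, and the
blocks are summed into one "summary" block at exit. The names here
(`struct thread_ctx`, counter_add(), aggregate_counters(), COUNTER_A,
COUNTER_B) are hypothetical simplifications, not the identifiers added
by this patch, and the thread contexts are plain structs on a linked
list rather than real TLS data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compile-time fixed set of counter ids, usable as array indexes. */
enum counter_id {
	COUNTER_A = 0,
	COUNTER_B,
	NUMBER_OF_COUNTERS
};

/* Fixed-size block of counters owned by exactly one thread. */
struct counter_block {
	uint64_t value[NUMBER_OF_COUNTERS];
};

/* Hypothetical stand-in for the per-thread TLS context list. */
struct thread_ctx {
	struct thread_ctx *next;
	struct counter_block counters;
};

/* Add to the calling thread's own block; no lock is needed. */
static void counter_add(struct thread_ctx *ctx, enum counter_id cid,
			uint64_t value)
{
	assert((unsigned int)cid < NUMBER_OF_COUNTERS);
	ctx->counters.value[cid] += value;
}

/* At exit, sum every thread's block into a single merged block. */
static void aggregate_counters(struct counter_block *merged,
			       const struct thread_ctx *list)
{
	const struct thread_ctx *ctx;
	int cid;

	for (ctx = list; ctx; ctx = ctx->next)
		for (cid = 0; cid < NUMBER_OF_COUNTERS; cid++)
			merged->value[cid] += ctx->counters.value[cid];
}
```

With three contexts that each add 5 and then 10 to the same counter,
each per-thread value is 15 and the merged value is 45, which is the
scenario that the "011counter 5 10 3" test exercises.
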
 Documentation/technical/api-trace2.txt | 67 ++++++++++++++++++++
 Makefile                               |  1 +
 t/helper/test-trace2.c                 | 86 ++++++++++++++++++++++++++
 t/t0211-trace2-perf.sh                 | 24 +++++++
 t/t0212-trace2-event.sh                | 34 ++++++++++
 trace2.c                               | 52 ++++++++++++++++
 trace2.h                               | 33 ++++++++++
 trace2/tr2_ctr.c                       | 65 +++++++++++++++++++
 trace2/tr2_ctr.h                       | 75 ++++++++++++++++++++++
 trace2/tr2_tls.c                       | 29 +++++++++
 trace2/tr2_tls.h                       | 16 +++++
 11 files changed, 482 insertions(+)
 create mode 100644 trace2/tr2_ctr.c
 create mode 100644 trace2/tr2_ctr.h

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index bb116dc85db..14e6e50a2d6 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -1297,6 +1297,73 @@ d0 | main                     | atexit       |     |  0.111026 |           |
 In this example, the "test1" timer was started 4 times and ran for
 0.000393 seconds.
 
+Counter Events::
+
+	Trace2 also provides global counters as an alternative to regions
+	and data events.  These make it possible to count an activity of
+	interest, such as a call to a library routine, during the program
+	and get a single counter event at the end.
++
+At the end of the program, a single summary event is emitted; this
+value is aggregated across all threads.  These events have "summary"
+as their thread name.
++
+For some counters, individual (per-thread) counter events are also
+generated.  This may be helpful in understanding how work is balanced
+between threads in some circumstances.
++
+----------------
+static void *load_cache_entries_thread(void *_data)
+{
+	struct load_cache_entries_thread_data *p = _data;
+	int i;
+
+	trace2_thread_start("load_cache_entries");
+	...
+	trace2_thread_exit();
+}
+
+static unsigned long load_cache_entry_block(struct index_state *istate,
+			struct mem_pool *ce_mem_pool, int offset, int nr, const char *mmap,
+			unsigned long start_offset, const struct cache_entry *previous_ce)
+{
+	int i;
+	unsigned long src_offset = start_offset;
+
+	trace2_counter_add(TRACE2_COUNTER_ID_TEST2, nr);
+
+	for (i = offset; i < offset + nr; i++) {
+		...
+	}
+}
+----------------
++
+----------------
+$ export GIT_TRACE2_PERF_BRIEF=1
+$ export GIT_TRACE2_PERF=~/log.perf
+$ git status
+...
+$ cat ~/log.perf
+...
+d0 | main                     | exit         |     | 53.977680 |           |              | code:0
+d0 | th12:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193220
+d0 | th11:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193221
+d0 | th10:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193221
+d0 | th09:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193221
+d0 | th08:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193221
+d0 | th07:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193221
+d0 | th06:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193221
+d0 | th05:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193221
+d0 | th04:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193221
+d0 | th03:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193221
+d0 | th02:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193221
+d0 | summary                  | counter      |     | 53.977708 |           | test         | name:test2 value:2125430
+d0 | main                     | atexit       |     | 53.977708 |           |              | code:0
+----------------
++
+This example shows the value computed by each of the 11
+`load_cache_entries` threads and the total across all threads.
+
 == Future Work
 
 === Relationship to the Existing Trace Api (api-trace.txt)
diff --git a/Makefile b/Makefile
index 8b657f0162a..cc5bd8593f1 100644
--- a/Makefile
+++ b/Makefile
@@ -1020,6 +1020,7 @@ LIB_OBJS += trace.o
 LIB_OBJS += trace2.o
 LIB_OBJS += trace2/tr2_cfg.o
 LIB_OBJS += trace2/tr2_cmd_name.o
+LIB_OBJS += trace2/tr2_ctr.o
 LIB_OBJS += trace2/tr2_dst.o
 LIB_OBJS += trace2/tr2_sid.o
 LIB_OBJS += trace2/tr2_tmr.o
diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c
index e98db5ba4c1..b64264cfed4 100644
--- a/t/helper/test-trace2.c
+++ b/t/helper/test-trace2.c
@@ -302,6 +302,90 @@ static int ut_009timer(int argc, const char **argv)
 	return 0;
 }
 
+/*
+ * Single-threaded counter test.  Add several values to the TEST1 counter.
+ * The test script can verify that an aggregate Trace2 "counter" event is
+ * emitted containing the sum of the values provided.
+ */
+static int ut_010counter(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <v1> [<v2> [...]]";
+	int value;
+	int k;
+
+	if (argc < 1)
+		die("%s", usage_error);
+
+	for (k = 0; k < argc; k++) {
+		if (get_i(&value, argv[k]))
+			die("invalid value[%s] -- %s",
+			    argv[k], usage_error);
+		trace2_counter_add(TRACE2_COUNTER_ID_TEST1, value);
+	}
+
+	return 0;
+}
+
+struct ut_011_data {
+	int v1, v2;
+};
+
+static void *ut_011counter_thread_proc(void *_ut_011_data)
+{
+	struct ut_011_data *data = _ut_011_data;
+
+	trace2_thread_start("ut_011");
+
+	trace2_counter_add(TRACE2_COUNTER_ID_TEST2, data->v1);
+	trace2_counter_add(TRACE2_COUNTER_ID_TEST2, data->v2);
+
+	trace2_thread_exit();
+	return NULL;
+}
+
+/*
+ * Multi-threaded counter test.  Create several threads that each
+ * increment the TEST2 global counter.  The test script can verify
+ * that an individual Trace2 "counter" event for each thread and an
+ * aggregate "counter" event are generated.
+ */
+static int ut_011counter(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <v1> <v2> <threads>";
+
+	struct ut_011_data data = { 0, 0 };
+	int nr_threads = 0;
+	int k;
+	pthread_t *pids = NULL;
+
+	if (argc != 3)
+		die("%s", usage_error);
+	if (get_i(&data.v1, argv[0]))
+		die("%s", usage_error);
+	if (get_i(&data.v2, argv[1]))
+		die("%s", usage_error);
+	if (get_i(&nr_threads, argv[2]))
+		die("%s", usage_error);
+
+	CALLOC_ARRAY(pids, nr_threads);
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_create(&pids[k], NULL, ut_011counter_thread_proc, &data))
+			die("failed to create thread[%d]", k);
+	}
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_join(pids[k], NULL))
+			die("failed to join thread[%d]", k);
+	}
+
+	free(pids);
+
+	return 0;
+}
+
 /*
  * Usage:
  *     test-tool trace2 <ut_name_1> <ut_usage_1>
@@ -321,6 +405,8 @@ static struct unit_test ut_table[] = {
 	{ ut_007bug,      "007bug",    "" },
 	{ ut_008timer,    "008timer",  "<count> <ms_delay>" },
 	{ ut_009timer,    "009timer",  "<count> <ms_delay> <threads>" },
+	{ ut_010counter,  "010counter","<v1> [<v2> [<v3> [...]]]" },
+	{ ut_011counter,  "011counter","<v1> <v2> <threads>" },
 };
 /* clang-format on */
 
diff --git a/t/t0211-trace2-perf.sh b/t/t0211-trace2-perf.sh
index 5c99d734ea2..498cbb7316b 100755
--- a/t/t0211-trace2-perf.sh
+++ b/t/t0211-trace2-perf.sh
@@ -198,4 +198,28 @@ test_expect_success 'test stopwatch timers - summary and threads' '
 	grep "d0|summary|timer||_T_ABS_||test|name:test2 count:15" actual
 '
 
+# Exercise the global counter "test" in a loop and confirm that we get an
+# event with the sum.
+#
+test_expect_success 'test global counters - global, single-thread' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+	test-tool trace2 010counter 2 3 5 7 11 13  &&
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+	grep "d0|summary|counter||_T_ABS_||test|name:test1 value:41" actual
+'
+
+test_expect_success 'test global counters - global+threads' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+	test-tool trace2 011counter 5 10 3 &&
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+	grep "d0|th01:ut_011|counter||_T_ABS_||test|name:test2 value:15" actual &&
+	grep "d0|th02:ut_011|counter||_T_ABS_||test|name:test2 value:15" actual &&
+	grep "d0|th03:ut_011|counter||_T_ABS_||test|name:test2 value:15" actual &&
+	grep "d0|summary|counter||_T_ABS_||test|name:test2 value:45" actual
+'
+
 test_done
diff --git a/t/t0212-trace2-event.sh b/t/t0212-trace2-event.sh
index 462c001deca..66a73243585 100755
--- a/t/t0212-trace2-event.sh
+++ b/t/t0212-trace2-event.sh
@@ -358,4 +358,38 @@ test_expect_success 'test stopwatch timers - global+threads' '
 	have_timer_event "summary" "test2" 15 trace.event
 '
 
+# Exercise the global counter in a loop and confirm that we get the
+# expected sum in an event record.
+#
+
+have_counter_event () {
+	thread=$1
+	name=$2
+	value=$3
+	file=$4
+
+	grep "\"event\":\"counter\".*\"thread\":\"${thread}\".*\"name\":\"${name}\".*\"value\":${value}" $file
+
+	return $?
+}
+
+test_expect_success 'test global counter - global, single-thread' '
+	test_when_finished "rm trace.event" &&
+	test_config_global trace2.eventBrief 1 &&
+	test_config_global trace2.eventTarget "$(pwd)/trace.event" &&
+	test-tool trace2 010counter 2 3 5 7 11 13 &&
+	have_counter_event "summary" "test1" 41 trace.event
+'
+
+test_expect_success 'test global counter - global+threads' '
+	test_when_finished "rm trace.event" &&
+	test_config_global trace2.eventBrief 1 &&
+	test_config_global trace2.eventTarget "$(pwd)/trace.event" &&
+	test-tool trace2 011counter 5 10 3 &&
+	have_counter_event "th01:ut_011" "test2" 15 trace.event &&
+	have_counter_event "th02:ut_011" "test2" 15 trace.event &&
+	have_counter_event "th03:ut_011" "test2" 15 trace.event &&
+	have_counter_event "summary" "test2" 45 trace.event
+'
+
 test_done
diff --git a/trace2.c b/trace2.c
index c073ffa836f..4c94c5cca68 100644
--- a/trace2.c
+++ b/trace2.c
@@ -8,6 +8,7 @@
 #include "version.h"
 #include "trace2/tr2_cfg.h"
 #include "trace2/tr2_cmd_name.h"
+#include "trace2/tr2_ctr.h"
 #include "trace2/tr2_dst.h"
 #include "trace2/tr2_sid.h"
 #include "trace2/tr2_sysenv.h"
@@ -120,6 +121,43 @@ static void tr2main_emit_thread_timers(uint64_t us_elapsed_absolute)
 							   us_elapsed_absolute);
 }
 
+static void tr2main_emit_summary_counters(uint64_t us_elapsed_absolute)
+{
+	struct tr2_tgt *tgt_j;
+	int j;
+	struct tr2ctr_block merged;
+
+	memset(&merged, 0, sizeof(merged));
+
+	/*
+	 * Sum across all of the per-thread counter data into
+	 * a single composite block of counter values.
+	 */
+	tr2tls_aggregate_counter_blocks(&merged);
+
+	/*
+	 * Emit "summary" counter events for each composite counter value
+	 * that had activity.
+	 */
+	for_each_wanted_builtin (j, tgt_j)
+		if (tgt_j->pfn_counter)
+			tr2ctr_emit_block(tgt_j->pfn_counter,
+					  us_elapsed_absolute,
+					  &merged, "summary");
+}
+
+static void tr2main_emit_thread_counters(uint64_t us_elapsed_absolute)
+{
+	struct tr2_tgt *tgt_j;
+	int j;
+
+	for_each_wanted_builtin (j, tgt_j)
+		if (tgt_j->pfn_counter)
+			tr2tls_emit_counter_blocks_by_thread(
+				tgt_j->pfn_counter,
+				us_elapsed_absolute);
+}
+
 static int tr2main_exit_code;
 
 /*
@@ -150,6 +188,9 @@ static void tr2main_atexit_handler(void)
 	tr2main_emit_thread_timers(us_elapsed_absolute);
 	tr2main_emit_summary_timers(us_elapsed_absolute);
 
+	tr2main_emit_thread_counters(us_elapsed_absolute);
+	tr2main_emit_summary_counters(us_elapsed_absolute);
+
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_atexit)
 			tgt_j->pfn_atexit(us_elapsed_absolute,
@@ -903,3 +944,14 @@ void trace2_timer_stop(enum trace2_timer_id tid)
 
 	tr2tmr_stop(tid);
 }
+
+void trace2_counter_add(enum trace2_counter_id cid, uint64_t value)
+{
+	if (!trace2_enabled)
+		return;
+
+	if (cid < 0 || cid >= TRACE2_NUMBER_OF_COUNTERS)
+		BUG("invalid counter id: %d", cid);
+
+	tr2ctr_add(cid, value);
+}
diff --git a/trace2.h b/trace2.h
index 32e2eaca7c8..80c781f5a94 100644
--- a/trace2.h
+++ b/trace2.h
@@ -52,6 +52,7 @@ struct json_writer;
  * [] trace2_data*      -- emit region/thread/repo data messages.
  * [] trace2_printf*    -- legacy trace[1] messages.
  * [] trace2_timer*     -- start/stop stopwatch timer (messages are deferred).
+ * [] trace2_counter*   -- global counters (messages are deferred).
  */
 
 /*
@@ -573,4 +574,36 @@ enum trace2_timer_id {
 void trace2_timer_start(enum trace2_timer_id tid);
 void trace2_timer_stop(enum trace2_timer_id tid);
 
+/*
+ * Define the set of global counters.
+ *
+ * We can add more at any time, but they must be defined at compile
+ * time (to avoid the need to dynamically allocate and synchronize
+ * them between different threads).
+ *
+ * These must start at 0 and be contiguous (because we use them
+ * elsewhere as array indexes).
+ *
+ * Any value added to this enum must also be added to the counter
+ * definitions array.  See `trace2/tr2_ctr.c:tr2ctr_def_block[]`.
+ */
+enum trace2_counter_id {
+	/*
+	 * Define two counters for testing.  See `t/helper/test-trace2.c`.
+	 * These can be used for ad hoc testing, but should not be used
+	 * for permanent analysis code.
+	 */
+	TRACE2_COUNTER_ID_TEST1 = 0, /* emits summary event only */
+	TRACE2_COUNTER_ID_TEST2,     /* emits summary and thread events */
+
+
+	/* Add additional counter definitions before here. */
+	TRACE2_NUMBER_OF_COUNTERS
+};
+
+/*
+ * Increment global counter by value.
+ */
+void trace2_counter_add(enum trace2_counter_id cid, uint64_t value);
+
 #endif /* TRACE2_H */
diff --git a/trace2/tr2_ctr.c b/trace2/tr2_ctr.c
new file mode 100644
index 00000000000..bfc27005dca
--- /dev/null
+++ b/trace2/tr2_ctr.c
@@ -0,0 +1,65 @@
+#include "cache.h"
+#include "thread-utils.h"
+#include "trace2/tr2_tls.h"
+#include "trace2/tr2_ctr.h"
+
+/*
+ * Define metadata for each global counter.  This list must match the
+ * set defined in "enum trace2_counter_id".
+ */
+struct tr2ctr_def {
+	const char *category;
+	const char *name;
+
+	unsigned int want_thread_events:1;
+};
+
+static struct tr2ctr_def tr2ctr_def_block[TRACE2_NUMBER_OF_COUNTERS] = {
+	[TRACE2_COUNTER_ID_TEST1] = { "test", "test1", 0 },
+	[TRACE2_COUNTER_ID_TEST2] = { "test", "test2", 1 },
+};
+
+void tr2ctr_add(enum trace2_counter_id cid, uint64_t value)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	struct tr2ctr_counter *c = &ctx->counters.counter[cid];
+
+	c->value += value;
+}
+
+void tr2ctr_aggregate_counters(struct tr2ctr_block *merged,
+			       const struct tr2ctr_block *src)
+{
+	enum trace2_counter_id cid;
+
+	for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++) {
+		struct tr2ctr_counter *c_merged = &merged->counter[cid];
+		const struct tr2ctr_counter *c = &src->counter[cid];
+
+		c_merged->is_aggregate = 1;
+
+		c_merged->value += c->value;
+	}
+
+	merged->is_aggregate = 1;
+}
+
+void tr2ctr_emit_block(tr2_tgt_evt_counter_t *pfn, uint64_t us_elapsed_absolute,
+		       const struct tr2ctr_block *blk, const char *thread_name)
+{
+	enum trace2_counter_id cid;
+
+	for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++) {
+		const struct tr2ctr_counter *c = &blk->counter[cid];
+		const struct tr2ctr_def *d = &tr2ctr_def_block[cid];
+
+		if (!c->value)
+			continue; /* counter was not used */
+
+		if (!d->want_thread_events && !c->is_aggregate)
+			continue; /* per-thread events not wanted */
+
+		pfn(us_elapsed_absolute, thread_name, d->category, d->name,
+		    c->value);
+	}
+}
diff --git a/trace2/tr2_ctr.h b/trace2/tr2_ctr.h
new file mode 100644
index 00000000000..9a805062069
--- /dev/null
+++ b/trace2/tr2_ctr.h
@@ -0,0 +1,75 @@
+#ifndef TR2_CTR_H
+#define TR2_CTR_H
+
+#include "trace2.h"
+#include "trace2/tr2_tgt.h"
+
+/*
+ * Define a mechanism to allow global "counters".
+ *
+ * Counters can be used to count interesting activity that does not
+ * fit the "region and data" model, such as code called from many
+ * different regions and/or where you want to count a number of items,
+ * but don't have control of when the last item will be processed,
+ * such as counting the number of calls to `lstat()`.
+ *
+ * Counters differ from Trace2 "data" events.  Data events are emitted
+ * immediately and are appropriate for documenting loop counters and
+ * the like.  Counter values are accumulated during the program and the
+ * final counter value event is emitted at program exit.
+ *
+ * To make this model efficient, we define a compile-time fixed set
+ * of counters and counter ids.  This lets us avoid the complexities
+ * of dynamically allocating a counter and sharing that definition
+ * with other threads.
+ *
+ * We define (at compile time) a set of "counter ids" to access the
+ * various counters inside of a fixed size "counter block".
+ *
+ * A counter definition table provides the counter category and name
+ * so we can eliminate those arguments from the public counter API.
+ *
+ * Each active thread maintains a counter block in its TLS CTX and
+ * increments them without locking.  At program exit, the counter
+ * blocks from all of the individual CTXs are added together to give
+ * the final summary value for each global counter.
+ */
+
+/*
+ * The definition of an individual counter.
+ */
+struct tr2ctr_counter {
+	uint64_t value;
+
+	unsigned int is_aggregate:1;
+};
+
+/*
+ * Compile time fixed block of all defined counters.
+ */
+struct tr2ctr_block {
+	struct tr2ctr_counter counter[TRACE2_NUMBER_OF_COUNTERS];
+
+	unsigned int is_aggregate:1;
+};
+
+/*
+ * Add "value" to the global counter.
+ */
+void tr2ctr_add(enum trace2_counter_id cid, uint64_t value);
+
+/*
+ * Accumulate counter data from the source block into the merged block.
+ */
+void tr2ctr_aggregate_counters(struct tr2ctr_block *merged,
+			       const struct tr2ctr_block *src);
+
+/*
+ * Send counter data for all counters in this block to the target.
+ *
+ * This will generate an event record for each counter that had activity.
+ */
+void tr2ctr_emit_block(tr2_tgt_evt_counter_t *pfn, uint64_t us_elapsed_absolute,
+		       const struct tr2ctr_block *blk, const char *thread_name);
+
+#endif /* TR2_CTR_H */
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 068938d334e..ff795d104e6 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "thread-utils.h"
+#include "trace2/tr2_ctr.h"
 #include "trace2/tr2_tls.h"
 #include "trace2/tr2_tmr.h"
 
@@ -228,3 +229,31 @@ void tr2tls_emit_timer_blocks_by_thread(tr2_tgt_evt_timer_t *pfn,
 		ctx = next;
 	}
 }
+
+void tr2tls_aggregate_counter_blocks(struct tr2ctr_block *merged)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_ctx_list;
+
+	while (ctx) {
+		struct tr2tls_thread_ctx *next = ctx->next_ctx;
+
+		tr2ctr_aggregate_counters(merged, &ctx->counters);
+
+		ctx = next;
+	}
+}
+
+void tr2tls_emit_counter_blocks_by_thread(tr2_tgt_evt_counter_t *pfn,
+					  uint64_t us_elapsed_absolute)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_ctx_list;
+
+	while (ctx) {
+		struct tr2tls_thread_ctx *next = ctx->next_ctx;
+
+		tr2ctr_emit_block(pfn, us_elapsed_absolute, &ctx->counters,
+				  ctx->thread_name);
+
+		ctx = next;
+	}
+}
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index 10669f0d7b9..032b90fa46b 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -2,6 +2,7 @@
 #define TR2_TLS_H
 
 #include "strbuf.h"
+#include "trace2/tr2_ctr.h"
 #include "trace2/tr2_tmr.h"
 
 /*
@@ -17,9 +18,24 @@ struct tr2tls_thread_ctx {
 	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
 	int thread_id;
 
+	struct tr2ctr_block counters;
 	struct tr2tmr_block timers;
 };
 
+/*
+ * Iterate over the global list of TLS CTX data and aggregate the
+ * counter data into the given counter block.
+ */
+void tr2tls_aggregate_counter_blocks(struct tr2ctr_block *merged);
+
+/*
+ * Iterate over the global list of TLS CTX data (the complete set of
+ * threads that have used Trace2 resources) and emit "per-thread"
+ * counter data for each.
+ */
+void tr2tls_emit_counter_blocks_by_thread(tr2_tgt_evt_counter_t *pfn,
+					  uint64_t us_elapsed_absolute);
+
 /*
  * Iterate over the global list of TLS CTX data and aggregate the timer
  * data into the given timer block.
-- 
gitgitgadget
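
The header comment in trace2/tr2_ctr.h describes per-thread counter blocks that are incremented without locking and summed at program exit. A minimal sketch of that model follows. It assumes hypothetical demo_* names throughout, and it passes the blocks in as a plain array, whereas the patch walks the list of thread contexts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter ids; the real enum is trace2_counter_id. */
enum demo_counter_id {
	DEMO_COUNTER_LSTAT = 0,
	DEMO_COUNTER_READDIR,
	DEMO_NUMBER_OF_COUNTERS
};

/* One fixed-size block per thread, so increments need no locking. */
struct demo_ctr_block {
	uint64_t value[DEMO_NUMBER_OF_COUNTERS];
};

/* Add "value" to one counter in the calling thread's own block. */
static void demo_ctr_add(struct demo_ctr_block *self,
			 enum demo_counter_id cid, uint64_t value)
{
	self->value[cid] += value;
}

/* Sum every per-thread block into "merged" (done once, at exit). */
static void demo_ctr_aggregate(struct demo_ctr_block *merged,
			       const struct demo_ctr_block *blocks,
			       size_t nr)
{
	size_t i;
	int cid;

	for (i = 0; i < nr; i++)
		for (cid = 0; cid < DEMO_NUMBER_OF_COUNTERS; cid++)
			merged->value[cid] += blocks[i].value[cid];
}
```

Because each thread only ever writes to its own block, the only point at which the blocks are read together is the final aggregation.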


* Re: [PATCH 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to char*
  2021-12-20 15:01 ` [PATCH 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to char* Jeff Hostetler via GitGitGadget
@ 2021-12-20 16:31   ` Ævar Arnfjörð Bjarmason
  2021-12-20 19:07     ` Jeff Hostetler
  2021-12-21  7:33     ` Junio C Hamano
  2021-12-21  7:22   ` Junio C Hamano
  1 sibling, 2 replies; 55+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-12-20 16:31 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler


On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Use a 'char *' to hold the thread name rather than a 'struct strbuf'.
> The thread name is set when the thread is created and should not be
> modified afterwards.  Replace the strbuf with an allocated pointer
> to make that more clear.
>
> This was discussed in: https://lore.kernel.org/all/xmqqa6kdwo24.fsf@gitster.g/
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  trace2/tr2_tgt_event.c |  2 +-
>  trace2/tr2_tgt_perf.c  |  2 +-
>  trace2/tr2_tls.c       | 16 +++++++++-------
>  trace2/tr2_tls.h       |  2 +-
>  4 files changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
> index 3a0014417cc..ca48d00aebc 100644
> --- a/trace2/tr2_tgt_event.c
> +++ b/trace2/tr2_tgt_event.c
> @@ -88,7 +88,7 @@ static void event_fmt_prepare(const char *event_name, const char *file,
>  
>  	jw_object_string(jw, "event", event_name);
>  	jw_object_string(jw, "sid", tr2_sid_get());
> -	jw_object_string(jw, "thread", ctx->thread_name.buf);
> +	jw_object_string(jw, "thread", ctx->thread_name);
>  
>  	/*
>  	 * In brief mode, only emit <time> on these 2 event types.
> diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
> index e4acca13d64..c3e57fcb3c0 100644
> --- a/trace2/tr2_tgt_perf.c
> +++ b/trace2/tr2_tgt_perf.c
> @@ -106,7 +106,7 @@ static void perf_fmt_prepare(const char *event_name,
>  
>  	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
>  	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
> -		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
> +		    ctx->thread_name, TR2FMT_PERF_MAX_EVENT_NAME,
>  		    event_name);
>  
>  	len = buf->len + TR2FMT_PERF_REPO_WIDTH;
> diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
> index 7da94aba522..cd8b9f2f0a0 100644
> --- a/trace2/tr2_tls.c
> +++ b/trace2/tr2_tls.c
> @@ -35,6 +35,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>  					     uint64_t us_thread_start)
>  {
>  	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
> +	struct strbuf buf_name = STRBUF_INIT;
>  
>  	/*
>  	 * Implicitly "tr2tls_push_self()" to capture the thread's start
> @@ -47,12 +48,13 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>  
>  	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
>  
> -	strbuf_init(&ctx->thread_name, 0);
>  	if (ctx->thread_id)
> -		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
> -	strbuf_addstr(&ctx->thread_name, thread_name);
> -	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
> -		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
> +		strbuf_addf(&buf_name, "th%02d:", ctx->thread_id);
> +	strbuf_addstr(&buf_name, thread_name);
> +	if (buf_name.len > TR2_MAX_THREAD_NAME)
> +		strbuf_setlen(&buf_name, TR2_MAX_THREAD_NAME);
> +
> +	ctx->thread_name = strbuf_detach(&buf_name, NULL);
>  
>  	pthread_setspecific(tr2tls_key, ctx);
>  
> @@ -95,7 +97,7 @@ void tr2tls_unset_self(void)
>  
>  	pthread_setspecific(tr2tls_key, NULL);
>  
> -	strbuf_release(&ctx->thread_name);
> +	free(ctx->thread_name);
>  	free(ctx->array_us_start);
>  	free(ctx);
>  }
> @@ -113,7 +115,7 @@ void tr2tls_pop_self(void)
>  	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
>  
>  	if (!ctx->nr_open_regions)
> -		BUG("no open regions in thread '%s'", ctx->thread_name.buf);
> +		BUG("no open regions in thread '%s'", ctx->thread_name);
>  
>  	ctx->nr_open_regions--;
>  }
> diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
> index a90bd639d48..d968da6a679 100644
> --- a/trace2/tr2_tls.h
> +++ b/trace2/tr2_tls.h
> @@ -9,7 +9,7 @@
>  #define TR2_MAX_THREAD_NAME (24)
>  
>  struct tr2tls_thread_ctx {
> -	struct strbuf thread_name;
> +	char *thread_name;
>  	uint64_t *array_us_start;
>  	size_t alloc;
>  	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */

Junio's suggestion in the linked E-Mail was to make this a "const char *".

Narrowly, I don't see why not just add a "const" to the "struct strbuf
*" instead.

But less narrowly if we're not going to change it why malloc a new one
at all? Can't we just use the "const char *" passed into
tr2tls_create_self(), and for the "th%02d:" case have the code that's
formatting it handle that case?

I.e. have the things that use it as a "%s" now call a function that
formats things as a function of the "ctx->thread_id" (which may be 0)
and limit it by TR2_MAX_THREAD_NAME?
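
The format-on-demand idea above could look roughly like the sketch below. The helper name demo_format_thread_name() and the DEMO_MAX_THREAD_NAME constant are hypothetical, and the sketch assumes the base name passed at thread creation stays valid for the lifetime of the thread context.

```c
#include <stddef.h>
#include <stdio.h>

#define DEMO_MAX_THREAD_NAME 24 /* stands in for TR2_MAX_THREAD_NAME */

/*
 * Hypothetical helper: write the display name for a thread into
 * "out".  Thread id 0 (the main thread) gets no "thNN:" prefix.
 * Pass a buffer of DEMO_MAX_THREAD_NAME + 1 bytes so that snprintf()
 * truncates the result to DEMO_MAX_THREAD_NAME characters.
 */
static const char *demo_format_thread_name(char *out, size_t outlen,
					   int thread_id,
					   const char *base_name)
{
	if (thread_id)
		snprintf(out, outlen, "th%02d:%s", thread_id, base_name);
	else
		snprintf(out, outlen, "%s", base_name);
	return out;
}
```

The trade-off is that every event that prints the thread name pays for a small format call, in exchange for not allocating a per-thread copy.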


* Re: [PATCH 6/9] trace2: add timer events to perf and event target formats
  2021-12-20 15:01 ` [PATCH 6/9] trace2: add timer events to perf and event target formats Jeff Hostetler via GitGitGadget
@ 2021-12-20 16:39   ` Ævar Arnfjörð Bjarmason
  2021-12-20 19:44     ` Jeff Hostetler
  2021-12-21 14:20   ` Derrick Stolee
  1 sibling, 1 reply; 55+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-12-20 16:39 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler


On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Teach Trace2 "perf" and "event" formats to handle "timer" events for
> stopwatch timers.  Update API documentation accordingly.
>
> In a future commit, stopwatch timers will be added to the Trace2 API
> and it will emit these "timer" events.
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  Documentation/technical/api-trace2.txt | 25 +++++++++++++++-
>  trace2/tr2_tgt.h                       | 25 ++++++++++++++++
>  trace2/tr2_tgt_event.c                 | 40 +++++++++++++++++++++++++-
>  trace2/tr2_tgt_normal.c                |  1 +
>  trace2/tr2_tgt_perf.c                  | 29 +++++++++++++++++++
>  5 files changed, 118 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
> index bb13ca3db8b..e6ed94ba814 100644
> --- a/Documentation/technical/api-trace2.txt
> +++ b/Documentation/technical/api-trace2.txt
> @@ -391,7 +391,7 @@ only present on the "start" and "atexit" events.
>  {
>  	"event":"version",
>  	...
> -	"evt":"3",		       # EVENT format version
> +	"evt":"4",		       # EVENT format version
>  	"exe":"2.20.1.155.g426c96fcdb" # git version
>  }

FWIW this seems like a time not to bump the version per the proposed
approach in:
https://lore.kernel.org/git/211201.86zgpk9u3t.gmgdl@evledraar.gmail.com/

Not directly related to this series, which just preserves the status
quo, but it would be nice to get feedback on that proposal from you.

> [...]
> + * Verison 1: original version

A typo of "Version".

> + * Version 2: added "too_many_files" event
> + * Version 3: added "child_ready" event
> + * Version 4: added "timer" event
>   */
> -#define TR2_EVENT_VERSION "3"
> +#define TR2_EVENT_VERSION "4"
>  
>  /*
>   * Region nesting limit for messages written to the event target.
> @@ -615,6 +620,38 @@ static void fn_data_json_fl(const char *file, int line,
>  	}
>  }
>  
> +static void fn_timer(uint64_t us_elapsed_absolute,
> +		     const char *thread_name,
> +		     const char *category,
> +		     const char *timer_name,
> +		     uint64_t interval_count,
> +		     uint64_t us_total_time,
> +		     uint64_t us_min_time,
> +		     uint64_t us_max_time)
> +{
> +	const char *event_name = "timer";
> +	struct json_writer jw = JSON_WRITER_INIT;
> +	double t_abs = (double)us_elapsed_absolute / 1000000.0;
> +

nit: Odd placement of \n\n

> +	double t_total = (double)us_total_time / 1000000.0;
> +	double t_min   = (double)us_min_time   / 1000000.0;
> +	double t_max   = (double)us_max_time   / 1000000.0;

Both for this...

> +	jw_object_begin(&jw, 0);
> +	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, thread_name);
> +	jw_object_double(&jw, "t_abs", 6, t_abs);
> +	jw_object_string(&jw, "name", timer_name);
> +	jw_object_intmax(&jw, "count", interval_count);
> +	jw_object_double(&jw, "t_total", 6, t_total);
> +	jw_object_double(&jw, "t_min", 6, t_min);
> +	jw_object_double(&jw, "t_max", 6, t_max);

[...]

> +static void fn_timer(uint64_t us_elapsed_absolute,
> +		     const char *thread_name,
> +		     const char *category,
> +		     const char *timer_name,
> +		     uint64_t interval_count,
> +		     uint64_t us_total_time,
> +		     uint64_t us_min_time,
> +		     uint64_t us_max_time)
> +{
> +	const char *event_name = "timer";
> +	struct strbuf buf_payload = STRBUF_INIT;
> +
> +	double t_total = (double)us_total_time / 1000000.0;
> +	double t_min   = (double)us_min_time   / 1000000.0;
> +	double t_max   = (double)us_max_time   / 1000000.0;
> +
> +	strbuf_addf(&buf_payload, "name:%s", timer_name);
> +	strbuf_addf(&buf_payload, " count:%"PRIu64, interval_count);
> +	strbuf_addf(&buf_payload, " total:%9.6f", t_total);
> +	strbuf_addf(&buf_payload, " min:%9.6f", t_min);
> +	strbuf_addf(&buf_payload, " max:%9.6f", t_max);

...and this, wouldn't it be better/more readable to retain the uint64_t
for the math, and just cast if needed when we're doing the formatting?
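
One possible reading of that suggestion is to keep the microsecond values as uint64_t and split them into seconds and a fractional part only when formatting. The sketch below uses a hypothetical helper name and does not pad to a fixed width the way "%9.6f" does.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a microsecond count as seconds with six
 * decimal places, using only integer arithmetic.
 */
static int demo_format_us(char *out, size_t outlen, uint64_t us)
{
	return snprintf(out, outlen, "%" PRIu64 ".%06" PRIu64,
			us / 1000000, us % 1000000);
}
```

For the JSON target, which emits doubles through jw_object_double(), the equivalent would be to cast at the call site instead of introducing the t_total, t_min, and t_max locals.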


* Re: [PATCH 7/9] trace2: add stopwatch timers
  2021-12-20 15:01 ` [PATCH 7/9] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
@ 2021-12-20 16:42   ` Ævar Arnfjörð Bjarmason
  2021-12-22 21:38     ` Jeff Hostetler
  2021-12-21 14:45   ` Derrick Stolee
  1 sibling, 1 reply; 55+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-12-20 16:42 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler


On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
> [...]
> +static void tr2main_emit_summary_timers(uint64_t us_elapsed_absolute)
> +{
> +	struct tr2_tgt *tgt_j;
> +	int j;
> +	struct tr2tmr_block merged;
> +
> +	memset(&merged, 0, sizeof(merged));

Nit: just do a " = { 0 }" assignment above instead.
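
For illustration, the two spellings side by side. The struct layouts are hypothetical stand-ins for the timer block, and both functions produce a block whose members are all zero.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for struct tr2tmr_timer / tr2tmr_block. */
struct demo_tmr_timer {
	uint64_t total_us;
	uint64_t min_us;
	uint64_t max_us;
	unsigned int recursion_count;
};

struct demo_tmr_block {
	struct demo_tmr_timer timer[4];
	unsigned int is_aggregate:1;
};

/* Zero the block at the point of declaration. */
static struct demo_tmr_block demo_zeroed_by_initializer(void)
{
	struct demo_tmr_block merged = { 0 };

	return merged;
}

/* Zero the block with a separate statement after declaration. */
static struct demo_tmr_block demo_zeroed_by_memset(void)
{
	struct demo_tmr_block merged;

	memset(&merged, 0, sizeof(merged));
	return merged;
}
```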

> +	if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS)
> +		BUG("invalid timer id: %d", tid);
> +
> +	tr2tmr_start(tid);
> +}
> +
> +void trace2_timer_stop(enum trace2_timer_id tid)
> +{
> +	if (!trace2_enabled)
> +		return;
> +
> +	if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS)
> +		BUG("invalid timer id: %d", tid);

nit / style: maybe assert() instead for cases where assert() produces
better info than BUG(). I.e. it would quote the whole expression, and
show you what condition it violated....

> +void tr2tmr_stop(enum trace2_timer_id tid)
> +{
> +	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
> +	struct tr2tmr_timer *t = &ctx->timers.timer[tid];
> +	uint64_t us_now;
> +	uint64_t us_interval;
> +
> +	assert(t->recursion_count > 0);

...as you opted to do here.

> +
> +	t->recursion_count--;
> +	if (t->recursion_count > 0)
> +		return; /* still in recursive call */
> +
> +	us_now = getnanotime() / 1000;
> +	us_interval = us_now - t->start_us;
> +
> +	t->total_us += us_interval;
> +
> +	if (!t->interval_count) {
> +		t->min_us = us_interval;
> +		t->max_us = us_interval;
> +	} else {
> +		if (us_interval < t->min_us)
> +			t->min_us = us_interval;
> +		if (us_interval > t->max_us)
> +			t->max_us = us_interval;
> +	}

Perhaps more readable/easily understood as just a (untested):

    if (!t->interval_count || us_interval < t->min_us)
	    t->min_us = us_interval;
    if (!t->interval_count || us_interval > t->max_us)
	    t->max_us = us_interval;

I.e. to avoid duplicating the identical assignment...

> [...]
> +		if (!t->interval_count)
> +			continue; /* this timer was not used by this thread. */
> +
> +		t_merged->total_us += t->total_us;
> +
> +		if (!t_merged->interval_count) {
> +			t_merged->min_us = t->min_us;
> +			t_merged->max_us = t->max_us;
> +		} else {
> +			if (t->min_us < t_merged->min_us)
> +				t_merged->min_us = t->min_us;
> +			if (t->max_us > t_merged->max_us)
> +				t_merged->max_us = t->max_us;
> +		}

...ditto, maybe since it's used at least twice factor it out to some
trivial "static" helper here (maybe not worth it...).
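
A sketch of what such a shared helper might look like. All names are hypothetical, and where interval_count is incremented is an assumption, since that part of the patch is not quoted here.

```c
#include <stdint.h>

/* Hypothetical stand-in for the statistics kept per timer. */
struct demo_stats {
	uint64_t interval_count;
	uint64_t total_us;
	uint64_t min_us;
	uint64_t max_us;
};

/*
 * Fold the range [lo, hi] into the running min/max.  When no interval
 * has been recorded yet, the range is taken as-is.
 */
static void demo_update_min_max(struct demo_stats *s,
				uint64_t lo, uint64_t hi)
{
	if (!s->interval_count || lo < s->min_us)
		s->min_us = lo;
	if (!s->interval_count || hi > s->max_us)
		s->max_us = hi;
}

/* Record one finished interval (the "stop" case). */
static void demo_record_interval(struct demo_stats *s,
				 uint64_t us_interval)
{
	s->total_us += us_interval;
	demo_update_min_max(s, us_interval, us_interval);
	s->interval_count++;
}

/* Merge one thread's statistics into the aggregate. */
static void demo_merge(struct demo_stats *merged,
		       const struct demo_stats *t)
{
	if (!t->interval_count)
		return; /* this timer was not used by this thread */

	merged->total_us += t->total_us;
	demo_update_min_max(merged, t->min_us, t->max_us);
	merged->interval_count += t->interval_count;
}
```

Both call sites reduce to a single helper call, at the cost of one more function to read.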

> +	/*
> +	 * Number of nested starts on the stack in this thread.  (We
> +	 * ignore recursive starts and use this to track the recursive
> +	 * calls.)
> +	 */
> +	unsigned int recursion_count;

Earlier we have various forms of:

    if (t->recursion_count > 1)

But since it's unsigned can we just make those a:

    if (t->recursion_count)

?
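
A small sketch of recursion-aware start/stop may help here. The names are hypothetical and the shape of the start function is an assumption, since it is not quoted above. For an unsigned value, "count > 0" and a plain truth test are equivalent, but a "count > 1" check made after incrementing is not.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-thread timer state. */
struct demo_stopwatch {
	unsigned int recursion_count;
	unsigned int interval_count;
};

static void demo_start(struct demo_stopwatch *t)
{
	t->recursion_count++;
	if (t->recursion_count > 1)
		return; /* nested start; the outer interval is running */

	/* A real timer would record the start time here. */
}

static void demo_stop(struct demo_stopwatch *t)
{
	assert(t->recursion_count > 0);

	t->recursion_count--;
	if (t->recursion_count) /* same as "> 0" for an unsigned value */
		return; /* still in recursive call */

	/* A real timer would compute the interval here. */
	t->interval_count++;
}
```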


* Re: [PATCH 8/9] trace2: add counter events to perf and event target formats
  2021-12-20 15:01 ` [PATCH 8/9] trace2: add counter events to perf and event target formats Jeff Hostetler via GitGitGadget
@ 2021-12-20 16:51   ` Ævar Arnfjörð Bjarmason
  2021-12-22 22:56     ` Jeff Hostetler
  0 siblings, 1 reply; 55+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-12-20 16:51 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler


On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
> [...]
> diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
> index 9b3905b920c..ca36d44dfd7 100644
> --- a/trace2/tr2_tgt_event.c
> +++ b/trace2/tr2_tgt_event.c
> @@ -23,7 +23,7 @@ static struct tr2_dst tr2dst_event = { TR2_SYSENV_EVENT, 0, 0, 0, 0 };
>   * Verison 1: original version
>   * Version 2: added "too_many_files" event
>   * Version 3: added "child_ready" event
> - * Version 4: added "timer" event
> + * Version 4: added "timer" and "counter" events
>   */
>  #define TR2_EVENT_VERSION "4"
>  

Nit on series structure: Earlier we bumped the version to 4, but here
we're changing existing version 4 behavior. It would be better IMO to
just bump it at the end (if it is needed at all, per:
https://lore.kernel.org/git/211201.86zgpk9u3t.gmgdl@evledraar.gmail.com/)

> +static void fn_counter(uint64_t us_elapsed_absolute,
> +		       const char *thread_name,
> +		       const char *category,
> +		       const char *counter_name,
> +		       uint64_t value)
> +{
> +	const char *event_name = "counter";
> +	struct strbuf buf_payload = STRBUF_INIT;
> +
> +	strbuf_addf(&buf_payload, "name:%s", counter_name);
> +	strbuf_addf(&buf_payload, " value:%"PRIu64, value);

Odd to have these be two separate strbuf_addf()...
> +
> +	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL,
> +			 &us_elapsed_absolute, NULL,
> +			 category, &buf_payload, thread_name);
> +	strbuf_release(&buf_payload);

...but more generally, and I see from e.g. the existing fn_version_fl
that you're just using existing patterns, but it seems odd not to have a
trivial varargs fmt helper for perf_io_write_fl that would avoid the
whole strbuf/addf/release dance.

I did a quick experiment to do that, patch on "master" below. A lot of
the boilerplate could be simplified by factoring out the
sq_quote_buf_pretty() case, and even this approach (re)allocs in a way
that looks avoidable in many cases if perf_fmt_prepare() were improved
(but it looks like it needs its if/while loops in some cases still):

diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index 2ff9cf70835..bcbb0d8a250 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -153,16 +153,33 @@ static void perf_io_write_fl(const char *file, int line, const char *event_name,
 	strbuf_release(&buf_line);
 }
 
+__attribute__((format (printf, 8, 9)))
+static void perf_io_write_fl_fmt(const char *file, int line, const char *event_name,
+				 const struct repository *repo,
+				 uint64_t *p_us_elapsed_absolute,
+				 uint64_t *p_us_elapsed_relative,
+				 const char *category,
+				 const char *fmt, ...)
+{
+	va_list ap;
+	struct strbuf sb = STRBUF_INIT;
+
+	va_start(ap, fmt);
+	strbuf_vaddf(&sb, fmt, ap);
+	va_end(ap);
+
+	perf_io_write_fl(file, line, event_name, repo, p_us_elapsed_absolute,
+			 p_us_elapsed_relative, category, &sb);
+
+	strbuf_release(&sb);
+}
+
 static void fn_version_fl(const char *file, int line)
 {
 	const char *event_name = "version";
-	struct strbuf buf_payload = STRBUF_INIT;
-
-	strbuf_addstr(&buf_payload, git_version_string);
 
-	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(file, line, event_name, NULL, NULL, NULL, NULL,
+			     "%s", git_version_string);
 }
 
 static void fn_start_fl(const char *file, int line,
@@ -182,37 +199,25 @@ static void fn_exit_fl(const char *file, int line, uint64_t us_elapsed_absolute,
 		       int code)
 {
 	const char *event_name = "exit";
-	struct strbuf buf_payload = STRBUF_INIT;
 
-	strbuf_addf(&buf_payload, "code:%d", code);
-
-	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 NULL, NULL, &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(file, line, event_name, NULL, &us_elapsed_absolute,
+			     NULL, NULL, "code:%d", code);
 }
 
 static void fn_signal(uint64_t us_elapsed_absolute, int signo)
 {
 	const char *event_name = "signal";
-	struct strbuf buf_payload = STRBUF_INIT;
 
-	strbuf_addf(&buf_payload, "signo:%d", signo);
-
-	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL,
-			 &us_elapsed_absolute, NULL, NULL, &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(__FILE__, __LINE__, event_name, NULL,
+			 &us_elapsed_absolute, NULL, NULL, "signo:%d", signo);
 }
 
 static void fn_atexit(uint64_t us_elapsed_absolute, int code)
 {
 	const char *event_name = "atexit";
-	struct strbuf buf_payload = STRBUF_INIT;
-
-	strbuf_addf(&buf_payload, "code:%d", code);
 
-	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL,
-			 &us_elapsed_absolute, NULL, NULL, &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(__FILE__, __LINE__, event_name, NULL,
+			 &us_elapsed_absolute, NULL, NULL, "code:%d", code);
 }
 
 static void maybe_append_string_va(struct strbuf *buf, const char *fmt,
@@ -244,13 +249,9 @@ static void fn_error_va_fl(const char *file, int line, const char *fmt,
 static void fn_command_path_fl(const char *file, int line, const char *pathname)
 {
 	const char *event_name = "cmd_path";
-	struct strbuf buf_payload = STRBUF_INIT;
-
-	strbuf_addstr(&buf_payload, pathname);
 
-	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(file, line, event_name, NULL, NULL, NULL, NULL,
+			     "%s", pathname);
 }
 
 static void fn_command_ancestry_fl(const char *file, int line, const char **parent_names)
@@ -286,13 +287,9 @@ static void fn_command_name_fl(const char *file, int line, const char *name,
 static void fn_command_mode_fl(const char *file, int line, const char *mode)
 {
 	const char *event_name = "cmd_mode";
-	struct strbuf buf_payload = STRBUF_INIT;
-
-	strbuf_addstr(&buf_payload, mode);
 
-	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(file, line, event_name, NULL, NULL, NULL, NULL,
+			     "%s", mode);
 }
 
 static void fn_alias_fl(const char *file, int line, const char *alias,
@@ -351,13 +348,10 @@ static void fn_child_exit_fl(const char *file, int line,
 			     int code, uint64_t us_elapsed_child)
 {
 	const char *event_name = "child_exit";
-	struct strbuf buf_payload = STRBUF_INIT;
 
-	strbuf_addf(&buf_payload, "[ch%d] pid:%d code:%d", cid, pid, code);
-
-	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 &us_elapsed_child, NULL, &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(file, line, event_name, NULL, &us_elapsed_absolute,
+			     &us_elapsed_child, NULL, "[ch%d] pid:%d code:%d",
+			     cid, pid, code);
 }
 
 static void fn_child_ready_fl(const char *file, int line,
@@ -365,24 +359,19 @@ static void fn_child_ready_fl(const char *file, int line,
 			      const char *ready, uint64_t us_elapsed_child)
 {
 	const char *event_name = "child_ready";
-	struct strbuf buf_payload = STRBUF_INIT;
-
-	strbuf_addf(&buf_payload, "[ch%d] pid:%d ready:%s", cid, pid, ready);
 
-	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 &us_elapsed_child, NULL, &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(file, line, event_name, NULL, &us_elapsed_absolute,
+			     &us_elapsed_child, NULL,
+			     "[ch%d] pid:%d ready:%s", cid, pid, ready);
 }
 
 static void fn_thread_start_fl(const char *file, int line,
 			       uint64_t us_elapsed_absolute)
 {
 	const char *event_name = "thread_start";
-	struct strbuf buf_payload = STRBUF_INIT;
 
-	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 NULL, NULL, &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(file, line, event_name, NULL, &us_elapsed_absolute,
+			     NULL, NULL, "%s", ""); /* TODO: No payload, support NULL? */
 }
 
 static void fn_thread_exit_fl(const char *file, int line,
@@ -390,11 +379,9 @@ static void fn_thread_exit_fl(const char *file, int line,
 			      uint64_t us_elapsed_thread)
 {
 	const char *event_name = "thread_exit";
-	struct strbuf buf_payload = STRBUF_INIT;
 
-	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 &us_elapsed_thread, NULL, &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(file, line, event_name, NULL, &us_elapsed_absolute,
+			     &us_elapsed_thread, NULL, "%s", ""); /* TODO: No payload, support NULL ? */
 }
 
 static void fn_exec_fl(const char *file, int line, uint64_t us_elapsed_absolute,
@@ -438,13 +425,9 @@ static void fn_param_fl(const char *file, int line, const char *param,
 			const char *value)
 {
 	const char *event_name = "def_param";
-	struct strbuf buf_payload = STRBUF_INIT;
-
-	strbuf_addf(&buf_payload, "%s:%s", param, value);
 
-	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(file, line, event_name, NULL, NULL, NULL, NULL,
+			     "%s:%s", param, value);
 }
 
 static void fn_repo_fl(const char *file, int line,
@@ -525,13 +508,10 @@ static void fn_data_json_fl(const char *file, int line,
 			    const struct json_writer *value)
 {
 	const char *event_name = "data_json";
-	struct strbuf buf_payload = STRBUF_INIT;
-
-	strbuf_addf(&buf_payload, "%s:%s", key, value->json.buf);
 
-	perf_io_write_fl(file, line, event_name, repo, &us_elapsed_absolute,
-			 &us_elapsed_region, category, &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(file, line, event_name, repo, &us_elapsed_absolute,
+			     &us_elapsed_region, category,
+			     "%s:%s", key, value->json.buf);
 }
 
 static void fn_printf_va_fl(const char *file, int line,


* Re: [PATCH 9/9] trace2: add global counters
  2021-12-20 15:01 ` [PATCH 9/9] trace2: add global counters Jeff Hostetler via GitGitGadget
@ 2021-12-20 17:14   ` Ævar Arnfjörð Bjarmason
  2021-12-22 22:18     ` Jeff Hostetler
  0 siblings, 1 reply; 55+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-12-20 17:14 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler


On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
> [...]
> +struct ut_011_data {
> +	int v1, v2;
> +};
> [...]
> +	struct ut_011_data data = { 0, 0 };

Nit: Just "{ 0 }" is OK for zero'd out initialization. No need to keep
extending this for every field.

For things that want to exhaustively list fields for clarity, designated
initializers would be preferred:

    { .v1 = 0, .v2 = 0 }

> +	int nr_threads = 0;
> +	int k;
> +	pthread_t *pids = NULL;
> +
> +	if (argc != 3)
> +		die("%s", usage_error);
> +	if (get_i(&data.v1, argv[0]))
> +		die("%s", usage_error);
> +	if (get_i(&data.v2, argv[1]))
> +		die("%s", usage_error);
> +	if (get_i(&nr_threads, argv[2]))
> +		die("%s", usage_error);

A partial nit on existing code, as this just extends the pattern, but
couldn't much of this get_i() etc. just be made redundant by simply
using the parse-options.c API here?  I.e. OPTION_INTEGER and using named
arguments would do the validation for you.

> +# Exercise the global counter in a loop and confirm that we get the
> +# expected sum in an event record.
> +#
> +
> +have_counter_event () {
> +	thread=$1
> +	name=$2
> +	value=$3
> +	file=$4
> +
> +	grep "\"event\":\"counter\".*\"thread\":\"${thread}\".*\"name\":\"${name}\".*\"value\":${value}" $file
> +
> +	return $?
> +}

It looks like there's no helper, but this is the Nth thing I see where
I wish our "test_region" helper were just a bit more generalized. I.e.:

    test_trace2 --match=counter --match=thread=$thread --match=name=$name --match=value=$value <trace>

With test_region just being a wrapper for something like:

    test_trace2 --match=region_enter --match=category=$category --match=label=$label <trace> &&
    test_trace2 --match=region_leave --match=category=$category --match=label=$label <trace>

> +static void tr2main_emit_summary_counters(uint64_t us_elapsed_absolute)
> +{
> +	struct tr2_tgt *tgt_j;
> +	int j;
> +	struct tr2ctr_block merged;
> +
> +	memset(&merged, 0, sizeof(merged));

nit: more memset v.s. "{ 0 }".


* Re: [PATCH 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to char*
  2021-12-20 16:31   ` Ævar Arnfjörð Bjarmason
@ 2021-12-20 19:07     ` Jeff Hostetler
  2021-12-20 19:35       ` Ævar Arnfjörð Bjarmason
  2021-12-21  7:33     ` Junio C Hamano
  1 sibling, 1 reply; 55+ messages in thread
From: Jeff Hostetler @ 2021-12-20 19:07 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Jeff Hostetler via GitGitGadget
  Cc: git, Jeff Hostetler



On 12/20/21 11:31 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Use a 'char *' to hold the thread name rather than a 'struct strbuf'.
>> The thread name is set when the thread is created and should not be
>> modified afterwards.  Replace the strbuf with an allocated pointer
>> to make that more clear.
>>
>> This was discussed in: https://lore.kernel.org/all/xmqqa6kdwo24.fsf@gitster.g/
>> ...
>> diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
>> index 7da94aba522..cd8b9f2f0a0 100644
>> --- a/trace2/tr2_tls.c
>> +++ b/trace2/tr2_tls.c
>> @@ -35,6 +35,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>>   					     uint64_t us_thread_start)
>>   {
>>   	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
>> +	struct strbuf buf_name = STRBUF_INIT;
>>   
>>   	/*
>>   	 * Implicitly "tr2tls_push_self()" to capture the thread's start
>> @@ -47,12 +48,13 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>>   
>>   	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
>>   
>> -	strbuf_init(&ctx->thread_name, 0);
>>   	if (ctx->thread_id)
>> -		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
>> -	strbuf_addstr(&ctx->thread_name, thread_name);
>> -	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
>> -		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
>> +		strbuf_addf(&buf_name, "th%02d:", ctx->thread_id);
>> +	strbuf_addstr(&buf_name, thread_name);
>> +	if (buf_name.len > TR2_MAX_THREAD_NAME)
>> +		strbuf_setlen(&buf_name, TR2_MAX_THREAD_NAME);
>> +
>> +	ctx->thread_name = strbuf_detach(&buf_name, NULL);
>>   
>>   	pthread_setspecific(tr2tls_key, ctx);
>>   
>> ...
>> diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
>> index a90bd639d48..d968da6a679 100644
>> --- a/trace2/tr2_tls.h
>> +++ b/trace2/tr2_tls.h
>> @@ -9,7 +9,7 @@
>>   #define TR2_MAX_THREAD_NAME (24)
>>   
>>   struct tr2tls_thread_ctx {
>> -	struct strbuf thread_name;
>> +	char *thread_name;
>>   	uint64_t *array_us_start;
>>   	size_t alloc;
>>   	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
> 
> Junio's suggestion in the linked E-Mail was to make this a "const char *".

Yes, it was.  To me a "const char *" in a structure means that
the structure does not own the pointer and must not free it.
Whereas a "char *" means that the structure might own it and
should maybe free it when the structure is freed.  My usage here
is that the structure does own it (because it took it from the
temporary strbuf using strbuf_detach()) and so it must free it.
Therefore it should not be "const".  This has nothing to do with
whether or not we allow the thread name to be changed after the
fact.  (We don't, but that is a different issue).
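
A minimal sketch of the ownership convention described above, using plain
libc rather than Git's strbuf and xcalloc() helpers.  The names
`thread_ctx`, `ctx_create()` and `ctx_free()` are hypothetical stand-ins
and are not part of the patch; the point is only that a non-const member
can be handed to free() directly by the structure that owns it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for tr2tls_thread_ctx; only the name is modeled. */
struct thread_ctx {
	char *thread_name; /* owned: set in ctx_create(), freed in ctx_free() */
};

/* Copy the caller's string so the ctx never depends on its lifetime. */
static struct thread_ctx *ctx_create(const char *name)
{
	struct thread_ctx *ctx;
	size_t len;

	assert(name != NULL);
	ctx = calloc(1, sizeof(*ctx));
	if (!ctx)
		return NULL;

	len = strlen(name) + 1;
	ctx->thread_name = malloc(len);
	if (!ctx->thread_name) {
		free(ctx);
		return NULL;
	}
	memcpy(ctx->thread_name, name, len);
	return ctx;
}

static void ctx_free(struct thread_ctx *ctx)
{
	if (!ctx)
		return;
	free(ctx->thread_name); /* no cast needed: the member is not const */
	free(ctx);
}
```

Nothing in this sketch stops later code from writing through
`thread_name`; as noted above, immutability is a separate question from
ownership.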

> 
> Narrowly, I don't see why not just add a "const" to the "struct strbuf
> *" instead.

Adding "const" to a strbuf would be wrong in this case, since the
structure owns the strbuf and needs to strbuf_release the contained
buffer and (now) free the strbuf pointer, right?

This also makes things confusing -- all callers of tr2tls_create_self()
would now be responsible for allocating a strbuf to pass in -- and who
would own those.  This would also create opportunities for mistakes if
they pass in the address of a stack-based strbuf, right?

This is being used to initialize thread-based data, so the caller
can't just use a "function local static" or a "global static" strbuf.


> 
> But less narrowly if we're not going to change it why malloc a new one
> at all? Can't we just use the "const char *" passed into
> tr2tls_create_self(), and for the "th%02d:" case have the code that's
> formatting it handle that case?
> 
> I.e. have the things that use it as a "%s" now call a function that
> formats things as a function of the "ctx->thread_id" (which may be 0)
> and limit it by TR2_MAX_THREAD_NAME?
> 

This would be less efficient, right?  That thread name is included in
*EVERY* _perf and _event message emitted.  If we were to change the
design to have basically a callback to get the formatted value based
on the `ctx` or `ctx->thread_id` and dynamically formatting the name,
then we would have to hit that callback once (or twice) for every Trace2
message, right?  That would be much slower than just having a fixed
string (formatted when the thread is created) that we can just use.
And even if we said that the callback could cache the result (like
we do when we lookup env vars), where would it cache it?  It would have
to cache it in the `ctx`, which is where it currently is and without
any of the unnecessary overhead, right?

I think you're assuming that callers of `tr2tls_create_self()` always
pass a literal string such that that string value is always safe to
reference later.  Nothing would prevent a caller from passing the
address of a stack buffer.  It is not safe to assume that that string
pointer will always be valid, such as after the thread exits.  It is
better for _create_self() to copy the given string (whether we format
it immediately or not) than to assume that the pointer will always be
valid, right?
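
To illustrate the "format once, copy, and keep" argument, here is a
hedged sketch in plain C.  `format_thread_name()` and `MAX_THREAD_NAME`
are hypothetical; the real code builds the string with a strbuf and
detaches it.  The sketch assumes the same "th%02d:" prefix and 24-byte
limit shown in the quoted diff.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_THREAD_NAME 24 /* mirrors TR2_MAX_THREAD_NAME in the patch */

/*
 * Format the name once and return a heap copy that the caller owns.
 * Thread id 0 (the main thread) gets no "thNN:" prefix, as in the patch.
 * snprintf() truncates to MAX_THREAD_NAME bytes plus the trailing NUL.
 */
static char *format_thread_name(int thread_id, const char *name)
{
	char buf[MAX_THREAD_NAME + 1];
	char *copy;
	size_t len;

	assert(name != NULL);
	if (thread_id)
		snprintf(buf, sizeof(buf), "th%02d:%s", thread_id, name);
	else
		snprintf(buf, sizeof(buf), "%s", name);

	len = strlen(buf) + 1;
	copy = malloc(len);
	if (copy)
		memcpy(copy, buf, len);
	return copy;
}
```

Because the result is a private copy, the caller may pass a stack buffer
and reuse or destroy it immediately afterwards, and every later trace
message can use the stored string without reformatting it.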


So I don't think we should deviate from the patch that I submitted.

Jeff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to char*
  2021-12-20 19:07     ` Jeff Hostetler
@ 2021-12-20 19:35       ` Ævar Arnfjörð Bjarmason
  2021-12-22 16:32         ` Jeff Hostetler
  0 siblings, 1 reply; 55+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-12-20 19:35 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: Jeff Hostetler via GitGitGadget, git, Jeff Hostetler


On Mon, Dec 20 2021, Jeff Hostetler wrote:

> On 12/20/21 11:31 AM, Ævar Arnfjörð Bjarmason wrote:
>> On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:
>> 
>>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>>
>>> Use a 'char *' to hold the thread name rather than a 'struct strbuf'.
>>> The thread name is set when the thread is created and should not be
>>> modified afterwards.  Replace the strbuf with an allocated pointer
>>> to make that more clear.
>>>
>>> This was discussed in: https://lore.kernel.org/all/xmqqa6kdwo24.fsf@gitster.g/
>>>...
>>> diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
>>> index 7da94aba522..cd8b9f2f0a0 100644
>>> --- a/trace2/tr2_tls.c
>>> +++ b/trace2/tr2_tls.c
>>> @@ -35,6 +35,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>>>   					     uint64_t us_thread_start)
>>>   {
>>>   	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
>>> +	struct strbuf buf_name = STRBUF_INIT;
>>>     	/*
>>>   	 * Implicitly "tr2tls_push_self()" to capture the thread's start
>>> @@ -47,12 +48,13 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>>>     	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
>>>   -	strbuf_init(&ctx->thread_name, 0);
>>>   	if (ctx->thread_id)
>>> -		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
>>> -	strbuf_addstr(&ctx->thread_name, thread_name);
>>> -	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
>>> -		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
>>> +		strbuf_addf(&buf_name, "th%02d:", ctx->thread_id);
>>> +	strbuf_addstr(&buf_name, thread_name);
>>> +	if (buf_name.len > TR2_MAX_THREAD_NAME)
>>> +		strbuf_setlen(&buf_name, TR2_MAX_THREAD_NAME);
>>> +
>>> +	ctx->thread_name = strbuf_detach(&buf_name, NULL);
>>>     	pthread_setspecific(tr2tls_key, ctx);
>>>   
>>>..
>>> diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
>>> index a90bd639d48..d968da6a679 100644
>>> --- a/trace2/tr2_tls.h
>>> +++ b/trace2/tr2_tls.h
>>> @@ -9,7 +9,7 @@
>>>   #define TR2_MAX_THREAD_NAME (24)
>>>     struct tr2tls_thread_ctx {
>>> -	struct strbuf thread_name;
>>> +	char *thread_name;
>>>   	uint64_t *array_us_start;
>>>   	size_t alloc;
>>>   	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
>> Junio's suggestion in the linked E-Mail was to make this a "const
>> char *".
>
> Yes, it was.  To me a "const char *" in a structure means that
> the structure does not own the pointer and must not free it.
> Whereas a "char *" means that the structure might own it and
> should maybe free it when the structure is freed.  My usage here
> is that the structure does own it (because it took it from the
> temporary strbuf using strbuf_detach()) and so it must free it.
> Therefore it should not be "const".  This has nothing to do with
> whether or not we allow the thread name to be changed after the
> fact.  (We don't, but that is a different issue).

We use the pattern of having a "const char *" that's really a "char *"
with a cast to free() in many existing APIs for this scenario.
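
For readers unfamiliar with that pattern, a small sketch follows.  The
struct and function names are hypothetical and are not taken from any
Git API; the sketch only shows a member declared "const char *" that is
nevertheless owned by the structure and released through a cast.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example type, not a Git structure. */
struct labeled_thing {
	const char *name; /* owned, but const so users cannot modify it */
};

static int labeled_thing_init(struct labeled_thing *lt, const char *name)
{
	size_t len;
	char *copy;

	assert(lt && name);
	len = strlen(name) + 1;
	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, name, len);
	lt->name = copy; /* char * converts to const char * implicitly */
	return 0;
}

static void labeled_thing_release(struct labeled_thing *lt)
{
	free((char *)lt->name); /* the cast drops const only for deallocation */
	lt->name = NULL;
}
```

The trade-off is that "const" then documents immutability rather than
ownership, so ownership has to be stated in a comment or in the API
documentation instead.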

Maybe the cast for free would be more correct here, see my recent
9081a421a6d (checkout: fix "branch info" memory leaks, 2021-11-16) & the
discussion it references. I.e. in that case we didn't go for the
"free((char *)ptr)" cast as it was a private API.

>> Narrowly, I don't see why not just add a "const" to the "struct
>> strbuf
>> *" instead.
>
> Adding "const" to a strbuf would be wrong in this case, since the
> structure owns the strbuf and needs to strbuf_release the contained
> buffer and (now) free the strbuf pointer, right?
>
> This also makes things confusing -- all callers of tr2tls_create_self()
> would now be responsible for allocating a strbuf to pass in -- and who
> would own those.  This would also create opportunities for mistakes if
> they pass in the address of a stack-based strbuf, right?
>
> This is being used to initialize thread-based data, so the caller
> can't just use a "function local static" or a "global static" strbuf.

Right, I meant that in the context of who/where you'd have your casts.

>> But less narrowly if we're not going to change it why malloc a new
>> one
>> at all? Can't we just use the "const char *" passed into
>> tr2tls_create_self(), and for the "th%02d:" case have the code that's
>> formatting it handle that case?
>> I.e. have the things that use it as a "%s" now call a function that
>> formats things as a function of the "ctx->thread_id" (which may be 0)
>> and limit it by TR2_MAX_THREAD_NAME?
>> 
>
> This would be less efficient, right?  That thread name is included in
> *EVERY* _perf and _event message emitted.  If we were to change the
> design to have basically a callback to get the formatted value based
> on the `ctx` or `ctx->thread_id` and dynamically formatting the name,
> then we would have to hit that callback once (or twice) for every Trace2
> message, right?  That would be much slower than just having a fixed
> string (formatted when the thread is created) that we can just use.
> And even if we said that the callback could cache the result (like
> we do when we lookup env vars), where would it cache it?  It would have
> to cache it in the `ctx`, which is where it currently is and without
> any of the unnecessary overhead, right?

Aren't we per
https://lore.kernel.org/git/211220.86czlrurm6.gmgdl@evledraar.gmail.com/
doing a lot of that formatting (and sometimes allocation) anyway in a
way that's easily avoidable for the "perf" backend?

And for tr2_tgt_event.c we'll call jw_object_string(), which calls
append_quoted_string() for each event. That'll be re-quoting (presumably
always needlessly) the thread_name every time.

So just deferring a single strbuf_addf() doesn't seem like it would slow
things down.

> I think you're assuming that callers of `tr2tls_create_self()` always
> pass a literal string such that that string value is always safe to
> reference later.  Nothing would prevent a caller from passing the
> address of a stack buffer.  It is not safe to assume that that string
> pointer will always be valid, such as after the thread exits.  It is
> better for _create_self() to copy the given string (whether we format
> it immediately or not) than to assume that the pointer will always be
> valid, right?

Sure, if that's the API we can xstrdup() it, and/or xstrfmt() it etc. as
we're doing now.

> So I don't think we should deviate from the patch that I submitted.

I'm not saying anything needs to change here, these were really just
read-through suggestions, but I think per the above (about the casts &
optimization) that some of your assumptions here may not hold.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] trace2: add timer events to perf and event target formats
  2021-12-20 16:39   ` Ævar Arnfjörð Bjarmason
@ 2021-12-20 19:44     ` Jeff Hostetler
  0 siblings, 0 replies; 55+ messages in thread
From: Jeff Hostetler @ 2021-12-20 19:44 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Jeff Hostetler via GitGitGadget
  Cc: git, Jeff Hostetler



On 12/20/21 11:39 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Teach Trace2 "perf" and "event" formats to handle "timer" events for
>> stopwatch timers.  Update API documentation accordingly.
>>
>> In a future commit, stopwatch timers will be added to the Trace2 API
>> and it will emit these "timer" events.
>>
>> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
>>...
>> diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
>> index bb13ca3db8b..e6ed94ba814 100644
>> --- a/Documentation/technical/api-trace2.txt
>> +++ b/Documentation/technical/api-trace2.txt
>> @@ -391,7 +391,7 @@ only present on the "start" and "atexit" events.
>>   {
>>   	"event":"version",
>>   	...
>> -	"evt":"3",		       # EVENT format version
>> +	"evt":"4",		       # EVENT format version
>>   	"exe":"2.20.1.155.g426c96fcdb" # git version
>>   }
> 
> FWIW this seems like a time not to bump the version per the proposed
> approach in:
> https://lore.kernel.org/git/211201.86zgpk9u3t.gmgdl@evledraar.gmail.com/
> 
> Not directly related to this series, which just preserves the status
> quo, but it would be nice to get feedback on that proposal from you.

Frankly, my eyes glazed over every time I tried to read it....

Your proposal looks fine.  And yes, our assumptions are that because
we have structured data, new event types and/or new fields can be
added and safely ignored by JSON parsers, so we should be OK.
So we're assuming that only if we drop events or fields or change
the meaning of one of them, would a parser need to react and so
we can limit version bumps to those instances.

I'm OK with this.

I'll let you draft the wording in api-trace2.txt to explain the
how/when/why we want to update the version number in the future.
Thanks.


> 
>> [...]
>> + * Verison 1: original version
> 
> A typo of "Version".
> 
>> + * Version 2: added "too_many_files" event
>> + * Version 3: added "child_ready" event
>> + * Version 4: added "timer" event
>>    */
>> -#define TR2_EVENT_VERSION "3"
>> +#define TR2_EVENT_VERSION "4"
>>   

I'll roll this back in my next version.

>>   /*
>>    * Region nesting limit for messages written to the event target.
>> @@ -615,6 +620,38 @@ static void fn_data_json_fl(const char *file, int line,
>>   	}
>>   }
>>   
>> +static void fn_timer(uint64_t us_elapsed_absolute,
>> +		     const char *thread_name,
>> +		     const char *category,
>> +		     const char *timer_name,
>> +		     uint64_t interval_count,
>> +		     uint64_t us_total_time,
>> +		     uint64_t us_min_time,
>> +		     uint64_t us_max_time)
>> +{
>> +	const char *event_name = "timer";
>> +	struct json_writer jw = JSON_WRITER_INIT;
>> +	double t_abs = (double)us_elapsed_absolute / 1000000.0;
>> +
> 
> nit: Odd placement of \n\n
> 
>> +	double t_total = (double)us_total_time / 1000000.0;
>> +	double t_min   = (double)us_min_time   / 1000000.0;
>> +	double t_max   = (double)us_max_time   / 1000000.0;
> 
> Both for this...
> 
>> +	jw_object_begin(&jw, 0);
>> +	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, thread_name);
>> +	jw_object_double(&jw, "t_abs", 6, t_abs);
>> +	jw_object_string(&jw, "name", timer_name);
>> +	jw_object_intmax(&jw, "count", interval_count);
>> +	jw_object_double(&jw, "t_total", 6, t_total);
>> +	jw_object_double(&jw, "t_min", 6, t_min);
>> +	jw_object_double(&jw, "t_max", 6, t_max);
> 
> [...]
> 
>> +static void fn_timer(uint64_t us_elapsed_absolute,
>> +		     const char *thread_name,
>> +		     const char *category,
>> +		     const char *timer_name,
>> +		     uint64_t interval_count,
>> +		     uint64_t us_total_time,
>> +		     uint64_t us_min_time,
>> +		     uint64_t us_max_time)
>> +{
>> +	const char *event_name = "timer";
>> +	struct strbuf buf_payload = STRBUF_INIT;
>> +
>> +	double t_total = (double)us_total_time / 1000000.0;
>> +	double t_min   = (double)us_min_time   / 1000000.0;
>> +	double t_max   = (double)us_max_time   / 1000000.0;
>> +
>> +	strbuf_addf(&buf_payload, "name:%s", timer_name);
>> +	strbuf_addf(&buf_payload, " count:%"PRIu64, interval_count);
>> +	strbuf_addf(&buf_payload, " total:%9.6f", t_total);
>> +	strbuf_addf(&buf_payload, " min:%9.6f", t_min);
>> +	strbuf_addf(&buf_payload, " max:%9.6f", t_max);
> 
> ....and this, wouldn't it be better/more readable to retain the uint64_t
> for the math, and just cast if needed when we're doing the formatting?
> 

I had those expressions inline at first and it really junked up the
lines and made things hard to read -- partially because of the need
to wrap the lines a lot.  I went with the local t_* temp vars to make
it more clear what we were doing.  This style also matched the existing
code in _tgt_event.c for `t_abs` and `t_rel` in all of the fn_*.
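
A self-contained sketch of the temp-variable style described here,
assuming the same field labels and "%9.6f" format as the quoted perf
target code.  `us_to_sec()` and `format_timer_payload()` are
hypothetical helpers written against snprintf(), not the strbuf-based
functions in the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Convert a microsecond count to fractional seconds. */
static double us_to_sec(uint64_t us)
{
	return (double)us / 1000000.0;
}

/*
 * Build a perf-style payload.  The arithmetic stays in uint64_t until
 * the last moment; the t_* temporaries keep the format call readable.
 * Returns the snprintf() result (length that would have been written).
 */
static int format_timer_payload(char *out, size_t outlen,
				const char *timer_name,
				uint64_t interval_count,
				uint64_t us_total_time,
				uint64_t us_min_time,
				uint64_t us_max_time)
{
	double t_total = us_to_sec(us_total_time);
	double t_min   = us_to_sec(us_min_time);
	double t_max   = us_to_sec(us_max_time);

	assert(out && timer_name);
	return snprintf(out, outlen,
			"name:%s count:%" PRIu64
			" total:%9.6f min:%9.6f max:%9.6f",
			timer_name, interval_count, t_total, t_min, t_max);
}
```

Note that "%9.6f" pads to a minimum width of nine characters, so values
below 10 seconds are emitted with a leading space after the label.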

Jeff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to char*
  2021-12-20 15:01 ` [PATCH 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to char* Jeff Hostetler via GitGitGadget
  2021-12-20 16:31   ` Ævar Arnfjörð Bjarmason
@ 2021-12-21  7:22   ` Junio C Hamano
  2021-12-22 16:28     ` Jeff Hostetler
  1 sibling, 1 reply; 55+ messages in thread
From: Junio C Hamano @ 2021-12-21  7:22 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Use a 'char *' to hold the thread name rather than a 'struct strbuf'.
> The thread name is set when the thread is created and should not be
> modified afterwards.  Replace the strbuf with an allocated pointer
> to make that more clear.

Sounds good.  Use of strbuf is perfectly fine while you compute the
final value of the string, but as a more permanent location to store
the result, it often is unsuitable (and strbuf_split_buf() is a prime
example of how *not* to design your API function around the type).

> diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
> index 7da94aba522..cd8b9f2f0a0 100644
> --- a/trace2/tr2_tls.c
> +++ b/trace2/tr2_tls.c
> @@ -35,6 +35,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>  					     uint64_t us_thread_start)
>  {
>  	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
> +	struct strbuf buf_name = STRBUF_INIT;
>  
>  	/*
>  	 * Implicitly "tr2tls_push_self()" to capture the thread's start
> @@ -47,12 +48,13 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>  
>  	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
>  
> -	strbuf_init(&ctx->thread_name, 0);
>  	if (ctx->thread_id)
> -		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
> -	strbuf_addstr(&ctx->thread_name, thread_name);
> -	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
> -		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
> +		strbuf_addf(&buf_name, "th%02d:", ctx->thread_id);
> +	strbuf_addstr(&buf_name, thread_name);
> +	if (buf_name.len > TR2_MAX_THREAD_NAME)
> +		strbuf_setlen(&buf_name, TR2_MAX_THREAD_NAME);
> +
> +	ctx->thread_name = strbuf_detach(&buf_name, NULL);

This is not exactly a new problem, but if we use a mechanism to
allow arbitrary long string (like composing with strbuf and
detaching the resulting string as is), instead of having a fixed
name[] array embedded in the ctx structure, I wonder if applying the
maximum length this early makes sense.  Such a truncation would
allow more than one ctx structures to share the same name, which
somehow feels error prone, inviting a mistake to use .thread_name
member as an identifier, when its only intended use is to give a
human-readable and not necessarily unique label.  Of course, if the
maximum is reasonably low, like a few dozen bytes, it may even make
sense to embed an array of the fixed size and not worry about an
extra pointer.

> diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
> index a90bd639d48..d968da6a679 100644
> --- a/trace2/tr2_tls.h
> +++ b/trace2/tr2_tls.h
> @@ -9,7 +9,7 @@
>  #define TR2_MAX_THREAD_NAME (24)
>  
>  struct tr2tls_thread_ctx {
> -	struct strbuf thread_name;
> +	char *thread_name;

That is, something like

	char thread_name[TR2_MAX_THREAD_NAME + 1];

perhaps with moving it to the end of the struct to avoid padding
waste, would make more sense than the posted patch, if we accept
an early truncation and information loss.
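
A hedged sketch of this embedded-array variant.  The struct below is a
hypothetical reduction of tr2tls_thread_ctx (the `thread_id` member and
the `ctx_set_name()` helper are assumptions made for illustration), with
the name placed last and filled in by a truncating snprintf().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_THREAD_NAME 24 /* mirrors TR2_MAX_THREAD_NAME */

/* Hypothetical reduction of the ctx; no separate allocation for the name. */
struct thread_ctx {
	uint64_t *array_us_start;
	size_t alloc;
	size_t nr_open_regions;
	int thread_id;
	char thread_name[MAX_THREAD_NAME + 1]; /* last, to limit padding */
};

/* Format directly into the embedded array; long names are truncated. */
static void ctx_set_name(struct thread_ctx *ctx, int thread_id,
			 const char *name)
{
	assert(ctx && name);
	ctx->thread_id = thread_id;
	if (thread_id)
		snprintf(ctx->thread_name, sizeof(ctx->thread_name),
			 "th%02d:%s", thread_id, name);
	else
		snprintf(ctx->thread_name, sizeof(ctx->thread_name),
			 "%s", name);
}
```

With this layout there is nothing to free for the name, at the cost of
losing whatever part of the name does not fit.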

The other extreme would also make equally more sense than the posted
patch.  Just grab strbuf_detach() result without truncation and
point at it with "char *thread_name" here, and if the output layer
wants to limit the names to some reasonable length, deal with the
TR2_MAX_THREAD_NAME at that layer, without losing information too
early.  It might be a much bigger surgery, I am afraid, because the
users of ctx->thread_name (and old ctx->thread_name.buf) all are
relying on the string being shorter than TR2_MAX_THREAD_NAME.
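
As a rough illustration of truncating only at the output layer, the
sketch below keeps the full name and applies the limit in the format
string.  `emit_thread_column()` is hypothetical, and the fixed-width,
left-justified column is an assumption about what an output target
might want, not a description of the existing targets.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_THREAD_NAME 24 /* mirrors TR2_MAX_THREAD_NAME */

/*
 * Write the thread name as a fixed-width column.  "%-*.*s" pads short
 * names with spaces and cuts long ones, leaving the stored name intact.
 */
static int emit_thread_column(char *out, size_t outlen,
			      const char *thread_name)
{
	assert(out && thread_name);
	return snprintf(out, outlen, "%-*.*s",
			MAX_THREAD_NAME, MAX_THREAD_NAME, thread_name);
}
```

The stored string can then be arbitrarily long, and a different target
(for example a JSON writer) could choose to emit it in full.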

>  	uint64_t *array_us_start;
>  	size_t alloc;
>  	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/9] trace2: defer free of TLS CTX until program exit.
  2021-12-20 15:01 ` [PATCH 3/9] trace2: defer free of TLS CTX until program exit Jeff Hostetler via GitGitGadget
@ 2021-12-21  7:30   ` Junio C Hamano
  2021-12-22 21:59     ` Jeff Hostetler
  0 siblings, 1 reply; 55+ messages in thread
From: Junio C Hamano @ 2021-12-21  7:30 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Defer freeing of the Trace2 thread CTX data until program exit.
> Create a global list of thread CTX data to own the storage.
>
> TLS CTX data is allocated when a thread is created and associated
> with that thread.  Previously, that storage was deleted when the
> thread exited.  Now we simply disassociate the CTX data from the
> thread when it exits and let the global CTX list manage the cleanup.

By the way, TLS CTX sounds embarrassingly close and confusing to
some function that we may find in say openssl or some crypto stuff
X-<.  Was there a strong reason to avoid calling these functions and
types something like tr2_thread_ctx instead of tr2tls_thread_ctx?


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to char*
  2021-12-20 16:31   ` Ævar Arnfjörð Bjarmason
  2021-12-20 19:07     ` Jeff Hostetler
@ 2021-12-21  7:33     ` Junio C Hamano
  1 sibling, 0 replies; 55+ messages in thread
From: Junio C Hamano @ 2021-12-21  7:33 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff Hostetler via GitGitGadget, git, Jeff Hostetler

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>>  struct tr2tls_thread_ctx {
>> -	struct strbuf thread_name;
>> +	char *thread_name;
>>  	uint64_t *array_us_start;
>>  	size_t alloc;
>>  	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
>
> Junio's suggestion in the linked E-Mail was to make this a "const char *".

Sorry, but in that linked E-Mail, I wasn't picking between "const
char *" and "char *" at all.  What I cared about was *not* to keep a
long-term constant string in a member whose type is "struct strbuf".

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/9] trace2: add timer events to perf and event target formats
  2021-12-20 15:01 ` [PATCH 6/9] trace2: add timer events to perf and event target formats Jeff Hostetler via GitGitGadget
  2021-12-20 16:39   ` Ævar Arnfjörð Bjarmason
@ 2021-12-21 14:20   ` Derrick Stolee
  1 sibling, 0 replies; 55+ messages in thread
From: Derrick Stolee @ 2021-12-21 14:20 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget, git; +Cc: Jeff Hostetler

On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote:
> +`"timer"`::
> +	This event is generated at the end of the program and contains
> +	statistics for a global stopwatch timer.
> ++
> +------------
> +{
> +	"event":"timer",
> +	...
> +	"name":"test",      # timer name
> +	"count":42,         # number of start+stop intervals
> +	"t_total":1.234,    # sum of all intervals (by thread or globally)
> +	"t_min":0.1,        # shortest interval
> +	"t_max":0.9,        # longest interval

Could you specify the units for these t_* entries? I'm guessing seconds
based on the example, but I've seen similar timers using milliseconds
instead so it's best to be super clear here.

> +/*
> + * Stopwatch timer event.  This function writes the previously accumlated

s/accumlated/accumulated/

> + * stopwatch timer values to the event streams.  Unlike other Trace2 API
> + * events, this is decoupled from the data collection.
> + *
> + * This does not take a (file,line) pair because a timer event reports
> + * the cummulative time spend in the timer over a series of intervals

s/cummulative/cumulative/

> + * -- it does not represent a single usage (like region or data events
> + * do).
> + *
> + * The thread name is optional.  If non-null it will override the
> + * value inherited from the caller's TLS CTX.  This allows data
> + * for global timers to be reported without associating it with a
> + * single thread.
> + */
> +typedef void(tr2_tgt_evt_timer_t)(uint64_t us_elapsed_absolute,
> +				  const char *thread_name,
> +				  const char *category,
> +				  const char *timer_name,
> +				  uint64_t interval_count,
> +				  uint64_t us_total_time,
> +				  uint64_t us_min_time,
> +				  uint64_t us_max_time);

> diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
> index 4ce50944298..9b3905b920c 100644
> --- a/trace2/tr2_tgt_event.c
> +++ b/trace2/tr2_tgt_event.c
...
> +static void fn_timer(uint64_t us_elapsed_absolute,

(I was going to complain about the generic name here, but it's static
to the tr2_tgt_event.c file, so that's fine.)

> +		     const char *thread_name,
> +		     const char *category,
> +		     const char *timer_name,
> +		     uint64_t interval_count,
> +		     uint64_t us_total_time,
> +		     uint64_t us_min_time,
> +		     uint64_t us_max_time)
> +{
> +	const char *event_name = "timer";
> +	struct json_writer jw = JSON_WRITER_INIT;
> +	double t_abs = (double)us_elapsed_absolute / 1000000.0;
> +
> +	double t_total = (double)us_total_time / 1000000.0;
> +	double t_min   = (double)us_min_time   / 1000000.0;
> +	double t_max   = (double)us_max_time   / 1000000.0;

Looks like seconds here. At first glance, I thought this large division
might cause some loss of precision. However, the structure of floating
point numbers means we probably don't lose that much. It might be worth
_considering_ using milliseconds (only divide by 1000.0) but I'm
probably just being paranoid here.
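
A quick way to check the precision concern, assuming `double` is IEEE
754 binary64 (53-bit significand): microsecond counts below 2^53 convert
to double exactly, and the division by 1000000.0 adds a single rounding
step.  The helpers below are hypothetical and only test whether the
seconds value maps back to the same microsecond count.

```c
#include <assert.h>
#include <stdint.h>

/* Same conversion as the quoted fn_timer() code. */
static double us_to_sec(uint64_t us)
{
	return (double)us / 1000000.0;
}

/*
 * Return 1 if converting to seconds and back (rounding to the nearest
 * microsecond) recovers the original count, 0 otherwise.
 */
static int sec_round_trips(uint64_t us)
{
	double back = us_to_sec(us) * 1000000.0;

	assert(back >= 0.0);
	return (uint64_t)(back + 0.5) == us;
}
```

For durations a Git process could plausibly report, a check like this
should pass, which is consistent with the guess above that little is
lost by dividing down to seconds.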

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/9] trace2: add stopwatch timers
  2021-12-20 15:01 ` [PATCH 7/9] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
  2021-12-20 16:42   ` Ævar Arnfjörð Bjarmason
@ 2021-12-21 14:45   ` Derrick Stolee
  2021-12-22 21:57     ` Jeff Hostetler
  1 sibling, 1 reply; 55+ messages in thread
From: Derrick Stolee @ 2021-12-21 14:45 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget, git; +Cc: Jeff Hostetler

On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhost@microsoft.com>
> 
> Add a stopwatch timer mechanism to Git.

> +static void *ut_009timer_thread_proc(void *_ut_009_data)
> +{
> +	struct ut_009_data *data = _ut_009_data;
> +	int k;
> +
> +	trace2_thread_start("ut_009");
> +
> +	for (k = 0; k < data->count; k++) {
> +		trace2_timer_start(TRACE2_TIMER_ID_TEST2);
> +		sleep_millisec(data->delay);
> +		trace2_timer_stop(TRACE2_TIMER_ID_TEST2);
> +	}
> +
> +	trace2_thread_exit();
> +	return NULL;
> +}
> +
> +

nit: double newline.

> +# Exercise the stopwatch timer "test" in a loop and confirm that it was
> +# we have as many start/stop intervals as expected.  We cannot really test
> +# the (elapsed, min, max) timer values, so we assume they are good.

We can't check their values, but we could check that their labels are
emitted.

> +test_expect_success 'test stopwatch timers - summary only' '
> +	test_when_finished "rm trace.perf actual" &&
> +	test_config_global trace2.perfBrief 1 &&
> +	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
> +	test-tool trace2 008timer 5 10 &&
> +	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
> +	grep "d0|summary|timer||_T_ABS_||test|name:test1 count:5" actual

adding something like " total:.* min: .* max:.*" to the end of this
pattern might be good. You could even get really specific about the ".*"
being a floating point number, but I'm not too concerned about that. I
just want to see that these other labels stay consistent in future Git
versions.

> +# Exercise the stopwatch timer "test" in a loop and confirm that it was
> +# we have as many start/stop intervals as expected.  We cannot really test
> +# the (t_timer, t_min, t_max) timer values, so we assume they are good.
Similarly, we can do something such as...

> +have_timer_event () {
> +	thread=$1
> +	name=$2
> +	count=$3
> +	file=$4
> +
> +	grep "\"event\":\"timer\".*\"thread\":\"${thread}\".*\"name\":\"${name}\".*\"count\":${count}" $file

Adding more detail to this pattern.

This helper could probably benefit from constructing the regex across
multiple string concatenations, so we can see the different pieces.
Something like

	pattern="\"event\":\"timer\""
	pattern="$pattern.*\"thread\":\"${thread}\""
	pattern="$pattern.*\"name\":\"${name}\""
	pattern="$pattern.*\"count\":${count}"
	pattern="$pattern.*\"t_total\":"
	pattern="$pattern.*\"t_min\":"
	pattern="$pattern.*\"t_max\":"

	grep "$pattern" $file

> +
> +	return $?

If we used && throughout this method, would this return not be
necessary?

> +static void tr2main_emit_summary_timers(uint64_t us_elapsed_absolute)
> +{
> +	struct tr2_tgt *tgt_j;
> +	int j;
> +	struct tr2tmr_block merged;
> +
> +	memset(&merged, 0, sizeof(merged));
> +
> +	/*
> +	 * Sum across all of the per-thread stopwatch timer data into
> +	 * a single composite block of timer values.
> +	 */
> +	tr2tls_aggregate_timer_blocks(&merged);
> +
> +	/*
> +	 * Emit "summary" timer events for each composite timer value
> +	 * that had activity.
> +	 */
> +	for_each_wanted_builtin (j, tgt_j)
> +		if (tgt_j->pfn_timer)
> +			tr2tmr_emit_block(tgt_j->pfn_timer,
> +					  us_elapsed_absolute,
> +					  &merged, "summary");

I'd put braces at the for-loop level, even though this is semantically
correct without them.

> +}
> +
> +static void tr2main_emit_thread_timers(uint64_t us_elapsed_absolute)
> +{
> +	struct tr2_tgt *tgt_j;
> +	int j;
> +
> +	for_each_wanted_builtin (j, tgt_j)
> +		if (tgt_j->pfn_timer)
> +			tr2tls_emit_timer_blocks_by_thread(tgt_j->pfn_timer,
> +							   us_elapsed_absolute);

(same here)

> +/*
> + * Define the set of stopwatch timers.
> + *
> + * We can add more at any time, but they must be defined at compile
> + * time (to avoid the need to dynamically allocate and synchronize
> + * them between different threads).
> + *
> + * These must start at 0 and be contiguous (because we use them
> + * elsewhere as array indexes).

I was worried at first about using an array here, but this is essentially
one chunk of global memory per process that will not be very large, even
if we add a lot of timer IDs here. If we use this API enough that that
memory is a problem, then we can refactor the memory to be a hashmap that
only populates entries for IDs that are used by the process.

> + * Any values added to this enum must also be added to the timer definitions
> + * array.  See `trace2/tr2_tmr.c:tr2tmr_def_block[]`.
> + */
> +enum trace2_timer_id {
> +	/*
> +	 * Define two timers for testing.  See `t/helper/test-trace2.c`.
> +	 * These can be used for ad hoc testing, but should not be used
> +	 * for permanent analysis code.
> +	 */
> +	TRACE2_TIMER_ID_TEST1 = 0, /* emits summary event only */
> +	TRACE2_TIMER_ID_TEST2,     /* emits summary and thread events */
> +
> +
> +	/* Add additional timer definitions before here. */
> +	TRACE2_NUMBER_OF_TIMERS
> +};

...

> +static struct tr2tmr_def tr2tmr_def_block[TRACE2_NUMBER_OF_TIMERS] = {
> +	[TRACE2_TIMER_ID_TEST1] = { "test", "test1", 0 },
> +	[TRACE2_TIMER_ID_TEST2] = { "test", "test2", 1 },
> +};

Although this will always be populated, so maybe my thoughts about how
to reduce memory load in the hypothetical future are worthless.
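
For reference, a stripped-down sketch of the enum-indexed table idiom
under discussion.  All names here are hypothetical, and the meaning of
the third field ("also emit per-thread events") is an assumption based
on the comments in the quoted enum.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Ids start at 0 and are contiguous so they can be used as array indexes. */
enum timer_id {
	TIMER_ID_TEST1 = 0,
	TIMER_ID_TEST2,

	NUMBER_OF_TIMERS /* keep last */
};

struct timer_def {
	const char *category;
	const char *name;
	int want_thread_events; /* assumed meaning of the third field */
};

/* Designated initializers tie each row to its enum value. */
static const struct timer_def timer_defs[NUMBER_OF_TIMERS] = {
	[TIMER_ID_TEST1] = { "test", "test1", 0 },
	[TIMER_ID_TEST2] = { "test", "test2", 1 },
};

static const struct timer_def *timer_def_get(enum timer_id id)
{
	assert((unsigned)id < NUMBER_OF_TIMERS);
	return &timer_defs[id];
}
```

The table costs one small row per defined timer regardless of use, and
an enum value added without a matching row would be zero-initialized,
leaving its `name` NULL.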

> +void tr2tmr_start(enum trace2_timer_id tid)
> +{
> +	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
> +	struct tr2tmr_timer *t = &ctx->timers.timer[tid];
> +
> +	t->recursion_count++;
> +	if (t->recursion_count > 1)
> +		return; /* ignore recursive starts */
> +
> +	t->start_us = getnanotime() / 1000;

Using nanotime gives us the best precision available, and dividing
by 1000 will lose some precision. This is likely why we saw some
0.000000 values for t_min in some of your experiments. That should
be rare for real uses of this API (such as wrapping lstat() calls).

But why do we divide by 1000 here at all? 2^63 nanoseconds is
still 292 years, so we don't risk overflow. You specify uint64_t
so this isn't different on 32-bit machines.
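
For illustration, here is a minimal sketch of the arithmetic in question. It is
not code from the patch, and the helper names are made up. It shows how
truncating each timestamp to microseconds can collapse a sub-microsecond
interval to 0, and roughly how many years fit in 2^63 nanoseconds.

```c
#include <stdint.h>

/*
 * Hypothetical helper: interval length when each endpoint is first
 * truncated to microseconds, as `getnanotime() / 1000` does.
 */
static uint64_t interval_us_truncated(uint64_t start_ns, uint64_t end_ns)
{
	return end_ns / 1000 - start_ns / 1000;
}

/* Hypothetical helper: the same interval kept in nanoseconds. */
static uint64_t interval_ns(uint64_t start_ns, uint64_t end_ns)
{
	return end_ns - start_ns;
}

/* Years representable by 2^63 nanoseconds, using 365.25-day years. */
static double years_in_2_to_63_ns(void)
{
	const double ns_per_year = 1e9 * 60 * 60 * 24 * 365.25;

	return 9223372036854775808.0 / ns_per_year;
}
```

With a start of 1000100 ns and an end of 1000900 ns, the truncated interval is
0 us while the nanosecond interval is 800, and the year count comes out a
little above 292.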

Thanks,
-Stolee


* Re: [PATCH 0/9] Trace2 stopwatch timers and global counters
  2021-12-20 15:01 [PATCH 0/9] Trace2 stopwatch timers and global counters Jeff Hostetler via GitGitGadget
                   ` (8 preceding siblings ...)
  2021-12-20 15:01 ` [PATCH 9/9] trace2: add global counters Jeff Hostetler via GitGitGadget
@ 2021-12-21 14:51 ` Derrick Stolee
  2021-12-21 23:27   ` Matheus Tavares
  2021-12-28 19:36 ` [PATCH v2 " Jeff Hostetler via GitGitGadget
  10 siblings, 1 reply; 55+ messages in thread
From: Derrick Stolee @ 2021-12-21 14:51 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget, git
  Cc: Jeff Hostetler, Matheus Tavares Bernardino

On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote:
> Extend Trace2 to provide multiple "stopwatch timers" and "global counters".

>  3. Rationale
> 
> Timers and counters are an alternative to the existing "region" and "data"
> events. The latter are intended to trace the major flow (or phases) of the
> program and possibly capture the amount of work performed within a loop, for
> example. The former are offered as a way to measure activity that is not
> localized, such as the time spent in zlib or lstat, which may be called from
> many different parts of the program.
> 
> There are currently several places in the Git code where we want to measure
> such activity -- changed-path Bloom filter stats, topo-walk commit counts,
> and tree-walk counts and max-depths. A conversation in [1] suggested that we
> should investigate a more general mechanism to collect stats so that each
> instance doesn't need to recreate their own atexit handling mechanism.
> 
> This is an attempt to address that and let us easily explore other areas in
> the future.
> 
> This patch series does not attempt to refactor those three instances to use
> the new timers and counters. That should be a separate effort -- in part
> because we may want to retool them rather than just translate them. For
> example, rather than just translating the existing four Bloom filter counts
> (in revision.c) into Trace2 counters, we may instead want to have a "happy
> path timer" and a "sad path timer" if that would provide more insight.

I'm excited for these API features. It might be nice to have an RFC-
quality series demonstrating how these examples could work with the
new API. Makes sense to delay in case there were recommended changes
to the API from review in this v1.

I also like your attention to thread contexts. I think these timers
would be very interesting to use in parallel checkout. CC'ing Matheus
for his thoughts on where he would want timer summaries for that
feature. I would probably want the per-thread summary to know if we
are blocked on one really long thread while the others finish quickly.
Within that: what are the things causing us to be slow? Is it zlib?
Is it lstat()?

Thanks,
-Stolee


* Re: [PATCH 0/9] Trace2 stopwatch timers and global counters
  2021-12-21 14:51 ` [PATCH 0/9] Trace2 stopwatch timers and " Derrick Stolee
@ 2021-12-21 23:27   ` Matheus Tavares
  0 siblings, 0 replies; 55+ messages in thread
From: Matheus Tavares @ 2021-12-21 23:27 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Jeff Hostetler via GitGitGadget, git, Jeff Hostetler

On Tue, Dec 21, 2021 at 11:51 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote:
> >
> >  3. Rationale
> >
> > Timers and counters are an alternative to the existing "region" and "data"
> > events. The latter are intended to trace the major flow (or phases) of the
> > program and possibly capture the amount of work performed within a loop, for
> > example. The former are offered as a way to measure activity that is not
> > localized, such as the time spent in zlib or lstat, which may be called from
> > many different parts of the program.
>
> I'm excited for these API features.

Me too! This would have been very useful on some experiments I had to
run in the past.

Thanks for working on it, Jeff :)

> I also like your attention to thread contexts. I think these timers
> would be very interesting to use in parallel checkout. CC'ing Matheus
> for his thoughts on where he would want timer summaries for that
> feature.

For parallel checkout, I think it would be interesting to have timer
summaries for open/close, fstat/lstat, write, and
inflation/delta-reconstruction. Perhaps pkt-line routines too, so that
we can see how much time we spend in inter-process communication.

It would be nice to have timer information for disk reading as well
(more on that below), but I don't think it is possible since we read
the objects through mmap() and thus, we cannot easily isolate the
actual reading time from the decompression time :(

> I would probably want the per-thread summary to know if we
> are blocked on one really long thread while the others finish quickly.

That would be interesting. Parallel checkout actually uses
subprocesses, but I can see the per-thread summary being useful on
grep, for example. (Nevertheless, the use case you mentioned for the
timers -- to evaluate the work balance on parallel checkout -- seems
very interesting.)

> Within that: what are the things causing us to be slow? Is it zlib?
> Is it lstat()?

On my tests, the bottleneck on checkout heavily depended on the
underlying storage type. On HDDs, the bottleneck was object reading
(i.e. page faults on mmap()-ed files), with about 70% to 80% of the
checkout runtime.

On SSDs, reading was much faster, so CPU (i.e. inflation) became the
bottleneck, with 50% of the runtime. (Inflation only lost to reading
when checking out from *many* loose objects.)

Finally, on NFS, file creation with open(O_CREAT | O_EXCL) and fstat()
(which makes the NFS client flush previously cached writes to the
server) were the bottlenecks, with about 40% of the total runtime
each.

These numbers come from a (sequential) `git checkout .` execution on
an empty working tree of the Linux kernel (v5.12), and they were
gathered using eBPF-based profilers. For other operations, especially
ones that require many file removals or more laborious tree merging in
unpack_trees(), I suspect the bottlenecks may change.

If anyone would be interested in seeing the flamegraphs and other
plots for these profiling numbers, I have them at:
https://matheustavares.gitlab.io/annexes/parallel-checkout/profiling

And there is a bit more context at:
https://matheustavares.gitlab.io/posts/parallel-checkout


* Re: [PATCH 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to char*
  2021-12-21  7:22   ` Junio C Hamano
@ 2021-12-22 16:28     ` Jeff Hostetler
  2021-12-22 19:57       ` Junio C Hamano
  0 siblings, 1 reply; 55+ messages in thread
From: Jeff Hostetler @ 2021-12-22 16:28 UTC (permalink / raw)
  To: Junio C Hamano, Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler



On 12/21/21 2:22 AM, Junio C Hamano wrote:
> "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Use a 'char *' to hold the thread name rather than a 'struct strbuf'.
>> The thread name is set when the thread is created and should not be
>> modified afterwards.  Replace the strbuf with an allocated pointer
>> to make that more clear.
> 
> Sounds good.  Use of strbuf is perfectly fine while you compute the
> final value of the string, but as a more permanent location to store
> the result, it often is unsuitable (and strbuf_split_buf() is a prime
> example of how *not* to design your API function around the type).
> 
>> diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
>> index 7da94aba522..cd8b9f2f0a0 100644
>> --- a/trace2/tr2_tls.c
>> +++ b/trace2/tr2_tls.c
>> @@ -35,6 +35,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>>   					     uint64_t us_thread_start)
>>   {
>>   	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
>> +	struct strbuf buf_name = STRBUF_INIT;
>>   
>>   	/*
>>   	 * Implicitly "tr2tls_push_self()" to capture the thread's start
>> @@ -47,12 +48,13 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>>   
>>   	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
>>   
>> -	strbuf_init(&ctx->thread_name, 0);
>>   	if (ctx->thread_id)
>> -		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
>> -	strbuf_addstr(&ctx->thread_name, thread_name);
>> -	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
>> -		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
>> +		strbuf_addf(&buf_name, "th%02d:", ctx->thread_id);
>> +	strbuf_addstr(&buf_name, thread_name);
>> +	if (buf_name.len > TR2_MAX_THREAD_NAME)
>> +		strbuf_setlen(&buf_name, TR2_MAX_THREAD_NAME);
>> +
>> +	ctx->thread_name = strbuf_detach(&buf_name, NULL);
> 
> This is not exactly a new problem, but if we use a mechanism to
> allow arbitrarily long strings (like composing with strbuf and
> detaching the resulting string as is), instead of having a fixed
> name[] array embedded in the ctx structure, I wonder if applying the
> maximum length this early makes sense.  Such a truncation would
> allow more than one ctx structure to share the same name, which
> somehow feels error prone, inviting a mistake to use .thread_name
> member as an identifier, when its only intended use is to give a
> human-readable and not necessarily unique label.  Of course, if the
> maximum is reasonably low, like a few dozen bytes, it may even make
> sense to embed an array of the fixed size and not worry about an
> extra pointer.
> 

I'll convert it to a flex-array at the bottom of the CTX structure
and then defer the truncation to the _perf target (which only does
that to keep the columns lined up).

That will simplify things considerably.

Thanks
Jeff



* Re: [PATCH 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to char*
  2021-12-20 19:35       ` Ævar Arnfjörð Bjarmason
@ 2021-12-22 16:32         ` Jeff Hostetler
  0 siblings, 0 replies; 55+ messages in thread
From: Jeff Hostetler @ 2021-12-22 16:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff Hostetler via GitGitGadget, git, Jeff Hostetler



On 12/20/21 2:35 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Dec 20 2021, Jeff Hostetler wrote:
> 
>> On 12/20/21 11:31 AM, Ævar Arnfjörð Bjarmason wrote:
>>> On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:
>>>
>>>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>>>
>>
>> Yes, it was.  To me a "const char *" in a structure means that
>> the structure does not own the pointer and must not free it.
>> Whereas a "char *" means that the structure might own it and
>> should maybe free it when the structure is freed.  My usage here
>> is that the structure does own it (because it took it from the
>> temporary strbuf using strbuf_detach()) and so it must free it.
>> Therefore it should not be "const".  This has nothing to do with
>> whether or not we allow the thread name to be changed after the
>> fact.  (We don't, but that is a different issue).
> 
> We use the pattern of having a "const char *" that's really a "char *"
> with a cast to free() in many existing APIs for this scenario.


As I mention later in this thread, I'm going to convert the
field into a flex-array, so most of the discussion in this
part of the thread no longer applies.



* Re: [PATCH 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to char*
  2021-12-22 16:28     ` Jeff Hostetler
@ 2021-12-22 19:57       ` Junio C Hamano
  0 siblings, 0 replies; 55+ messages in thread
From: Junio C Hamano @ 2021-12-22 19:57 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: Jeff Hostetler via GitGitGadget, git, Jeff Hostetler

Jeff Hostetler <git@jeffhostetler.com> writes:

> I'll convert it to a flex-array at the bottom of the CTX structure
> and then defer the truncation to the _perf target (which only does
> that to keep the columns lined up).
>
> That will simplify things considerably.

I am not sure if the complexity of flex-array is worth it.

You have been storing an up-to-24-byte human readable name by
embedding a strbuf that has two size_t plus a pointer (i.e. 24-bytes
even on Windows), and the posted patch changes it to a pointer plus
an on-heap allocation with malloc() overhead.

An embedded fixed-size thread_name[TR2_MAX_THREAD_NAME+1] member
may be the simplest thing to do, I suspect.
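
A minimal sketch of what such an embedded member could look like, for
illustration only. The struct and function names are hypothetical, and the
limit of 24 is assumed from the "up-to-24-byte" figure above rather than taken
from the source. snprintf() does the truncation and always NUL-terminates.

```c
#include <stdio.h>

#define MAX_THREAD_NAME_SKETCH 24 /* assumed to mirror TR2_MAX_THREAD_NAME */

struct thread_ctx_sketch {
	int thread_id;
	/* embedded storage: no extra pointer, no separate heap allocation */
	char thread_name[MAX_THREAD_NAME_SKETCH + 1]; /* +1 for the NUL */
};

/*
 * Format "thNN:<name>" (or just "<name>" for thread 0) directly into
 * the embedded array; anything past the limit is silently truncated.
 */
static void set_thread_name(struct thread_ctx_sketch *ctx, int thread_id,
			    const char *name)
{
	ctx->thread_id = thread_id;
	if (thread_id)
		snprintf(ctx->thread_name, sizeof(ctx->thread_name),
			 "th%02d:%s", thread_id, name);
	else
		snprintf(ctx->thread_name, sizeof(ctx->thread_name),
			 "%s", name);
}
```

The trade-off is that truncation happens at creation time again, so two
threads with long names can still end up with the same label.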


* Re: [PATCH 7/9] trace2: add stopwatch timers
  2021-12-20 16:42   ` Ævar Arnfjörð Bjarmason
@ 2021-12-22 21:38     ` Jeff Hostetler
  0 siblings, 0 replies; 55+ messages in thread
From: Jeff Hostetler @ 2021-12-22 21:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Jeff Hostetler via GitGitGadget
  Cc: git, Jeff Hostetler



On 12/20/21 11:42 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>> [...]
>> +
>> +void trace2_timer_stop(enum trace2_timer_id tid)
>> +{
>> +	if (!trace2_enabled)
>> +		return;
>> +
>> +	if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS)
>> +		BUG("invalid timer id: %d", tid);
> 
> nit / style: maybe assert() instead for cases where assert() produces
> better info than BUG(). I.e. it would quote the whole expression, and
> show you what condition it violated....

I'd rather leave it a BUG() so that we always get the
guard code.  assert() goes away in non-debug builds and
a little while later "tid" will be used as a subscript.

I'll add the function name to the BUG message to make
it a little clearer.
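
As a rough sketch of the difference (hypothetical names, and a plain
fprintf()/abort() standing in for Git's BUG()): a guard written as an ordinary
`if` is compiled into every build, whereas assert() expands to nothing when
NDEBUG is defined.

```c
#include <stdio.h>
#include <stdlib.h>

enum timer_id_sketch {
	TIMER_SKETCH_A = 0,
	TIMER_SKETCH_B,
	NUMBER_OF_TIMERS_SKETCH
};

/* Nonzero when tid can safely be used as an array subscript. */
static int timer_id_in_range(int tid)
{
	return tid >= 0 && tid < NUMBER_OF_TIMERS_SKETCH;
}

/*
 * Always-on guard: unlike assert(), this check is still present in
 * non-debug builds.  The caller's name is included in the message.
 */
static void require_timer_id(const char *fn, int tid)
{
	if (!timer_id_in_range(tid)) {
		fprintf(stderr, "BUG: %s: invalid timer id: %d\n", fn, tid);
		abort();
	}
}
```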


[...]
> 
> Perhaps more readable/easily understood as just a (untested):
> 
>      if (!t->interval_count || us_interval < t->min_us)
> 	    t->min_us = us_interval;
>      if (!t->interval_count || us_interval > t->max_us)
> 	    t->max_us = us_interval;
> 
> I.e. to avoid duplicating the identical assignment...
[...]

I'll look at something here to make this a little less
messy.  Probably add a MIN() and MAX() to the mixture.
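
One possible shape for that, as a sketch only. The struct, macro, and function
names here are hypothetical and not the ones in the patch. The first interval
seeds both extremes, and later intervals go through the min/max helpers.

```c
#include <stdint.h>

#define MIN_U64_SKETCH(a, b) ((a) < (b) ? (a) : (b))
#define MAX_U64_SKETCH(a, b) ((a) > (b) ? (a) : (b))

struct timer_stats_sketch {
	uint64_t interval_count;
	uint64_t total_us;
	uint64_t min_us;
	uint64_t max_us;
};

/* Fold one completed start/stop interval into the running statistics. */
static void record_interval(struct timer_stats_sketch *t, uint64_t us_interval)
{
	if (!t->interval_count) {
		/* first interval: nothing to compare against yet */
		t->min_us = us_interval;
		t->max_us = us_interval;
	} else {
		t->min_us = MIN_U64_SKETCH(t->min_us, us_interval);
		t->max_us = MAX_U64_SKETCH(t->max_us, us_interval);
	}
	t->total_us += us_interval;
	t->interval_count++;
}
```

Recording intervals of 5, 2, and 9 us leaves a count of 3, a total of 16,
a minimum of 2, and a maximum of 9.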

> 
>> +	/*
>> +	 * Number of nested starts on the stack in this thread.  (We
>> +	 * ignore recursive starts and use this to track the recursive
>> +	 * calls.)
>> +	 */
>> +	unsigned int recursion_count;
> 
> Earlier we have various forms of:
> 
>      if (t->recursion_count > 1)
> 
> But since it's unsigned can we just make those a:
> 
>      if (t->recursion_count)
> 

The places that are > 0, yes.  But the > 1 instances
are different since we're counting how many calls are
on the stack and want to handle recursive calls differently
than the first.
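
To illustrate the distinction, a simplified single-thread sketch (hypothetical
names; the timestamp is passed in so the example does not depend on a real
clock, and the stop side is an assumption about how the pairing could work).
Only the outermost start records a start time and only the outermost stop
closes the interval.

```c
#include <stdint.h>

struct stopwatch_sketch {
	uint64_t start_us;
	uint64_t total_us;
	unsigned int interval_count;
	unsigned int recursion_count; /* starts currently on the stack */
};

static void stopwatch_start(struct stopwatch_sketch *t, uint64_t now_us)
{
	t->recursion_count++;
	if (t->recursion_count > 1)
		return; /* nested start: keep the outermost start time */
	t->start_us = now_us;
}

static void stopwatch_stop(struct stopwatch_sketch *t, uint64_t now_us)
{
	if (!t->recursion_count)
		return; /* stop without a matching start: ignore */
	t->recursion_count--;
	if (t->recursion_count > 0)
		return; /* still inside an outer start/stop pair */
	t->total_us += now_us - t->start_us;
	t->interval_count++;
}
```

Calling start at 10, start at 20, stop at 30, and stop at 50 yields a single
interval of 40 us rather than two overlapping ones.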



* Re: [PATCH 7/9] trace2: add stopwatch timers
  2021-12-21 14:45   ` Derrick Stolee
@ 2021-12-22 21:57     ` Jeff Hostetler
  0 siblings, 0 replies; 55+ messages in thread
From: Jeff Hostetler @ 2021-12-22 21:57 UTC (permalink / raw)
  To: Derrick Stolee, Jeff Hostetler via GitGitGadget, git; +Cc: Jeff Hostetler



On 12/21/21 9:45 AM, Derrick Stolee wrote:
> On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote:
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Add a stopwatch timer mechanism to Git.
[...]
> 
>> +# Exercise the stopwatch timer "test" in a loop and confirm that
>> +# we have as many start/stop intervals as expected.  We cannot really test
>> +# the (elapsed, min, max) timer values, so we assume they are good.
> 
> We can't check their values, but we could check that their labels are
> emitted.

good point.  i'll add that to the patterns in the grep.


> 
[...]
>> +	grep "\"event\":\"timer\".*\"thread\":\"${thread}\".*\"name\":\"${name}\".*\"count\":${count}" $file
> 
> Adding more detail to this pattern.
> 
> This helper could probably benefit from constructing the regex across
> multiple string concatenations, so we can see the different pieces.
> Something like
> 
> 	pattern="\"event\":\"timer\""
> 	pattern="$pattern.*\"thread\":\"${thread}\""
> 	pattern="$pattern.*\"name\":\"${name}\""
> 	pattern="$pattern.*\"count\":${count}"
> 	pattern="$pattern.*\"t_total\":"
> 	pattern="$pattern.*\"t_min\":"
> 	pattern="$pattern.*\"t_max\":"
> 
> 	grep "$pattern" $file
> 

yeah, that helps a lot.  thanks.


[...]
>> +/*
>> + * Define the set of stopwatch timers.
>> + *
>> + * We can add more at any time, but they must be defined at compile
>> + * time (to avoid the need to dynamically allocate and synchronize
>> + * them between different threads).
>> + *
>> + * These must start at 0 and be contiguous (because we use them
>> + * elsewhere as array indexes).
> 
> I was worried at first about using an array here, but this is essentially
> one chunk of global memory per process that will not be very large, even

s/process/thread/

> if we add a lot of timer IDs here. If we use this API enough that that
> memory is a problem, then we can refactor the memory to be a hashmap that
> only populates entries for IDs that are used by the process.

we're only talking about 48 bytes per timer being added to the thread
context.  and it is allocated, not stack based, so i'm not worried
about it.

and besides, we get constant time lookups when starting/stopping
a timer.  And when we get ready to sum across the thread pool, we
can do it efficiently.
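
A rough sketch of that layout, with hypothetical names and a reduced set of
fields (so the sizes here are not the ones mentioned above). Each thread owns
a fixed array indexed by the enum, so start/stop is a plain subscript with no
locking, and the final summary is one pass over the per-thread blocks.

```c
#include <stddef.h>
#include <stdint.h>

enum timer_id_sketch {
	TIMER_SKETCH_TEST1 = 0,
	TIMER_SKETCH_TEST2,
	NUMBER_OF_TIMERS_SKETCH
};

struct timer_sketch {
	uint64_t total_us;
	uint64_t min_us;
	uint64_t max_us;
	uint64_t interval_count;
};

/* One of these lives in each thread's context. */
struct timer_block_sketch {
	struct timer_sketch timer[NUMBER_OF_TIMERS_SKETCH];
};

/* Sum every thread's block into `merged`, which must start zeroed. */
static void merge_timer_blocks(struct timer_block_sketch *merged,
			       const struct timer_block_sketch *threads,
			       size_t nr_threads)
{
	size_t k;
	int tid;

	for (k = 0; k < nr_threads; k++) {
		for (tid = 0; tid < NUMBER_OF_TIMERS_SKETCH; tid++) {
			const struct timer_sketch *src = &threads[k].timer[tid];
			struct timer_sketch *dst = &merged->timer[tid];

			if (!src->interval_count)
				continue; /* timer unused in this thread */
			if (!dst->interval_count || src->min_us < dst->min_us)
				dst->min_us = src->min_us;
			if (!dst->interval_count || src->max_us > dst->max_us)
				dst->max_us = src->max_us;
			dst->total_us += src->total_us;
			dst->interval_count += src->interval_count;
		}
	}
}
```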

> 
>> + * Any values added to this enum must also be added to the timer definitions
>> + * array.  See `trace2/tr2_tmr.c:tr2tmr_def_block[]`.
>> + */
>> +enum trace2_timer_id {
>> +	/*
>> +	 * Define two timers for testing.  See `t/helper/test-trace2.c`.
>> +	 * These can be used for ad hoc testing, but should not be used
>> +	 * for permanent analysis code.
>> +	 */
>> +	TRACE2_TIMER_ID_TEST1 = 0, /* emits summary event only */
>> +	TRACE2_TIMER_ID_TEST2,     /* emits summary and thread events */
>> +
>> +
>> +	/* Add additional timer definitions before here. */
>> +	TRACE2_NUMBER_OF_TIMERS
>> +};
> 
> ....
> 
>> +static struct tr2tmr_def tr2tmr_def_block[TRACE2_NUMBER_OF_TIMERS] = {
>> +	[TRACE2_TIMER_ID_TEST1] = { "test", "test1", 0 },
>> +	[TRACE2_TIMER_ID_TEST2] = { "test", "test2", 1 },
>> +};
> 
> Although this will always be populated, so maybe my thoughts about how
> to reduce memory load in the hypothetical future are worthless.

yeah, i think this model works well for us.  and it is lock free.

> 
>> +void tr2tmr_start(enum trace2_timer_id tid)
>> +{
>> +	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
>> +	struct tr2tmr_timer *t = &ctx->timers.timer[tid];
>> +
>> +	t->recursion_count++;
>> +	if (t->recursion_count > 1)
>> +		return; /* ignore recursive starts */
>> +
>> +	t->start_us = getnanotime() / 1000;
> 
> Using nanotime gives us the best precision available, and dividing
> by 1000 will lose some precision. This is likely why we saw some
> 0.000000 values for t_min in some of your experiments. That should
> be rare for real uses of this API (such as wrapping lstat() calls).
> 
> But why do we divide by 1000 here at all? 2^63 nanoseconds is
> still 292 years, so we don't risk overflow. You specify uint64_t
> so this isn't different on 32-bit machines.

When I did the original Trace2 parts, I made absolute and relative
elapsed times be in microseconds.  With the overhead of logging
and so on, the lower bits weren't really useful.  And then I converted
those to "%9.6f" in the trace logs, so that we always have "seconds"
in the traces.

I just copied that model when I did timers.  But I could see keeping
nanoseconds around for these timers (since they don't log on every
start/stop, like regions do).

<grin> While drafting this reply I've been fixing up the code in
parallel.  I converted timers to report integer ns values rather
than floats.  And in every line of trace output the total/min/max
timer values all end in 000 -- because (at least on MacOS) getnanotime()
calls gettimeofday() and computes (tv.tv_sec * 1000000000 + tv.tv_usec * 1000).
</grin>

So maybe I did the original "%9.6f" for other reasons....
I'll try it on another OS later and see if it is useful.
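
For reference, a sketch of why a gettimeofday()-backed clock behaves this way.
This is an illustration with hypothetical function names, assuming the
microsecond field is scaled by 1000 to reach nanoseconds. `struct timeval`
only carries microseconds, so the last three decimal digits of the result are
always zero.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Convert a struct timeval (seconds + microseconds) to nanoseconds. */
static uint64_t timeval_to_ns(const struct timeval *tv)
{
	return (uint64_t)tv->tv_sec * 1000000000 +
	       (uint64_t)tv->tv_usec * 1000;
}

/* Nanosecond units, but only microsecond resolution underneath. */
static uint64_t gettimeofday_ns_sketch(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return timeval_to_ns(&tv);
}
```

A clock with true nanosecond resolution (clock_gettime() where available)
would be needed before the extra digits carry any information.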


Thanks,
Jeff




* Re: [PATCH 3/9] trace2: defer free of TLS CTX until program exit.
  2021-12-21  7:30   ` Junio C Hamano
@ 2021-12-22 21:59     ` Jeff Hostetler
  2021-12-22 22:56       ` Junio C Hamano
  0 siblings, 1 reply; 55+ messages in thread
From: Jeff Hostetler @ 2021-12-22 21:59 UTC (permalink / raw)
  To: Junio C Hamano, Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler



On 12/21/21 2:30 AM, Junio C Hamano wrote:
> "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Defer freeing of the Trace2 thread CTX data until program exit.
>> Create a global list of thread CTX data to own the storage.
>>
>> TLS CTX data is allocated when a thread is created and associated
>> with that thread.  Previously, that storage was deleted when the
>> thread exited.  Now we simply disassociate the CTX data from the
>> thread when it exits and let the global CTX list manage the cleanup.
> 
> By the way, TLS CTX sounds embarrassingly close and confusing to
> some function that we may find in say openssl or some crypto stuff
> X-<.  Was there a strong reason to avoid calling these functions and
> types something like tr2_thread_ctx instead of tr2tls_thread_ctx?
> 

I hadn't really thought about the term "TLS" in the context
of crypto -- I had "thread local storage" on the brain.  I guess
I've spent too much of my youth using Win32 thread APIs. :-)

Let me take a look at removing those terms.

Jeff



* Re: [PATCH 9/9] trace2: add global counters
  2021-12-20 17:14   ` Ævar Arnfjörð Bjarmason
@ 2021-12-22 22:18     ` Jeff Hostetler
  0 siblings, 0 replies; 55+ messages in thread
From: Jeff Hostetler @ 2021-12-22 22:18 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Jeff Hostetler via GitGitGadget
  Cc: git, Jeff Hostetler



On 12/20/21 12:14 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>> [...]

>> +	int nr_threads = 0;
>> +	int k;
>> +	pthread_t *pids = NULL;
>> +
>> +	if (argc != 3)
>> +		die("%s", usage_error);
>> +	if (get_i(&data.v1, argv[0]))
>> +		die("%s", usage_error);
>> +	if (get_i(&data.v2, argv[1]))
>> +		die("%s", usage_error);
>> +	if (get_i(&nr_threads, argv[2]))
>> +		die("%s", usage_error);
> 
> A partial nit on existing code, as this just extends the pattern, but
> couldn't much of this get_i() etc. just be made redundant by simply
> using the parse-options.c API here?  I.e. OPTION_INTEGER and using named
> arguments would do the validation for you.

I suppose.  It just seemed like a little overkill setting things
up for such a simple and isolated test.  And the cut-n-paste was
quick enough for my purposes.

> 
>> +# Exercise the global counter in a loop and confirm that we get the
>> +# expected sum in an event record.
>> +#
>> +
>> +have_counter_event () {
>> +	thread=$1
>> +	name=$2
>> +	value=$3
>> +	file=$4
>> +
>> +	grep "\"event\":\"counter\".*\"thread\":\"${thread}\".*\"name\":\"${name}\".*\"value\":${value}" $file
>> +
>> +	return $?
>> +}
> 
> It looks like there's no helper, but this is the Nth thing I see where
> I wish our "test_region" helper were just a bit more generalized. I.e.:
> 
>      test_trace2 --match=counter --match=thread=$thread --match=name=$name --match=value=$value <trace>
> 
> With test_region just being a wrapper for something like:
> 
>      test_trace2 --match=region_enter --match=category=$category --match=label=$label <trace> &&
>      test_trace2 --match=region_leave --match=category=$category --match=label=$label <trace>

Yes, that would be nice.  But I don't think we should
start that in the middle of this patch series.  Perhaps
you could start a top-level message with a fleshed out
proposal and let everyone discuss it.

> 
>> +static void tr2main_emit_summary_counters(uint64_t us_elapsed_absolute)
>> +{
>> +	struct tr2_tgt *tgt_j;
>> +	int j;
>> +	struct tr2ctr_block merged;
>> +
>> +	memset(&merged, 0, sizeof(merged));
> 
> nit: more memset v.s. "{ 0 }".

Yeah, but lldb wouldn't stop complaining until it was "= { { { 0 } } }"
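
A small sketch of where the three brace levels come from, using a hypothetical
two-counter block rather than the real struct. There is one level each for the
block, the array inside it, and the first array element. Presumably the
complaint was a missing-braces style warning on the shorter `{ 0 }`; both
forms below produce an all-zero block.

```c
#include <stdint.h>
#include <string.h>

struct counter_sketch {
	uint64_t value;
};

struct counter_block_sketch {
	struct counter_sketch counter[2];
};

static struct counter_block_sketch zero_with_initializer(void)
{
	/* block { array { element { first member } } } */
	struct counter_block_sketch merged = { { { 0 } } };

	return merged;
}

static struct counter_block_sketch zero_with_memset(void)
{
	struct counter_block_sketch merged;

	memset(&merged, 0, sizeof(merged));
	return merged;
}
```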

Jeff



* Re: [PATCH 8/9] trace2: add counter events to perf and event target formats
  2021-12-20 16:51   ` Ævar Arnfjörð Bjarmason
@ 2021-12-22 22:56     ` Jeff Hostetler
  0 siblings, 0 replies; 55+ messages in thread
From: Jeff Hostetler @ 2021-12-22 22:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Jeff Hostetler via GitGitGadget
  Cc: git, Jeff Hostetler



On 12/20/21 11:51 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>> [...]
> 
>> +static void fn_counter(uint64_t us_elapsed_absolute,
>> +		       const char *thread_name,
>> +		       const char *category,
>> +		       const char *counter_name,
>> +		       uint64_t value)
>> +{
>> +	const char *event_name = "counter";
>> +	struct strbuf buf_payload = STRBUF_INIT;
>> +
>> +	strbuf_addf(&buf_payload, "name:%s", counter_name);
>> +	strbuf_addf(&buf_payload, " value:%"PRIu64, value);
> 
> Odd to have these be two separate strbuf_addf()...

yeah, i'll combine.  and in the body of fn_timer in 6/9.


> ....but more generally, and I see from e.g. the existing fn_version_fl
> that you're just using existing patterns, but it seems odd not to have a
> trivial varargs fmt helper for perf_io_write_fl that would avoid the
> whole strbuf/addf/release dance.
[...]

yeah, cut-n-paste was used here and i was maintaining
consistency with the other functions -- rather than inventing
something new and refactoring stuff that didn't need to be refactored
in the middle of an on-going patch series.


> I did a quick experiment to do that, patch on "master" below. A lot of
> the boilerplate could be simplified by factoring out the
> sq_quote_buf_pretty() case, and even this approach (re)allocs in a way
> that looks avoidable in many cases if perf_fmt_prepare() were improved
> (but it looks like it needs its if/while loops in some cases still):
> 
[...]
>   
> +__attribute__((format (printf, 8, 9)))
> +static void perf_io_write_fl_fmt(const char *file, int line, const char *event_name,
> +				 const struct repository *repo,
> +				 uint64_t *p_us_elapsed_absolute,
> +				 uint64_t *p_us_elapsed_relative,
> +				 const char *category,
> +				 const char *fmt, ...)
> +{
> +	va_list ap;
> +	struct strbuf sb = STRBUF_INIT;
> +
> +	va_start(ap, fmt);
> +	strbuf_vaddf(&sb, fmt, ap);
> +	va_end(ap);
> +
> +	perf_io_write_fl(file, line, event_name, repo, p_us_elapsed_absolute,
> +			 p_us_elapsed_relative, category, &sb);
> +
> +	strbuf_release(&sb);
> +}
> +
>   static void fn_version_fl(const char *file, int line)
>   {
>   	const char *event_name = "version";
> -	struct strbuf buf_payload = STRBUF_INIT;
> -
> -	strbuf_addstr(&buf_payload, git_version_string);
>   
> -	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
> -			 &buf_payload);
> -	strbuf_release(&buf_payload);
> +	perf_io_write_fl_fmt(file, line, event_name, NULL, NULL, NULL, NULL,
> +			     "%s", git_version_string);
>   }
[...]

Yes, it might be nice to have a _fmt() version as you suggest
and simplify many of the existing fn_*() function bodies.

It seems like I keep saying this today, but can we discuss that
in a new top-level topic and not down inside commit 8/9 of this
series?

Thanks,
Jeff



* Re: [PATCH 3/9] trace2: defer free of TLS CTX until program exit.
  2021-12-22 21:59     ` Jeff Hostetler
@ 2021-12-22 22:56       ` Junio C Hamano
  2021-12-22 23:04         ` Jeff Hostetler
  2021-12-23  7:38         ` Johannes Sixt
  0 siblings, 2 replies; 55+ messages in thread
From: Junio C Hamano @ 2021-12-22 22:56 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: Jeff Hostetler via GitGitGadget, git, Jeff Hostetler

Jeff Hostetler <git@jeffhostetler.com> writes:

> I hadn't really thought about the term "TLS" in the context
> of crypto -- I had "thread local storage" on the brain.  I guess
> I've spent too much of my youth using Win32 thread APIs. :-)
>
> Let me take a look at removing those terms.

Nah, it may be just me.  As long as what TLS stands for is clear in
the context, it is fine.


* Re: [PATCH 3/9] trace2: defer free of TLS CTX until program exit.
  2021-12-22 22:56       ` Junio C Hamano
@ 2021-12-22 23:04         ` Jeff Hostetler
  2021-12-23  7:38         ` Johannes Sixt
  1 sibling, 0 replies; 55+ messages in thread
From: Jeff Hostetler @ 2021-12-22 23:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff Hostetler via GitGitGadget, git, Jeff Hostetler



On 12/22/21 5:56 PM, Junio C Hamano wrote:
> Jeff Hostetler <git@jeffhostetler.com> writes:
> 
>> I hadn't really thought about the term "TLS" in the context
>> of crypto -- I had "thread local storage" on the brain.  I guess
>> I've spent too much of my youth using Win32 thread APIs. :-)
>>
>> Let me take a look at removing those terms.
> 
> Nah, it may be just me.  As long as what TLS stands for is clear in
> the context, it is fine.
> 

ok thanks.  i took a quick look at scrubbing the
code of TLS and even though most of the uses are
in private (or protected) tr2_*.[ch] files, it
will be a large churn-type change and i'm not
sure it's worth the effort.

thanks
jeff


* Re: [PATCH 3/9] trace2: defer free of TLS CTX until program exit.
  2021-12-22 22:56       ` Junio C Hamano
  2021-12-22 23:04         ` Jeff Hostetler
@ 2021-12-23  7:38         ` Johannes Sixt
  2021-12-23 18:18           ` Junio C Hamano
  1 sibling, 1 reply; 55+ messages in thread
From: Johannes Sixt @ 2021-12-23  7:38 UTC (permalink / raw)
  To: Junio C Hamano, Jeff Hostetler
  Cc: Jeff Hostetler via GitGitGadget, git, Jeff Hostetler

Am 22.12.21 um 23:56 schrieb Junio C Hamano:
> Jeff Hostetler <git@jeffhostetler.com> writes:
> 
>> I hadn't really thought about the term "TLS" in the context
>> of crypto -- I had "thread local storage" on the brain.  I guess
>> I've spent too much of my youth using Win32 thread APIs. :-)
>>
>> Let me take a look at removing those terms.
> 
> Nah, it may be just me.  As long as what TLS stands for is clear in
> the context, it is fine.

No, really, my first reaction was, too: what the heck has crypto to do
with trace2? Are we now sending around trace output by email?

Please use "TLS" next to "CTX" only when it means "Transport Layer
Security".

-- Hannes


* Re: [PATCH 3/9] trace2: defer free of TLS CTX until program exit.
  2021-12-23  7:38         ` Johannes Sixt
@ 2021-12-23 18:18           ` Junio C Hamano
  2021-12-27 18:51             ` Jeff Hostetler
  0 siblings, 1 reply; 55+ messages in thread
From: Junio C Hamano @ 2021-12-23 18:18 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Jeff Hostetler, Jeff Hostetler via GitGitGadget, git, Jeff Hostetler

Johannes Sixt <j6t@kdbg.org> writes:

> Am 22.12.21 um 23:56 schrieb Junio C Hamano:
>> Jeff Hostetler <git@jeffhostetler.com> writes:
>> 
>>> I hadn't really thought about the term "TLS" in the context
>>> of crypto -- I had "thread local storage" on the brain.  I guess
>>> I've spent too much of my youth using Win32 thread APIs. :-)
>>>
>>> Let me take a look at removing those terms.
>> 
>> Nah, it may be just me.  As long as what TLS stands for is clear in
>> the context, it is fine.
>
> No, really, my first reaction was, too: what the hack has crypto to do
> with trace2? Are we now sending around trace output by email?

Ok, then it is not just me ;-)
>
> Please use "TLS" next to "CTX" only when it means "Transport Layer
> Security".

* Re: [PATCH 3/9] trace2: defer free of TLS CTX until program exit.
  2021-12-23 18:18           ` Junio C Hamano
@ 2021-12-27 18:51             ` Jeff Hostetler
  0 siblings, 0 replies; 55+ messages in thread
From: Jeff Hostetler @ 2021-12-27 18:51 UTC (permalink / raw)
  To: Junio C Hamano, Johannes Sixt
  Cc: Jeff Hostetler via GitGitGadget, git, Jeff Hostetler



On 12/23/21 1:18 PM, Junio C Hamano wrote:
> Johannes Sixt <j6t@kdbg.org> writes:
> 
>> Am 22.12.21 um 23:56 schrieb Junio C Hamano:
>>> Jeff Hostetler <git@jeffhostetler.com> writes:
>>>
>>>> I hadn't really thought about the term "TLS" in the context
>>>> of crypto -- I had "thread local storage" on the brain.  I guess
>>>> I've spent too much of my youth using Win32 thread APIs. :-)
>>>>
>>>> Let me take a look at removing those terms.
>>>
>>> Nah, it may be just me.  As long as what TLS stands for is clear in
>>> the context, it is fine.
>>
>> No, really, my first reaction was, too: what the hack has crypto to do
>> with trace2? Are we now sending around trace output by email?
> 
> Ok, then it is not just me ;-)
>>
>> Please use "TLS" next to "CTX" only when it means "Transport Layer
>> Security".

I'll make a note to go thru and remove/change these
terms in a future series rather than mix it in with
this one.

Jeff

* [PATCH v2 0/9] Trace2 stopwatch timers and global counters
  2021-12-20 15:01 [PATCH 0/9] Trace2 stopwatch timers and global counters Jeff Hostetler via GitGitGadget
                   ` (9 preceding siblings ...)
  2021-12-21 14:51 ` [PATCH 0/9] Trace2 stopwatch timers and " Derrick Stolee
@ 2021-12-28 19:36 ` Jeff Hostetler via GitGitGadget
  2021-12-28 19:36   ` [PATCH v2 1/9] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
                     ` (9 more replies)
  10 siblings, 10 replies; 55+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2021-12-28 19:36 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Matheus Tavares, Johannes Sixt, Jeff Hostetler

Here is version 2 of my series to add stopwatch timers and global counters
to Trace2. I think this version addresses all of the comments made on V1.

 * I moved the Trace2 "thread_name" field into a flex-array at the bottom of
   the thread local storage block. This avoids the issue of whether it
   should be allocated and by whom and its const-ness.

 * I moved the truncation of the "thread_name" into the "_perf" target
   (which was the only target that actually cared) so that columns still
   line up.

 * Started phasing out the TLS and CTX acronyms in the Trace2 code. There is
   an ambiguity between "thread local storage" and "transport layer
   security" that caused some confusion. In this patch series, I eliminated
   new uses of the TLS term. A future series will be needed to actually
   rename variables, functions, and data types to fully eliminate the TLS
   term.

 * In V1 I included a change to the "_event" target version number. I've
   rolled this back in favor of Ævar's new proposal describing when/why we
   change it. (That proposal is independent of this series.)

 * In V1 I had reported timer values {total, min, max} in floating point
   seconds with microsecond precision (using a "%9.6f" format) and was
   internally accumulating interval times in microseconds. After some
   discussion, I've changed this to accumulate in nanoseconds and report
   integer nanoseconds. This may avoid some accumulated round-off error.
   (However, on some platforms getnanotime() only has microsecond accuracy,
   so this increased precision may be misleading.)

 * Refactored the pattern model used in the unit tests to make it easier to
   visually parse.

 * Some cosmetic cleanup of the private timer and counter API.
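
To make the first two items concrete, here is a minimal standalone sketch of
the idea, not the code from the patch. The names thread_ctx,
thread_ctx_create() and format_thread_column() are hypothetical stand-ins,
and plain calloc()/snprintf() are used in place of Git's FLEX_ALLOC_MEM() and
strbuf helpers. The struct keeps the full thread name in a flexible array
member, and only the column formatter pads and truncates it, using the
"%-*.*s" width-plus-precision form.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Column width; mirrors TR2FMT_PERF_MAX_THREAD_NAME from this series. */
#define PERF_MAX_THREAD_NAME 24

/* Hypothetical stand-in for struct tr2tls_thread_ctx. */
struct thread_ctx {
	int thread_id;
	char thread_name[]; /* flexible array member; must be last */
};

/*
 * Build "thNN:<base_name>" (or just <base_name> for thread id 0) and
 * store it, untruncated, in the same allocation as the struct.
 */
static struct thread_ctx *thread_ctx_create(int thread_id,
					    const char *base_name)
{
	char prefix[32] = "";
	size_t prefix_len, base_len;
	struct thread_ctx *ctx;

	assert(base_name);
	if (thread_id)
		snprintf(prefix, sizeof(prefix), "th%02d:", thread_id);
	prefix_len = strlen(prefix);
	base_len = strlen(base_name);

	/* One block holds the fixed fields plus the NUL-terminated name. */
	ctx = calloc(1, sizeof(*ctx) + prefix_len + base_len + 1);
	if (!ctx)
		return NULL;
	ctx->thread_id = thread_id;
	memcpy(ctx->thread_name, prefix, prefix_len);
	memcpy(ctx->thread_name + prefix_len, base_name, base_len + 1);
	return ctx;
}

/*
 * Pad (width) and truncate (precision) the name only when writing the
 * column, so the stored name stays complete for other consumers.
 */
static int format_thread_column(char *out, size_t out_size,
				const struct thread_ctx *ctx)
{
	return snprintf(out, out_size, "%-*.*s | ",
			PERF_MAX_THREAD_NAME, PERF_MAX_THREAD_NAME,
			ctx->thread_name);
}
```

Because the name lives inside the struct, a single free() of the context
releases it, and there is no separate pointer whose ownership or const-ness
needs to be decided.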

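The switch to integer nanoseconds can be pictured with the following rough
sketch. It is an illustration under assumptions, not the tr2_tmr.c
implementation: timer_stats, timer_add_interval() and timer_merge() are
hypothetical names. Each thread accumulates {count, total, min, max} in its
own block without locking, and the per-thread blocks are folded into one
summary afterwards. Since everything stays in uint64_t, no floating point
conversion happens until (and unless) a consumer wants one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-timer accumulator; all times are integer nanoseconds. */
struct timer_stats {
	uint64_t interval_count; /* number of start+stop pairs */
	uint64_t ns_total;
	uint64_t ns_min;
	uint64_t ns_max;
};

/* Record one finished start/stop interval in a thread's own block. */
static void timer_add_interval(struct timer_stats *t, uint64_t ns_interval)
{
	assert(t);
	if (!t->interval_count || ns_interval < t->ns_min)
		t->ns_min = ns_interval;
	if (!t->interval_count || ns_interval > t->ns_max)
		t->ns_max = ns_interval;
	t->ns_total += ns_interval;
	t->interval_count++;
}

/* Fold one thread's block into the summary; unused timers are skipped. */
static void timer_merge(struct timer_stats *sum, const struct timer_stats *t)
{
	if (!t->interval_count)
		return;
	if (!sum->interval_count || t->ns_min < sum->ns_min)
		sum->ns_min = t->ns_min;
	if (!sum->interval_count || t->ns_max > sum->ns_max)
		sum->ns_max = t->ns_max;
	sum->ns_total += t->ns_total;
	sum->interval_count += t->interval_count;
}
```

In this sketch the first interval seeds both min and max, so a
zero-initialized block needs no sentinel value for the minimum.
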
There were additional requests/comments that I have not addressed in this
version because I think they should be in their own top-level topic in a
future series rather than appended onto this series:

 * The full elimination of the TLS and CTX terms.

 * Ævar proposed a new test_trace2 test function to parse trace output. This
   would be similar to (or a generalization of) the test_region function
   that we already have in test-lib-functions.sh.

 * Ævar proposed a large refactor of the "_perf" target to have a "fmt()"
   varargs function to reduce the amount of copy-n-pasted code in many of
   the "fn" event handlers. This looks like a good change based on the
   mockup but is a large refactor.

 * Ævar proposed a new rationale for when/why we change the "_event" version
   number. That text can be added to the design document independently.

Jeff Hostetler (9):
  trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
  trace2: convert tr2tls_thread_ctx.thread_name from strbuf to flex
    array
  trace2: defer free of thread local storage until program exit.
  trace2: add thread-name override to event target
  trace2: add thread-name override to perf target
  trace2: add timer events to perf and event target formats
  trace2: add stopwatch timers
  trace2: add counter events to perf and event target formats
  trace2: add global counters

 Documentation/technical/api-trace2.txt | 157 +++++++++++++++++++++
 Makefile                               |   2 +
 t/helper/test-trace2.c                 | 183 +++++++++++++++++++++++++
 t/t0211-trace2-perf.sh                 |  88 ++++++++++++
 t/t0212-trace2-event.sh                |  86 ++++++++++++
 trace2.c                               | 106 ++++++++++++++
 trace2.h                               |  75 ++++++++++
 trace2/tr2_ctr.c                       |  67 +++++++++
 trace2/tr2_ctr.h                       |  79 +++++++++++
 trace2/tr2_tgt.h                       |  39 ++++++
 trace2/tr2_tgt_event.c                 | 111 +++++++++++----
 trace2/tr2_tgt_normal.c                |   2 +
 trace2/tr2_tgt_perf.c                  | 114 ++++++++++-----
 trace2/tr2_tls.c                       | 119 +++++++++++++---
 trace2/tr2_tls.h                       |  51 +++++--
 trace2/tr2_tmr.c                       | 136 ++++++++++++++++++
 trace2/tr2_tmr.h                       | 139 +++++++++++++++++++
 17 files changed, 1465 insertions(+), 89 deletions(-)
 create mode 100644 trace2/tr2_ctr.c
 create mode 100644 trace2/tr2_ctr.h
 create mode 100644 trace2/tr2_tmr.c
 create mode 100644 trace2/tr2_tmr.h


base-commit: e773545c7fe7eca21b134847f4fc2cbc9547fa14
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1099%2Fjeffhostetler%2Ftrace2-stopwatch-v2-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1099/jeffhostetler/trace2-stopwatch-v2-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1099

Range-diff vs v1:

  1:  96f6896a13e =  1:  96f6896a13e trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
  2:  3a4fe07e40e !  2:  ff8df1b148e trace2: convert tr2tls_thread_ctx.thread_name from strbuf to char*
     @@ Metadata
      Author: Jeff Hostetler <jeffhost@microsoft.com>
      
       ## Commit message ##
     -    trace2: convert tr2tls_thread_ctx.thread_name from strbuf to char*
     +    trace2: convert tr2tls_thread_ctx.thread_name from strbuf to flex array
      
     -    Use a 'char *' to hold the thread name rather than a 'struct strbuf'.
     -    The thread name is set when the thread is created and should not be
     -    be modified afterwards.  Replace the strbuf with an allocated pointer
     -    to make that more clear.
     +    Move the thread name to a flex array at the bottom of the Trace2
     +    thread local storage data and get rid of the strbuf.
      
     -    This was discussed in: https://lore.kernel.org/all/xmqqa6kdwo24.fsf@gitster.g/
     +    Let the flex array have the full computed value of the thread name
     +    without truncation.
     +
     +    Change the PERF target to truncate the thread name so that the columns
     +    still line up.
      
          Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
      
     @@ trace2/tr2_tgt_event.c: static void event_fmt_prepare(const char *event_name, co
       	 * In brief mode, only emit <time> on these 2 event types.
      
       ## trace2/tr2_tgt_perf.c ##
     +@@ trace2/tr2_tgt_perf.c: static int tr2env_perf_be_brief;
     + 
     + #define TR2FMT_PERF_FL_WIDTH (28)
     + #define TR2FMT_PERF_MAX_EVENT_NAME (12)
     ++#define TR2FMT_PERF_MAX_THREAD_NAME (24)
     + #define TR2FMT_PERF_REPO_WIDTH (3)
     + #define TR2FMT_PERF_CATEGORY_WIDTH (12)
     + 
      @@ trace2/tr2_tgt_perf.c: static void perf_fmt_prepare(const char *event_name,
     + 	}
       
       	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
     - 	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
     +-	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
      -		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
     -+		    ctx->thread_name, TR2FMT_PERF_MAX_EVENT_NAME,
     - 		    event_name);
     +-		    event_name);
     ++	strbuf_addf(buf, "%-*.*s | %-*s | ", TR2FMT_PERF_MAX_THREAD_NAME,
     ++		    TR2FMT_PERF_MAX_THREAD_NAME, ctx->thread_name,
     ++		    TR2FMT_PERF_MAX_EVENT_NAME, event_name);
       
       	len = buf->len + TR2FMT_PERF_REPO_WIDTH;
     + 	if (repo)
      
       ## trace2/tr2_tls.c ##
     -@@ trace2/tr2_tls.c: struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
     +@@ trace2/tr2_tls.c: void tr2tls_start_process_clock(void)
     + struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
       					     uint64_t us_thread_start)
       {
     - 	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
     +-	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
     ++	struct tr2tls_thread_ctx *ctx;
      +	struct strbuf buf_name = STRBUF_INIT;
     ++	int thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
     ++
     ++	if (thread_id)
     ++		strbuf_addf(&buf_name, "th%02d:", thread_id);
     ++	strbuf_addstr(&buf_name, thread_name);
     ++
     ++	FLEX_ALLOC_MEM(ctx, thread_name, buf_name.buf, buf_name.len);
     ++	strbuf_release(&buf_name);
     ++
     ++	ctx->thread_id = thread_id;
       
       	/*
       	 * Implicitly "tr2tls_push_self()" to capture the thread's start
      @@ trace2/tr2_tls.c: struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
     + 	ctx->array_us_start = (uint64_t *)xcalloc(ctx->alloc, sizeof(uint64_t));
     + 	ctx->array_us_start[ctx->nr_open_regions++] = us_thread_start;
       
     - 	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
     - 
     +-	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
     +-
      -	strbuf_init(&ctx->thread_name, 0);
     - 	if (ctx->thread_id)
     +-	if (ctx->thread_id)
      -		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
      -	strbuf_addstr(&ctx->thread_name, thread_name);
      -	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
      -		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
     -+		strbuf_addf(&buf_name, "th%02d:", ctx->thread_id);
     -+	strbuf_addstr(&buf_name, thread_name);
     -+	if (buf_name.len > TR2_MAX_THREAD_NAME)
     -+		strbuf_setlen(&buf_name, TR2_MAX_THREAD_NAME);
     -+
     -+	ctx->thread_name = strbuf_detach(&buf_name, NULL);
     - 
     +-
       	pthread_setspecific(tr2tls_key, ctx);
       
     + 	return ctx;
      @@ trace2/tr2_tls.c: void tr2tls_unset_self(void)
       
       	pthread_setspecific(tr2tls_key, NULL);
       
      -	strbuf_release(&ctx->thread_name);
     -+	free(ctx->thread_name);
       	free(ctx->array_us_start);
       	free(ctx);
       }
     @@ trace2/tr2_tls.c: void tr2tls_pop_self(void)
      
       ## trace2/tr2_tls.h ##
      @@
     - #define TR2_MAX_THREAD_NAME (24)
       
     + #include "strbuf.h"
     + 
     +-/*
     +- * Arbitry limit for thread names for column alignment.
     +- */
     +-#define TR2_MAX_THREAD_NAME (24)
     +-
       struct tr2tls_thread_ctx {
      -	struct strbuf thread_name;
     -+	char *thread_name;
       	uint64_t *array_us_start;
       	size_t alloc;
       	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
     + 	int thread_id;
     ++	char thread_name[FLEX_ARRAY];
     + };
     + 
     + /*
     +@@ trace2/tr2_tls.h: struct tr2tls_thread_ctx {
     +  * non-zero thread-ids to help distinguish messages from concurrent
     +  * threads.
     +  *
     +- * Truncate the thread name if necessary to help with column alignment
     +- * in printf-style messages.
     +- *
     +  * In this and all following functions the term "self" refers to the
     +  * current thread.
     +  */
  3:  e0c41e1fc78 !  3:  11c8d8cdf1a trace2: defer free of TLS CTX until program exit.
     @@ Metadata
      Author: Jeff Hostetler <jeffhost@microsoft.com>
      
       ## Commit message ##
     -    trace2: defer free of TLS CTX until program exit.
     +    trace2: defer free of thread local storage until program exit.
      
     -    Defer freeing of the Trace2 thread CTX data until program exit.
     -    Create a global list of thread CTX data to own the storage.
     +    Defer freeing of the Trace2 per-thread thread local storage until
     +    program exit.  Create a global list to own them.
      
     -    TLS CTX data is allocated when a thread is created and associated
     -    with that thread.  Previously, that storage was deleted when the
     -    thread exited.  Now we simply disassociate the CTX data from the
     -    thread when it exits and let the global CTX list manage the cleanup.
     +    Trace2 thread local storage data is allocated when a thread is created
     +    and associated with that thread.  Previously, that storage was deleted
     +    when the thread exited.  Now at thread exit, we simply disassociate
     +    the data from the thread and let the global list manage the cleanup.
      
          This will be used by a later commit when we add "counters" and
     -    stopwatch-style "timers" to the Trace2 API.  We will add those
     -    fields to the CTX block and allow threads to efficiently (without
     -    locks) accumulate counter and timer data using TLS.  At program
     -    exit, the main thread can run thru the global list and compute
     -    totals before it frees them.
     +    stopwatch-style "timers" to the Trace2 API.  We will add those fields
     +    to the thread local storage block and allow each thread to efficiently
     +    (without locks) accumulate counter and timer data.  At program exit,
     +    the main thread will run thru the global list and compute and report
     +    totals before freeing the list.
      
          Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
      
     @@ trace2/tr2_tls.c: static uint64_t tr2tls_us_start_process;
       static pthread_mutex_t tr2tls_mutex;
       static pthread_key_t tr2tls_key;
       
     --static int tr2_next_thread_id; /* modify under lock */
      +/*
      + * This list owns all of the thread-specific CTX data.
      + *
      + * While a thread is alive it is associated with a CTX (owned by this
      + * list) and that CTX is installed in the thread's TLS data area.
     ++ * When a thread exits, it is disassociated from its CTX, but the (now
     ++ * dormant) CTX is held in this list until program exit.
      + *
      + * Similarly, `tr2tls_thread_main` points to a CTX contained within
      + * this list.
      + */
      +static struct tr2tls_thread_ctx *tr2tls_ctx_list; /* modify under lock */
     + static int tr2_next_thread_id; /* modify under lock */
       
       void tr2tls_start_process_clock(void)
     - {
      @@ trace2/tr2_tls.c: struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
       	ctx->array_us_start = (uint64_t *)xcalloc(ctx->alloc, sizeof(uint64_t));
       	ctx->array_us_start[ctx->nr_open_regions++] = us_thread_start;
       
     --	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
     ++	/*
     ++	 * Link this CTX into the CTX list and make it the head.
     ++	 */
      +	pthread_mutex_lock(&tr2tls_mutex);
     -+	if (tr2tls_ctx_list)
     -+		ctx->thread_id = tr2tls_ctx_list->thread_id + 1;
      +	ctx->next_ctx = tr2tls_ctx_list;
      +	tr2tls_ctx_list = ctx;
      +	pthread_mutex_unlock(&tr2tls_mutex);
     ++
     + 	pthread_setspecific(tr2tls_key, ctx);
       
     - 	if (ctx->thread_id)
     - 		strbuf_addf(&buf_name, "th%02d:", ctx->thread_id);
     + 	return ctx;
      @@ trace2/tr2_tls.c: int tr2tls_is_main_thread(void)
       
       void tr2tls_unset_self(void)
     @@ trace2/tr2_tls.c: int tr2tls_is_main_thread(void)
      -
       	pthread_setspecific(tr2tls_key, NULL);
      -
     --	free(ctx->thread_name);
      -	free(ctx->array_us_start);
      -	free(ctx);
       }
     @@ trace2/tr2_tls.c: void tr2tls_init(void)
      +	while (ctx) {
      +		struct tr2tls_thread_ctx *next = ctx->next_ctx;
      +
     -+		free(ctx->thread_name);
      +		free(ctx->array_us_start);
      +		free(ctx);
      +
     @@ trace2/tr2_tls.c: void tr2tls_init(void)
      
       ## trace2/tr2_tls.h ##
      @@
     - #define TR2_MAX_THREAD_NAME (24)
     + #include "strbuf.h"
       
       struct tr2tls_thread_ctx {
      +	struct tr2tls_thread_ctx *next_ctx;
     - 	char *thread_name;
       	uint64_t *array_us_start;
       	size_t alloc;
     + 	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
      @@ trace2/tr2_tls.h: struct tr2tls_thread_ctx *tr2tls_get_self(void);
       int tr2tls_is_main_thread(void);
       
  4:  e5021ab7f58 !  4:  531a1ee45c2 trace2: add thread-name override to event target
     @@ Metadata
       ## Commit message ##
          trace2: add thread-name override to event target
      
     -    Teach the Trace2 event target to allow the thread-name field to
     -    be specified rather than always inherited from the TLS CTX.
     +    Teach the message formatter in the Trace2 event target to take an
     +    optional thread-name argument.  This overrides the thread name
     +    inherited from the thread local storage data.
      
          This will be used in a future commit for global events that should
          not be tied to a particular thread, such as a global stopwatch timer
  5:  51f53633889 !  5:  82c445b75f1 trace2: add thread-name override to perf target
     @@ Metadata
       ## Commit message ##
          trace2: add thread-name override to perf target
      
     -    Teach the Trace2 perf target to allow the thread-name field be
     -    specified rather than always inherited from the TLS CTX.
     +    Teach the message formatter in the Trace2 perf target to accept an
     +    optional thread name argument.  This will override the thread name
     +    inherited from the thread local storage data block.
      
          This will be used in a future commit for global events that should
          not be tied to a particular thread, such as a global stopwatch timer.
     @@ trace2/tr2_tgt_perf.c: static void perf_fmt_prepare(const char *event_name,
      @@ trace2/tr2_tgt_perf.c: static void perf_fmt_prepare(const char *event_name,
       
       	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
     - 	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
     --		    ctx->thread_name, TR2FMT_PERF_MAX_EVENT_NAME,
     -+		    thread_name, TR2FMT_PERF_MAX_EVENT_NAME,
     - 		    event_name);
     + 	strbuf_addf(buf, "%-*.*s | %-*s | ", TR2FMT_PERF_MAX_THREAD_NAME,
     +-		    TR2FMT_PERF_MAX_THREAD_NAME, ctx->thread_name,
     ++		    TR2FMT_PERF_MAX_THREAD_NAME, thread_name,
     + 		    TR2FMT_PERF_MAX_EVENT_NAME, event_name);
       
       	len = buf->len + TR2FMT_PERF_REPO_WIDTH;
      @@ trace2/tr2_tgt_perf.c: static void perf_io_write_fl(const char *file, int line, const char *event_name,
  6:  c5d5ff05e6c !  6:  62a5c8b0356 trace2: add timer events to perf and event target formats
     @@ Commit message
          Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
      
       ## Documentation/technical/api-trace2.txt ##
     -@@ Documentation/technical/api-trace2.txt: only present on the "start" and "atexit" events.
     - {
     - 	"event":"version",
     - 	...
     --	"evt":"3",		       # EVENT format version
     -+	"evt":"4",		       # EVENT format version
     - 	"exe":"2.20.1.155.g426c96fcdb" # git version
     - }
     - ------------
      @@ Documentation/technical/api-trace2.txt: The "value" field may be an integer or a string.
       }
       ------------
     @@ Documentation/technical/api-trace2.txt: The "value" field may be an integer or a
      +	...
      +	"name":"test",      # timer name
      +	"count":42,         # number of start+stop intervals
     -+	"t_total":1.234,    # sum of all intervals (by thread or globally)
     -+	"t_min":0.1,        # shortest interval
     -+	"t_max":0.9,        # longest interval
     ++	"ns_total":1234,    # sum of all intervals in nanoseconds
     ++	"ns_min":11,        # shortest interval in nanoseconds
     ++	"ns_max":789,       # longest interval in nanoseconds
      +}
      +------------
      ++
     @@ trace2/tr2_tgt.h: typedef void(tr2_tgt_evt_printf_va_fl_t)(const char *file, int
       					 const char *fmt, va_list ap);
       
      +/*
     -+ * Stopwatch timer event.  This function writes the previously accumlated
     ++ * Stopwatch timer event.  This function writes the previously accumulated
      + * stopwatch timer values to the event streams.  Unlike other Trace2 API
      + * events, this is decoupled from the data collection.
      + *
      + * This does not take a (file,line) pair because a timer event reports
     -+ * the cummulative time spend in the timer over a series of intervals
     ++ * the cumulative time spend in the timer over a series of intervals
      + * -- it does not represent a single usage (like region or data events
      + * do).
      + *
      + * The thread name is optional.  If non-null it will override the
     -+ * value inherited from the caller's TLS CTX.  This allows data
     -+ * for global timers to be reported without associating it with a
     -+ * single thread.
     ++ * value inherited from the caller's thread local storage.  This
     ++ * allows timer data to be aggregated and reported without associating
     ++ * it to a specific thread.
      + */
      +typedef void(tr2_tgt_evt_timer_t)(uint64_t us_elapsed_absolute,
      +				  const char *thread_name,
      +				  const char *category,
      +				  const char *timer_name,
      +				  uint64_t interval_count,
     -+				  uint64_t us_total_time,
     -+				  uint64_t us_min_time,
     -+				  uint64_t us_max_time);
     ++				  uint64_t ns_total_time,
     ++				  uint64_t ns_min_time,
     ++				  uint64_t ns_max_time);
      +
       /*
        * "vtable" for a TRACE2 target.  Use NULL if a target does not want
     @@ trace2/tr2_tgt.h: struct tr2_tgt {
       
      
       ## trace2/tr2_tgt_event.c ##
     -@@ trace2/tr2_tgt_event.c: static struct tr2_dst tr2dst_event = { TR2_SYSENV_EVENT, 0, 0, 0, 0 };
     -  * interpretation of existing events or fields. Smaller changes, such as adding
     -  * a new field to an existing event, do not require an increment to the EVENT
     -  * format version.
     -+ *
     -+ * Verison 1: original version
     -+ * Version 2: added "too_many_files" event
     -+ * Version 3: added "child_ready" event
     -+ * Version 4: added "timer" event
     -  */
     --#define TR2_EVENT_VERSION "3"
     -+#define TR2_EVENT_VERSION "4"
     - 
     - /*
     -  * Region nesting limit for messages written to the event target.
      @@ trace2/tr2_tgt_event.c: static void fn_data_json_fl(const char *file, int line,
       	}
       }
     @@ trace2/tr2_tgt_event.c: static void fn_data_json_fl(const char *file, int line,
      +		     const char *category,
      +		     const char *timer_name,
      +		     uint64_t interval_count,
     -+		     uint64_t us_total_time,
     -+		     uint64_t us_min_time,
     -+		     uint64_t us_max_time)
     ++		     uint64_t ns_total_time,
     ++		     uint64_t ns_min_time,
     ++		     uint64_t ns_max_time)
      +{
      +	const char *event_name = "timer";
      +	struct json_writer jw = JSON_WRITER_INIT;
      +	double t_abs = (double)us_elapsed_absolute / 1000000.0;
      +
     -+	double t_total = (double)us_total_time / 1000000.0;
     -+	double t_min   = (double)us_min_time   / 1000000.0;
     -+	double t_max   = (double)us_max_time   / 1000000.0;
     -+
      +	jw_object_begin(&jw, 0);
      +	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, thread_name);
      +	jw_object_double(&jw, "t_abs", 6, t_abs);
      +	jw_object_string(&jw, "name", timer_name);
      +	jw_object_intmax(&jw, "count", interval_count);
     -+	jw_object_double(&jw, "t_total", 6, t_total);
     -+	jw_object_double(&jw, "t_min", 6, t_min);
     -+	jw_object_double(&jw, "t_max", 6, t_max);
     ++	jw_object_intmax(&jw, "ns_total", ns_total_time);
     ++	jw_object_intmax(&jw, "ns_min", ns_min_time);
     ++	jw_object_intmax(&jw, "ns_max", ns_max_time);
      +
      +	jw_end(&jw);
      +
     @@ trace2/tr2_tgt_perf.c: static void fn_printf_va_fl(const char *file, int line,
      +		     const char *category,
      +		     const char *timer_name,
      +		     uint64_t interval_count,
     -+		     uint64_t us_total_time,
     -+		     uint64_t us_min_time,
     -+		     uint64_t us_max_time)
     ++		     uint64_t ns_total_time,
     ++		     uint64_t ns_min_time,
     ++		     uint64_t ns_max_time)
      +{
      +	const char *event_name = "timer";
      +	struct strbuf buf_payload = STRBUF_INIT;
      +
     -+	double t_total = (double)us_total_time / 1000000.0;
     -+	double t_min   = (double)us_min_time   / 1000000.0;
     -+	double t_max   = (double)us_max_time   / 1000000.0;
     -+
     -+	strbuf_addf(&buf_payload, "name:%s", timer_name);
     -+	strbuf_addf(&buf_payload, " count:%"PRIu64, interval_count);
     -+	strbuf_addf(&buf_payload, " total:%9.6f", t_total);
     -+	strbuf_addf(&buf_payload, " min:%9.6f", t_min);
     -+	strbuf_addf(&buf_payload, " max:%9.6f", t_max);
     ++	strbuf_addf(&buf_payload, ("name:%s"
     ++				   " count:%"PRIu64
     ++				   " ns_total:%"PRIu64
     ++				   " ns_min:%"PRIu64
     ++				   " ns_max:%"PRIu64),
     ++		    timer_name, interval_count, ns_total_time, ns_min_time,
     ++		    ns_max_time);
      +
      +	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL,
      +			 &us_elapsed_absolute, NULL,
  7:  dd4f0576254 !  7:  36e57a22d70 trace2: add stopwatch timers
     @@ Commit message
          event is logged (one per timer) at program exit.
      
          Optionally, timer data may also be reported by thread for certain
     -    timers.  (See trace2/tr2_tmr.c:tr2tmr_def_block[].)
     +    timers.  (See trace2/tr2_tmr.c:tr2_timer_def_block[].)
      
          Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
      
     @@ Documentation/technical/api-trace2.txt: at offset 508.
      +in some circumstances.
      ++
      +Timers are defined in `enum trace2_timer_id` in trace2.h and in
     -+`trace2/tr2_tmr.c:tr2tmr_def_block[]`.
     ++`trace2/tr2_tmr.c:tr2_timer_def_block[]`.
      ++
      +----------------
      +static void *unpack_compressed_entry(struct packed_git *p,
     @@ Documentation/technical/api-trace2.txt: at offset 508.
      +...
      +$ cat ~/log.perf
      +...
     -+d0 | summary                  | timer        |     |  0.111026 |           | test         | name:test1 count:4 total: 0.000393 min: 0.000006 max: 0.000302
     ++d0 | summary                  | timer        |     |  0.111026 |           | test         | name:test1 count:4 ns_total:393000 ns_min:6000 ns_max:302000
      +d0 | main                     | atexit       |     |  0.111026 |           |              | code:0
      +----------------
      ++
     @@ t/helper/test-trace2.c: static int ut_007bug(int argc, const char **argv)
      +	return NULL;
      +}
      +
     -+
      +/*
      + * Multi-threaded timer test.  Create several threads that each create
      + * several intervals using the TEST2 timer.  The test script can verify
     @@ t/t0211-trace2-perf.sh: test_expect_success 'using global config, perf stream, r
       
      +# Exercise the stopwatch timer "test" in a loop and confirm that it was
      +# we have as many start/stop intervals as expected.  We cannot really test
     -+# the (elapsed, min, max) timer values, so we assume they are good.
     -+#
     ++# the actual (total, min, max) timer values, so we assume they are good,
     ++# but we can test the keys for them.
     ++
     ++have_timer_event () {
     ++	thread=$1
     ++	name=$2
     ++	count=$3
     ++	file=$4
     ++
     ++	pattern="d0|${thread}|timer||_T_ABS_||test"
     ++	pattern="${pattern}|name:${name}"
     ++	pattern="${pattern} count:${count}"
     ++	pattern="${pattern} ns_total:.*"
     ++	pattern="${pattern} ns_min:.*"
     ++	pattern="${pattern} ns_max:.*"
     ++
     ++	grep "${pattern}" ${file}
     ++
     ++	return $?
     ++}
     ++
      +test_expect_success 'test stopwatch timers - summary only' '
      +	test_when_finished "rm trace.perf actual" &&
      +	test_config_global trace2.perfBrief 1 &&
      +	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
      +	test-tool trace2 008timer 5 10 &&
      +	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
     -+	grep "d0|summary|timer||_T_ABS_||test|name:test1 count:5" actual
     ++
     ++	have_timer_event "summary" "test1" 5 actual
      +'
      +
      +test_expect_success 'test stopwatch timers - summary and threads' '
     @@ t/t0211-trace2-perf.sh: test_expect_success 'using global config, perf stream, r
      +	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
      +	test-tool trace2 009timer 5 10 3 &&
      +	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
     -+	grep "d0|th01:ut_009|timer||_T_ABS_||test|name:test2 count:5" actual &&
     -+	grep "d0|th02:ut_009|timer||_T_ABS_||test|name:test2 count:5" actual &&
     -+	grep "d0|th02:ut_009|timer||_T_ABS_||test|name:test2 count:5" actual &&
     -+	grep "d0|summary|timer||_T_ABS_||test|name:test2 count:15" actual
     ++
     ++	have_timer_event "th01:ut_009" "test2" 5 actual &&
     ++	have_timer_event "th02:ut_009" "test2" 5 actual &&
     ++	have_timer_event "th03:ut_009" "test2" 5 actual &&
     ++	have_timer_event "summary" "test2" 15 actual
      +'
      +
       test_done
     @@ t/t0212-trace2-event.sh: test_expect_success 'discard traces when there are too
      +	count=$3
      +	file=$4
      +
     -+	grep "\"event\":\"timer\".*\"thread\":\"${thread}\".*\"name\":\"${name}\".*\"count\":${count}" $file
     ++	pattern="\"event\":\"timer\""
     ++	pattern="${pattern}.*\"thread\":\"${thread}\""
     ++	pattern="${pattern}.*\"name\":\"${name}\""
     ++	pattern="${pattern}.*\"count\":${count}"
     ++	pattern="${pattern}.*\"ns_total\":[0-9]*"
     ++	pattern="${pattern}.*\"ns_min\":[0-9]*"
     ++	pattern="${pattern}.*\"ns_max\":[0-9]*"
     ++
     ++	grep "${pattern}" ${file}
      +
      +	return $?
      +}
     @@ t/t0212-trace2-event.sh: test_expect_success 'discard traces when there are too
      +	test_config_global trace2.eventBrief 1 &&
      +	test_config_global trace2.eventTarget "$(pwd)/trace.event" &&
      +	test-tool trace2 008timer 5 10 &&
     ++
      +	have_timer_event "summary" "test1" 5 trace.event
      +'
      +
     @@ t/t0212-trace2-event.sh: test_expect_success 'discard traces when there are too
      +	test_config_global trace2.eventBrief 1 &&
      +	test_config_global trace2.eventTarget "$(pwd)/trace.event" &&
      +	test-tool trace2 009timer 5 10 3 &&
     ++
      +	have_timer_event "th01:ut_009" "test2" 5 trace.event &&
      +	have_timer_event "th02:ut_009" "test2" 5 trace.event &&
      +	have_timer_event "th03:ut_009" "test2" 5 trace.event &&
     @@ trace2.c: static void tr2_tgt_disable_builtins(void)
      +{
      +	struct tr2_tgt *tgt_j;
      +	int j;
     -+	struct tr2tmr_block merged;
     ++	struct tr2_timer_block merged = { { { 0 } } };
      +
     -+	memset(&merged, 0, sizeof(merged));
     -+
     -+	/*
     -+	 * Sum across all of the per-thread stopwatch timer data into
     -+	 * a single composite block of timer values.
     -+	 */
     -+	tr2tls_aggregate_timer_blocks(&merged);
     ++	tr2_summarize_timers(&merged);
      +
      +	/*
      +	 * Emit "summary" timer events for each composite timer value
     @@ trace2.c: static void tr2_tgt_disable_builtins(void)
      +	 */
      +	for_each_wanted_builtin (j, tgt_j)
      +		if (tgt_j->pfn_timer)
     -+			tr2tmr_emit_block(tgt_j->pfn_timer,
     -+					  us_elapsed_absolute,
     -+					  &merged, "summary");
     ++			tr2_emit_timer_block(tgt_j->pfn_timer,
     ++					     us_elapsed_absolute,
     ++					     &merged, "summary");
      +}
      +
      +static void tr2main_emit_thread_timers(uint64_t us_elapsed_absolute)
     @@ trace2.c: static void tr2_tgt_disable_builtins(void)
      +
      +	for_each_wanted_builtin (j, tgt_j)
      +		if (tgt_j->pfn_timer)
     -+			tr2tls_emit_timer_blocks_by_thread(tgt_j->pfn_timer,
     -+							   us_elapsed_absolute);
     ++			tr2_emit_timers_by_thread(tgt_j->pfn_timer,
     ++						  us_elapsed_absolute);
      +}
      +
       static int tr2main_exit_code;
     @@ trace2.c: const char *trace2_session_id(void)
      +		return;
      +
      +	if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS)
     -+		BUG("invalid timer id: %d", tid);
     ++		BUG("trace2_timer_start: invalid timer id: %d", tid);
      +
     -+	tr2tmr_start(tid);
     ++	tr2_start_timer(tid);
      +}
      +
      +void trace2_timer_stop(enum trace2_timer_id tid)
     @@ trace2.c: const char *trace2_session_id(void)
      +		return;
      +
      +	if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS)
     -+		BUG("invalid timer id: %d", tid);
     ++		BUG("trace2_timer_stop: invalid timer id: %d", tid);
      +
     -+	tr2tmr_stop(tid);
     ++	tr2_stop_timer(tid);
      +}
      
       ## trace2.h ##
     @@ trace2.h: void trace2_collect_process_info(enum trace2_process_info_reason reaso
      + * elsewhere as array indexes).
      + *
      + * Any values added to this enum must also be added to the timer definitions
     -+ * array.  See `trace2/tr2_tmr.c:tr2tmr_def_block[]`.
     ++ * array.  See `trace2/tr2_tmr.c:tr2_timer_def_block[]`.
      + */
      +enum trace2_timer_id {
      +	/*
     @@ trace2/tr2_tls.c: int tr2tls_locked_increment(int *p)
       	return current_value;
       }
      +
     -+void tr2tls_aggregate_timer_blocks(struct tr2tmr_block *merged)
     ++void tr2_summarize_timers(struct tr2_timer_block *merged)
      +{
      +	struct tr2tls_thread_ctx *ctx = tr2tls_ctx_list;
      +
      +	while (ctx) {
      +		struct tr2tls_thread_ctx *next = ctx->next_ctx;
      +
     -+		tr2tmr_aggregate_timers(merged, &ctx->timers);
     ++		tr2_merge_timer_block(merged, &ctx->timers);
      +
      +		ctx = next;
      +	}
      +}
      +
     -+void tr2tls_emit_timer_blocks_by_thread(tr2_tgt_evt_timer_t *pfn,
     -+					uint64_t us_elapsed_absolute)
     ++void tr2_emit_timers_by_thread(tr2_tgt_evt_timer_t *pfn,
     ++			       uint64_t us_elapsed_absolute)
      +{
      +	struct tr2tls_thread_ctx *ctx = tr2tls_ctx_list;
      +
      +	while (ctx) {
      +		struct tr2tls_thread_ctx *next = ctx->next_ctx;
      +
     -+		tr2tmr_emit_block(pfn, us_elapsed_absolute, &ctx->timers,
     -+				  ctx->thread_name);
     ++		tr2_emit_timer_block(pfn, us_elapsed_absolute, &ctx->timers,
     ++				     ctx->thread_name);
      +
      +		ctx = next;
      +	}
     @@ trace2/tr2_tls.h
       #include "strbuf.h"
      +#include "trace2/tr2_tmr.h"
       
     - /*
     -  * Arbitry limit for thread names for column alignment.
     + struct tr2tls_thread_ctx {
     + 	struct tr2tls_thread_ctx *next_ctx;
      @@ trace2/tr2_tls.h: struct tr2tls_thread_ctx {
       	size_t alloc;
       	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
       	int thread_id;
      +
     -+	struct tr2tmr_block timers;
     ++	struct tr2_timer_block timers;
     ++
     + 	char thread_name[FLEX_ARRAY];
       };
       
      +/*
     -+ * Iterate over the global list of TLS CTX data and aggregate the timer
     -+ * data into the given timer block.
     ++ * Iterate over the global list of threads and aggregate the timer
     ++ * data into the given timer block.  The resulting block will contain
     ++ * the global summary of timer usage.
      + */
     -+void tr2tls_aggregate_timer_blocks(struct tr2tmr_block *merged);
     ++void tr2_summarize_timers(struct tr2_timer_block *merged);
      +
      +/*
     -+ * Iterate over the global list of TLS CTX data (the complete set of
     -+ * threads that have used Trace2 resources) data and emit "per-thread"
     -+ * timer data for each.
     ++ * Iterate over the global list of threads and emit "per-thread"
     ++ * timer data.
      + */
     -+void tr2tls_emit_timer_blocks_by_thread(tr2_tgt_evt_timer_t *pfn,
     -+					uint64_t us_elapsed_absolute);
     ++void tr2_emit_timers_by_thread(tr2_tgt_evt_timer_t *pfn,
     ++			       uint64_t us_elapsed_absolute);
      +
       /*
        * Create TLS data for the current thread.  This gives us a place to
     @@ trace2/tr2_tmr.c (new)
      +#include "trace2/tr2_tls.h"
      +#include "trace2/tr2_tmr.h"
      +
     ++#define MY_MAX(a, b) ((a) > (b) ? (a) : (b))
     ++#define MY_MIN(a, b) ((a) < (b) ? (a) : (b))
     ++
      +/*
      + * Define metadata for each stopwatch timer.  This list must match the
      + * set defined in "enum trace2_timer_id".
      + */
     -+struct tr2tmr_def {
     ++struct tr2_timer_def {
      +	const char *category;
      +	const char *name;
      +
      +	unsigned int want_thread_events:1;
      +};
      +
     -+static struct tr2tmr_def tr2tmr_def_block[TRACE2_NUMBER_OF_TIMERS] = {
     ++static struct tr2_timer_def tr2_timer_def_block[TRACE2_NUMBER_OF_TIMERS] = {
      +	[TRACE2_TIMER_ID_TEST1] = { "test", "test1", 0 },
      +	[TRACE2_TIMER_ID_TEST2] = { "test", "test2", 1 },
      +};
      +
     -+void tr2tmr_start(enum trace2_timer_id tid)
     ++void tr2_start_timer(enum trace2_timer_id tid)
      +{
      +	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
     -+	struct tr2tmr_timer *t = &ctx->timers.timer[tid];
     ++	struct tr2_timer *t = &ctx->timers.timer[tid];
      +
      +	t->recursion_count++;
      +	if (t->recursion_count > 1)
      +		return; /* ignore recursive starts */
      +
     -+	t->start_us = getnanotime() / 1000;
     ++	t->start_ns = getnanotime();
      +}
      +
     -+void tr2tmr_stop(enum trace2_timer_id tid)
     ++void tr2_stop_timer(enum trace2_timer_id tid)
      +{
      +	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
     -+	struct tr2tmr_timer *t = &ctx->timers.timer[tid];
     -+	uint64_t us_now;
     -+	uint64_t us_interval;
     ++	struct tr2_timer *t = &ctx->timers.timer[tid];
     ++	uint64_t ns_now;
     ++	uint64_t ns_interval;
      +
      +	assert(t->recursion_count > 0);
      +
      +	t->recursion_count--;
     -+	if (t->recursion_count > 0)
     -+		return; /* still in recursive call */
     ++	if (t->recursion_count)
     ++		return; /* still in recursive call(s) */
      +
     -+	us_now = getnanotime() / 1000;
     -+	us_interval = us_now - t->start_us;
     ++	ns_now = getnanotime();
     ++	ns_interval = ns_now - t->start_ns;
      +
     -+	t->total_us += us_interval;
     ++	t->total_ns += ns_interval;
      +
     ++	/*
     ++	 * min_ns was initialized to zero (in the xcalloc()) rather
     ++	 * than "(unsigned)-1" when the block of timers was allocated,
     ++	 * so we should always set both the min_ns and max_ns values
     ++	 * the first time that the timer is used.
     ++	 */
      +	if (!t->interval_count) {
     -+		t->min_us = us_interval;
     -+		t->max_us = us_interval;
     ++		t->min_ns = ns_interval;
     ++		t->max_ns = ns_interval;
      +	} else {
     -+		if (us_interval < t->min_us)
     -+			t->min_us = us_interval;
     -+		if (us_interval > t->max_us)
     -+			t->max_us = us_interval;
     ++		t->min_ns = MY_MIN(ns_interval, t->min_ns);
     ++		t->max_ns = MY_MAX(ns_interval, t->max_ns);
      +	}
      +
      +	t->interval_count++;
      +}
      +
     -+void tr2tmr_aggregate_timers(struct tr2tmr_block *merged,
     -+			     const struct tr2tmr_block *src)
     ++void tr2_merge_timer_block(struct tr2_timer_block *merged,
     ++			   const struct tr2_timer_block *src)
      +{
      +	enum trace2_timer_id tid;
      +
      +	for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++) {
     -+		struct tr2tmr_timer *t_merged = &merged->timer[tid];
     -+		const struct tr2tmr_timer *t = &src->timer[tid];
     ++		struct tr2_timer *t_merged = &merged->timer[tid];
     ++		const struct tr2_timer *t = &src->timer[tid];
      +
      +		t_merged->is_aggregate = 1;
      +
     -+		if (t->recursion_count > 0) {
     ++		if (t->recursion_count) {
      +			/*
      +			 * A thread exited with a stopwatch running.
      +			 *
     @@ trace2/tr2_tmr.c (new)
      +			 * for the open interval.  I'm going to ignore it
      +			 * and keep going because we may have valid data
      +			 * for previously closed intervals on this timer.
     ++			 *
     ++			 * That is, I'm going to ignore the value of
     ++			 * "now - start_ns".
      +			 */
      +		}
      +
      +		if (!t->interval_count)
      +			continue; /* this timer was not used by this thread. */
      +
     -+		t_merged->total_us += t->total_us;
     ++		t_merged->total_ns += t->total_ns;
      +
      +		if (!t_merged->interval_count) {
     -+			t_merged->min_us = t->min_us;
     -+			t_merged->max_us = t->max_us;
     ++			t_merged->min_ns = t->min_ns;
     ++			t_merged->max_ns = t->max_ns;
      +		} else {
     -+			if (t->min_us < t_merged->min_us)
     -+				t_merged->min_us = t->min_us;
     -+			if (t->max_us > t_merged->max_us)
     -+				t_merged->max_us = t->max_us;
     ++			t_merged->min_ns = MY_MIN(t->min_ns, t_merged->min_ns);
     ++			t_merged->max_ns = MY_MAX(t->max_ns, t_merged->max_ns);
      +		}
      +
      +		t_merged->interval_count += t->interval_count;
     @@ trace2/tr2_tmr.c (new)
      +	merged->is_aggregate = 1;
      +}
      +
     -+void tr2tmr_emit_block(tr2_tgt_evt_timer_t *pfn, uint64_t us_elapsed_absolute,
     -+		       const struct tr2tmr_block *blk, const char *thread_name)
     ++void tr2_emit_timer_block(tr2_tgt_evt_timer_t *pfn,
     ++			  uint64_t us_elapsed_absolute,
     ++			  const struct tr2_timer_block *blk,
     ++			  const char *thread_name)
      +{
      +	enum trace2_timer_id tid;
      +
      +	for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++) {
     -+		const struct tr2tmr_timer *t = &blk->timer[tid];
     -+		const struct tr2tmr_def *d = &tr2tmr_def_block[tid];
     ++		const struct tr2_timer *t = &blk->timer[tid];
     ++		const struct tr2_timer_def *d = &tr2_timer_def_block[tid];
      +
      +		if (!t->interval_count)
      +			continue; /* timer was not used */
     @@ trace2/tr2_tmr.c (new)
      +			continue; /* per-thread events not wanted */
      +
      +		pfn(us_elapsed_absolute, thread_name, d->category, d->name,
     -+		    t->interval_count, t->total_us, t->min_us, t->max_us);
     ++		    t->interval_count, t->total_ns, t->min_ns, t->max_ns);
      +	}
      +}
      
     @@ trace2/tr2_tmr.h (new)
      + * dynamically allocating a timer on demand and sharing that
      + * definition with other threads.
      + *
     -+ * Timer values are stored in a fixed size "timer block" inside the
     -+ * TLS CTX.  This allows data to be collected on a thread-by-thread
     -+ * basis without locking.
     ++ * Timer values are stored in a fixed size "timer block" inside thread
     ++ * local storage.  This allows data to be collected on a
     ++ * thread-by-thread basis without locking.
     ++ *
     ++ * Using this "timer block" model costs ~48 bytes per timer per thread
     ++ * (we have about six uint64 fields per timer).  This does increase
     ++ * the size of the thread local storage block, but it is allocated (at
     ++ * thread create time) and not on the thread stack, so I'm not worried
     ++ * about the size.  Using an array of timers in this block gives us
     ++ * constant time access to each timer within each thread, so we don't
     ++ * need to do expensive lookups (like hashmaps) to start/stop a timer.
      + *
      + * We define (at compile time) a set of "timer ids" to access the
     -+ * various timers inside the fixed size "timer block".
     ++ * various timers inside the fixed size "timer block".  See
     ++ * `trace2_timer_id` in `trace2/trace2.h`.
      + *
     -+ * Timer definitions include the Trace2 "category" and similar fields.
     -+ * This eliminates the need to include those args on the various timer
     -+ * APIs.
     ++ * Timer definitions also include "category", "name", and similar
     ++ * fields.  These are defined in a parallel table in `tr2_tmr.c` and
     ++ * eliminate the need to include those args in the various timer APIs.
      + *
      + * Timer results are summarized and emitted by the main thread at
     -+ * program exit by iterating over the global list of CTX data.
     ++ * program exit by iterating over the global list of thread local
     ++ * storage data blocks.
      + */
      +
      +/*
      + * The definition of an individual timer as used by an individual
      + * thread.
      + */
     -+struct tr2tmr_timer {
     ++struct tr2_timer {
      +	/*
     -+	 * Total elapsed time for this timer in this thread.
     ++	 * Total elapsed time for this timer in this thread in nanoseconds.
      +	 */
     -+	uint64_t total_us;
     ++	uint64_t total_ns;
      +
      +	/*
      +	 * The maximum and minimum interval values observed for this
      +	 * timer in this thread.
      +	 */
     -+	uint64_t min_us;
     -+	uint64_t max_us;
     ++	uint64_t min_ns;
     ++	uint64_t max_ns;
      +
      +	/*
      +	 * The value of the clock when this timer was started in this
      +	 * thread.  (Undefined when the timer is not active in this
      +	 * thread.)
      +	 */
     -+	uint64_t start_us;
     ++	uint64_t start_ns;
      +
      +	/*
      +	 * Number of times that this timer has been started and stopped
     @@ trace2/tr2_tmr.h (new)
      +};
      +
      +/*
     -+ * A compile-time fixed-size block of timers to insert into the TLS CTX.
     ++ * A compile-time fixed-size block of timers to insert into thread
     ++ * local storage.
      + *
      + * We use this simple wrapper around the array of timer instances to
      + * avoid C syntax quirks and the need to pass around an additional size_t
      + * argument.
      + */
     -+struct tr2tmr_block {
     -+	struct tr2tmr_timer timer[TRACE2_NUMBER_OF_TIMERS];
     ++struct tr2_timer_block {
     ++	struct tr2_timer timer[TRACE2_NUMBER_OF_TIMERS];
      +
      +	/*
      +	 * Has data from multiple threads been combined into this block.
     @@ trace2/tr2_tmr.h (new)
      + * Private routines used by trace2.c to actually start/stop an individual
      + * timer in the current thread.
      + */
     -+void tr2tmr_start(enum trace2_timer_id tid);
     -+void tr2tmr_stop(enum trace2_timer_id tid);
     ++void tr2_start_timer(enum trace2_timer_id tid);
     ++void tr2_stop_timer(enum trace2_timer_id tid);
      +
      +/*
     -+ * Accumulate timer data from source block into the merged block.
     ++ * Accumulate timer data for all of the individual timers in the source
     ++ * block into the corresponding timers in the merged block.
     ++ *
     ++ * This will aggregate data from one block (from an individual thread)
     ++ * into the merge block.
      + */
     -+void tr2tmr_aggregate_timers(struct tr2tmr_block *merged,
     -+			     const struct tr2tmr_block *src);
     ++void tr2_merge_timer_block(struct tr2_timer_block *merged,
     ++			   const struct tr2_timer_block *src);
      +
      +/*
      + * Send stopwatch data for all of the timers in this block to the
     -+ * target.
     ++ * trace target destination.
      + *
     -+ * This will generate an event record for each timer that had activity
     -+ * during the program's execution.
     ++ * This will generate an event record for each timer in the block that
     ++ * had activity during the program's execution.  (If this is called
     ++ * with a per-thread block, we emit the per-thread data; if called
     ++ * with an aggregate block, we emit summary data.)
      + */
     -+void tr2tmr_emit_block(tr2_tgt_evt_timer_t *pfn, uint64_t us_elapsed_absolute,
     -+		       const struct tr2tmr_block *blk, const char *thread_name);
     ++void tr2_emit_timer_block(tr2_tgt_evt_timer_t *pfn,
     ++			  uint64_t us_elapsed_absolute,
     ++			  const struct tr2_timer_block *blk,
     ++			  const char *thread_name);
      +
      +#endif /* TR2_TM_H */
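
The start/stop/merge bookkeeping that the tr2_tmr.c hunks above rename from `*_us` to `*_ns` can be read in isolation as the sketch below. This is only an illustration under stated assumptions, not the patch's code: every `sketch_*` name is hypothetical, the clock value is passed in by the caller rather than read with getnanotime(), and the types of the two count fields are guesses because the range-diff elides them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one timer slot in a per-thread timer block. */
struct sketch_timer {
	uint64_t total_ns;            /* sum of all closed intervals */
	uint64_t min_ns;              /* shortest closed interval */
	uint64_t max_ns;              /* longest closed interval */
	uint64_t start_ns;            /* clock value at the outermost start */
	uint64_t interval_count;      /* closed intervals (assumed type) */
	unsigned int recursion_count; /* nesting depth (assumed type) */
};

/* Start the stopwatch; only the outermost start records the clock. */
void sketch_timer_start(struct sketch_timer *t, uint64_t now_ns)
{
	t->recursion_count++;
	if (t->recursion_count > 1)
		return; /* ignore recursive starts */

	t->start_ns = now_ns;
}

/* Stop the stopwatch; only the outermost stop closes the interval. */
void sketch_timer_stop(struct sketch_timer *t, uint64_t now_ns)
{
	uint64_t ns_interval;

	assert(t->recursion_count > 0);

	t->recursion_count--;
	if (t->recursion_count)
		return; /* still inside a nested start */

	ns_interval = now_ns - t->start_ns;
	t->total_ns += ns_interval;

	/*
	 * A zero-initialized timer has min_ns == 0, so the first closed
	 * interval must seed both min_ns and max_ns.
	 */
	if (!t->interval_count) {
		t->min_ns = ns_interval;
		t->max_ns = ns_interval;
	} else {
		if (ns_interval < t->min_ns)
			t->min_ns = ns_interval;
		if (ns_interval > t->max_ns)
			t->max_ns = ns_interval;
	}

	t->interval_count++;
}

/* Fold one thread's timer into the summary timer. */
void sketch_timer_merge(struct sketch_timer *merged,
			const struct sketch_timer *t)
{
	if (!t->interval_count)
		return; /* this thread never closed an interval */

	merged->total_ns += t->total_ns;

	if (!merged->interval_count) {
		merged->min_ns = t->min_ns;
		merged->max_ns = t->max_ns;
	} else {
		if (t->min_ns < merged->min_ns)
			merged->min_ns = t->min_ns;
		if (t->max_ns > merged->max_ns)
			merged->max_ns = t->max_ns;
	}

	merged->interval_count += t->interval_count;
}
```

Passing the clock value in keeps the sketch deterministic. As in the hunks above, an interval that is still open at merge time contributes nothing, since only closed intervals are added to total_ns.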
  8:  3e39c8172f5 !  8:  0ef23190759 trace2: add counter events to perf and event target formats
     @@ Documentation/technical/api-trace2.txt: may exceed the "atexit" elapsed time of
      
       ## trace2/tr2_tgt.h ##
      @@ trace2/tr2_tgt.h: typedef void(tr2_tgt_evt_timer_t)(uint64_t us_elapsed_absolute,
     - 				  uint64_t us_min_time,
     - 				  uint64_t us_max_time);
     + 				  uint64_t ns_min_time,
     + 				  uint64_t ns_max_time);
       
      +/*
      + * Item counter event.
     @@ trace2/tr2_tgt.h: struct tr2_tgt {
       
      
       ## trace2/tr2_tgt_event.c ##
     -@@ trace2/tr2_tgt_event.c: static struct tr2_dst tr2dst_event = { TR2_SYSENV_EVENT, 0, 0, 0, 0 };
     -  * Verison 1: original version
     -  * Version 2: added "too_many_files" event
     -  * Version 3: added "child_ready" event
     -- * Version 4: added "timer" event
     -+ * Version 4: added "timer" and "counter" events
     -  */
     - #define TR2_EVENT_VERSION "4"
     - 
      @@ trace2/tr2_tgt_event.c: static void fn_timer(uint64_t us_elapsed_absolute,
       	jw_release(&jw);
       }
     @@ trace2/tr2_tgt_perf.c: static void fn_timer(uint64_t us_elapsed_absolute,
      +	const char *event_name = "counter";
      +	struct strbuf buf_payload = STRBUF_INIT;
      +
     -+	strbuf_addf(&buf_payload, "name:%s", counter_name);
     -+	strbuf_addf(&buf_payload, " value:%"PRIu64, value);
     ++	strbuf_addf(&buf_payload, "name:%s value:%"PRIu64, counter_name, value);
      +
      +	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL,
      +			 &us_elapsed_absolute, NULL,
  9:  596caede216 !  9:  4d6155e4e4c trace2: add global counters
     @@ Documentation/technical/api-trace2.txt: d0 | main                     | atexit
      +	int i;
      +	unsigned long src_offset = start_offset;
      +
     -+	trace2_counter_add(TRACE2_COUNTER_ID_TEST2, nr);
     ++	trace2_counter_increment(TRACE2_COUNTER_ID_TEST2, nr);
      +
      +	for (i = offset; i < offset + nr; i++) {
      +		...
     @@ Makefile: LIB_OBJS += trace.o
       LIB_OBJS += trace2/tr2_tmr.o
      
       ## t/helper/test-trace2.c ##
     +@@ t/helper/test-trace2.c: static int ut_009timer(int argc, const char **argv)
     + 	const char *usage_error =
     + 		"expect <count> <ms_delay> <threads>";
     + 
     +-	struct ut_009_data data = { 0, 0 };
     ++	struct ut_009_data data = { 0 };
     + 	int nr_threads = 0;
     + 	int k;
     + 	pthread_t *pids = NULL;
      @@ t/helper/test-trace2.c: static int ut_009timer(int argc, const char **argv)
       	return 0;
       }
     @@ t/helper/test-trace2.c: static struct unit_test ut_table[] = {
      
       ## t/t0211-trace2-perf.sh ##
      @@ t/t0211-trace2-perf.sh: test_expect_success 'test stopwatch timers - summary and threads' '
     - 	grep "d0|summary|timer||_T_ABS_||test|name:test2 count:15" actual
     + 	have_timer_event "summary" "test2" 15 actual
       '
       
      +# Exercise the global counter "test" in a loop and confirm that we get an
      +# event with the sum.
      +#
     ++
     ++have_counter_event () {
     ++	thread=$1
     ++	name=$2
     ++	value=$3
     ++	file=$4
     ++
     ++	pattern="d0|${thread}|counter||_T_ABS_||test"
     ++	pattern="${pattern}|name:${name}"
     ++	pattern="${pattern} value:${value}"
     ++
     ++	grep "${pattern}" ${file}
     ++
     ++	return $?
     ++}
     ++
      +test_expect_success 'test global counters - global, single-thread' '
      +	test_when_finished "rm trace.perf actual" &&
      +	test_config_global trace2.perfBrief 1 &&
      +	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
      +	test-tool trace2 010counter 2 3 5 7 11 13  &&
      +	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
     -+	grep "d0|summary|counter||_T_ABS_||test|name:test1 value:41" actual
     ++
     ++	have_counter_event "summary" "test1" 41 actual
      +'
      +
      +test_expect_success 'test global counters - global+threads' '
     @@ t/t0211-trace2-perf.sh: test_expect_success 'test stopwatch timers - summary and
      +	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
      +	test-tool trace2 011counter 5 10 3 &&
      +	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
     -+	grep "d0|th01:ut_011|counter||_T_ABS_||test|name:test2 value:15" actual &&
     -+	grep "d0|th02:ut_011|counter||_T_ABS_||test|name:test2 value:15" actual &&
     -+	grep "d0|th02:ut_011|counter||_T_ABS_||test|name:test2 value:15" actual &&
     -+	grep "d0|summary|counter||_T_ABS_||test|name:test2 value:45" actual
     ++
     ++	have_counter_event "th01:ut_011" "test2" 15 actual &&
     ++	have_counter_event "th02:ut_011" "test2" 15 actual &&
     ++	have_counter_event "th03:ut_011" "test2" 15 actual &&
     ++	have_counter_event "summary" "test2" 45 actual
      +'
      +
       test_done
     @@ t/t0212-trace2-event.sh: test_expect_success 'test stopwatch timers - global+thr
      +	value=$3
      +	file=$4
      +
     -+	grep "\"event\":\"counter\".*\"thread\":\"${thread}\".*\"name\":\"${name}\".*\"value\":${value}" $file
     ++	pattern="\"event\":\"counter\""
     ++	pattern="${pattern}.*\"thread\":\"${thread}\""
     ++	pattern="${pattern}.*\"name\":\"${name}\""
     ++	pattern="${pattern}.*\"value\":${value}"
     ++
     ++	grep "${pattern}" ${file}
      +
      +	return $?
      +}
     @@ t/t0212-trace2-event.sh: test_expect_success 'test stopwatch timers - global+thr
      +	test_config_global trace2.eventBrief 1 &&
      +	test_config_global trace2.eventTarget "$(pwd)/trace.event" &&
      +	test-tool trace2 010counter 2 3 5 7 11 13 &&
     ++
      +	have_counter_event "summary" "test1" 41 trace.event
      +'
      +
     @@ t/t0212-trace2-event.sh: test_expect_success 'test stopwatch timers - global+thr
      +	test_config_global trace2.eventBrief 1 &&
      +	test_config_global trace2.eventTarget "$(pwd)/trace.event" &&
      +	test-tool trace2 011counter 5 10 3 &&
     ++
      +	have_counter_event "th01:ut_011" "test2" 15 trace.event &&
      +	have_counter_event "th02:ut_011" "test2" 15 trace.event &&
      +	have_counter_event "th03:ut_011" "test2" 15 trace.event &&
     @@ trace2.c
       #include "trace2/tr2_sid.h"
       #include "trace2/tr2_sysenv.h"
      @@ trace2.c: static void tr2main_emit_thread_timers(uint64_t us_elapsed_absolute)
     - 							   us_elapsed_absolute);
     + 						  us_elapsed_absolute);
       }
       
      +static void tr2main_emit_summary_counters(uint64_t us_elapsed_absolute)
      +{
      +	struct tr2_tgt *tgt_j;
      +	int j;
     -+	struct tr2ctr_block merged;
     -+
     -+	memset(&merged, 0, sizeof(merged));
     ++	struct tr2_counter_block merged = { { { 0 } } };
      +
      +	/*
      +	 * Sum across all of the per-thread counter data into
     @@ trace2.c: static void tr2main_emit_thread_timers(uint64_t us_elapsed_absolute)
      +	 */
      +	for_each_wanted_builtin (j, tgt_j)
      +		if (tgt_j->pfn_counter)
     -+			tr2ctr_emit_block(tgt_j->pfn_counter,
     -+					  us_elapsed_absolute,
     -+					  &merged, "summary");
     ++			tr2_emit_counter_block(tgt_j->pfn_counter,
     ++					       us_elapsed_absolute,
     ++					       &merged, "summary");
      +}
      +
      +static void tr2main_emit_thread_counters(uint64_t us_elapsed_absolute)
     @@ trace2.c: static void tr2main_atexit_handler(void)
       			tgt_j->pfn_atexit(us_elapsed_absolute,
      @@ trace2.c: void trace2_timer_stop(enum trace2_timer_id tid)
       
     - 	tr2tmr_stop(tid);
     + 	tr2_stop_timer(tid);
       }
      +
      +void trace2_counter_add(enum trace2_counter_id cid, uint64_t value)
     @@ trace2.c: void trace2_timer_stop(enum trace2_timer_id tid)
      +	if (cid < 0 || cid >= TRACE2_NUMBER_OF_COUNTERS)
      +		BUG("invalid counter id: %d", cid);
      +
     -+	tr2ctr_add(cid, value);
     ++	tr2_counter_increment(cid, value);
      +}
      
       ## trace2.h ##
     @@ trace2.h: enum trace2_timer_id {
      + * as array indexes).
      + *
      + * Any value added to this enum must also be added to the counter
     -+ * definitions array.  See `trace2/tr2_ctr.c:tr2ctr_def_block[]`.
     ++ * definitions array.  See `trace2/tr2_ctr.c:tr2_counter_def_block[]`.
      + */
      +enum trace2_counter_id {
      +	/*
     @@ trace2/tr2_ctr.c (new)
      + * Define metadata for each global counter.  This list must match the
      + * set defined in "enum trace2_counter_id".
      + */
     -+struct tr2ctr_def {
     ++struct tr2_counter_def {
      +	const char *category;
      +	const char *name;
      +
      +	unsigned int want_thread_events:1;
      +};
      +
     -+static struct tr2ctr_def tr2ctr_def_block[TRACE2_NUMBER_OF_COUNTERS] = {
     ++static struct tr2_counter_def tr2_counter_def_block[TRACE2_NUMBER_OF_COUNTERS] = {
      +	[TRACE2_COUNTER_ID_TEST1] = { "test", "test1", 0 },
      +	[TRACE2_COUNTER_ID_TEST2] = { "test", "test2", 1 },
      +};
      +
     -+void tr2ctr_add(enum trace2_counter_id cid, uint64_t value)
     ++void tr2_counter_increment(enum trace2_counter_id cid, uint64_t value)
      +{
      +	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
     -+	struct tr2ctr_counter *c = &ctx->counters.counter[cid];
     ++	struct tr2_counter *c = &ctx->counters.counter[cid];
      +
      +	c->value += value;
      +}
      +
     -+void tr2ctr_aggregate_counters(struct tr2ctr_block *merged,
     -+			       const struct tr2ctr_block *src)
     ++void tr2_merge_counter_block(struct tr2_counter_block *merged,
     ++			     const struct tr2_counter_block *src)
      +{
      +	enum trace2_counter_id cid;
      +
      +	for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++) {
     -+		struct tr2ctr_counter *c_merged = &merged->counter[cid];
     -+		const struct tr2ctr_counter *c = &src->counter[cid];
     ++		struct tr2_counter *c_merged = &merged->counter[cid];
     ++		const struct tr2_counter *c = &src->counter[cid];
      +
      +		c_merged->is_aggregate = 1;
      +
     @@ trace2/tr2_ctr.c (new)
      +	merged->is_aggregate = 1;
      +}
      +
     -+void tr2ctr_emit_block(tr2_tgt_evt_counter_t *pfn, uint64_t us_elapsed_absolute,
     -+		       const struct tr2ctr_block *blk, const char *thread_name)
     ++void tr2_emit_counter_block(tr2_tgt_evt_counter_t *pfn,
     ++			    uint64_t us_elapsed_absolute,
     ++			    const struct tr2_counter_block *blk,
     ++			    const char *thread_name)
      +{
      +	enum trace2_counter_id cid;
      +
      +	for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++) {
     -+		const struct tr2ctr_counter *c = &blk->counter[cid];
     -+		const struct tr2ctr_def *d = &tr2ctr_def_block[cid];
     ++		const struct tr2_counter *c = &blk->counter[cid];
     ++		const struct tr2_counter_def *d = &tr2_counter_def_block[cid];
      +
      +		if (!c->value)
      +			continue; /* counter was not used */
     @@ trace2/tr2_ctr.h (new)
      + *
      + * A counter definition table provides the counter category and name
      + * so we can eliminate those arguments from the public counter API.
     ++ * These are defined in a parallel table in `tr2_ctr.c`.
      + *
     -+ * Each active thread maintains a counter block in its TLS CTX and
     -+ * increments them without locking.  At program exit, the counter
     -+ * blocks from all of the individual CTXs are added together to give
     -+ * the final summary value for the each global counter.
     ++ * Each thread has a private block of counters in its thread local
     ++ * storage data so no locks are required for a thread to increment
     ++ * its version of the counter.  At program exit, the counter blocks
     ++ * from all of the per-thread counters are added together to give the
     ++ * final summary value for each global counter.
      + */
      +
      +/*
      + * The definition of an individual counter.
      + */
     -+struct tr2ctr_counter {
     ++struct tr2_counter {
      +	uint64_t value;
      +
      +	unsigned int is_aggregate:1;
     @@ trace2/tr2_ctr.h (new)
      +/*
      + * Compile time fixed block of all defined counters.
      + */
     -+struct tr2ctr_block {
     -+	struct tr2ctr_counter counter[TRACE2_NUMBER_OF_COUNTERS];
     ++struct tr2_counter_block {
     ++	struct tr2_counter counter[TRACE2_NUMBER_OF_COUNTERS];
      +
      +	unsigned int is_aggregate:1;
      +};
     @@ trace2/tr2_ctr.h (new)
      +/*
      + * Add "value" to the global counter.
      + */
     -+void tr2ctr_add(enum trace2_counter_id cid, uint64_t value);
     ++void tr2_counter_increment(enum trace2_counter_id cid, uint64_t value);
      +
      +/*
      + * Accumulate counter data from the source block into the merged block.
      + */
     -+void tr2ctr_aggregate_counters(struct tr2ctr_block *merged,
     -+			       const struct tr2ctr_block *src);
     ++void tr2_merge_counter_block(struct tr2_counter_block *merged,
     ++			       const struct tr2_counter_block *src);
      +
      +/*
      + * Send counter data for all counters in this block to the target.
      + *
      + * This will generate an event record for each counter that had activity.
      + */
     -+void tr2ctr_emit_block(tr2_tgt_evt_counter_t *pfn, uint64_t us_elapsed_absolute,
     -+		       const struct tr2ctr_block *blk, const char *thread_name);
     ++void tr2_emit_counter_block(tr2_tgt_evt_counter_t *pfn,
     ++			    uint64_t us_elapsed_absolute,
     ++			    const struct tr2_counter_block *blk,
     ++			    const char *thread_name);
      +
      +#endif /* TR2_CTR_H */
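
The per-thread counter model described in the tr2_ctr.h comment above (each thread adds to its own block without locking, and the blocks are summed at exit) can be sketched as follows. This is a minimal illustration under stated assumptions: the `sketch_*` names and the two-entry enum are hypothetical, and the blocks are plain structs rather than thread local storage.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter ids; the last entry gives the array size. */
enum sketch_counter_id {
	SKETCH_COUNTER_TEST1,
	SKETCH_COUNTER_TEST2,
	SKETCH_NUMBER_OF_COUNTERS
};

/* One fixed-size block of counters, as each thread would own. */
struct sketch_counter_block {
	uint64_t value[SKETCH_NUMBER_OF_COUNTERS];
};

/* Add "value" to one counter in the caller's own block (no locking). */
void sketch_counter_add(struct sketch_counter_block *blk,
			enum sketch_counter_id cid, uint64_t value)
{
	assert((int)cid >= 0 && cid < SKETCH_NUMBER_OF_COUNTERS);

	blk->value[cid] += value;
}

/* Sum one thread's block into the summary block. */
void sketch_counter_merge(struct sketch_counter_block *merged,
			  const struct sketch_counter_block *src)
{
	int cid;

	for (cid = 0; cid < SKETCH_NUMBER_OF_COUNTERS; cid++)
		merged->value[cid] += src->value[cid];
}
```

With the inputs used by the tests above, adding 2, 3, 5, 7, 11 and 13 to one counter gives 41, and merging three per-thread blocks that each hold 15 gives a summary of 45.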
      
     @@ trace2/tr2_tls.c
       #include "trace2/tr2_tls.h"
       #include "trace2/tr2_tmr.h"
       
     -@@ trace2/tr2_tls.c: void tr2tls_emit_timer_blocks_by_thread(tr2_tgt_evt_timer_t *pfn,
     +@@ trace2/tr2_tls.c: void tr2_emit_timers_by_thread(tr2_tgt_evt_timer_t *pfn,
       		ctx = next;
       	}
       }
      +
     -+void tr2tls_aggregate_counter_blocks(struct tr2ctr_block *merged)
     ++void tr2tls_aggregate_counter_blocks(struct tr2_counter_block *merged)
      +{
      +	struct tr2tls_thread_ctx *ctx = tr2tls_ctx_list;
      +
      +	while (ctx) {
      +		struct tr2tls_thread_ctx *next = ctx->next_ctx;
      +
     -+		tr2ctr_aggregate_counters(merged, &ctx->counters);
     ++		tr2_merge_counter_block(merged, &ctx->counters);
      +
      +		ctx = next;
      +	}
     @@ trace2/tr2_tls.c: void tr2tls_emit_timer_blocks_by_thread(tr2_tgt_evt_timer_t *p
      +	while (ctx) {
      +		struct tr2tls_thread_ctx *next = ctx->next_ctx;
      +
     -+		tr2ctr_emit_block(pfn, us_elapsed_absolute, &ctx->counters,
     -+				  ctx->thread_name);
     ++		tr2_emit_counter_block(pfn, us_elapsed_absolute, &ctx->counters,
     ++				       ctx->thread_name);
      +
      +		ctx = next;
      +	}
     @@ trace2/tr2_tls.h
      +#include "trace2/tr2_ctr.h"
       #include "trace2/tr2_tmr.h"
       
     - /*
     + struct tr2tls_thread_ctx {
      @@ trace2/tr2_tls.h: struct tr2tls_thread_ctx {
     - 	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
       	int thread_id;
       
     -+	struct tr2ctr_block counters;
     - 	struct tr2tmr_block timers;
     + 	struct tr2_timer_block timers;
     ++	struct tr2_counter_block counters;
     + 
     + 	char thread_name[FLEX_ARRAY];
       };
       
      +/*
     -+ * Iterate over the global list of TLS CTX data and aggregate the
     -+ * counter data into the given counter block.
     ++ * Iterate over the global list of threads and aggregate the
     ++ * counter data into the given counter block.  The resulting block
     ++ * will contain the global counter sums.
      + */
     -+void tr2tls_aggregate_counter_blocks(struct tr2ctr_block *merged);
     ++void tr2tls_aggregate_counter_blocks(struct tr2_counter_block *merged);
      +
      +/*
     -+ * Iterate over the global list of TLS CTX data (the complete set of
     -+ * threads that have used Trace2 resources) data and emit "per-thread"
     ++ * Iterate over the global list of threads and emit "per-thread"
      + * counter data for each.
      + */
      +void tr2tls_emit_counter_blocks_by_thread(tr2_tgt_evt_counter_t *pfn,
      +					  uint64_t us_elapsed_absolute);
      +
       /*
     -  * Iterate over the global list of TLS CTX data and aggregate the timer
     -  * data into the given timer block.
     +  * Iterate over the global list of threads and aggregate the timer
     +  * data into the given timer block.  The resulting block will contain

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 55+ messages in thread
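
For readers following the range-diff above: the renamed helpers describe per-thread counter blocks that are summed into one merged block at exit. A minimal sketch of that idea follows. It is not the code from the series; every demo_* name, the two counter ids, and the list layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter ids; the real enum trace2_counter_id is not shown here. */
enum demo_counter_id {
	DEMO_COUNTER_A = 0,
	DEMO_COUNTER_B,
	DEMO_NUMBER_OF_COUNTERS
};

struct demo_counter {
	uint64_t value;
};

/* Fixed-size block holding one slot per defined counter. */
struct demo_counter_block {
	struct demo_counter counter[DEMO_NUMBER_OF_COUNTERS];
	unsigned int is_aggregate:1;
};

/* Hypothetical per-thread context, linked into a global list. */
struct demo_thread_ctx {
	struct demo_thread_ctx *next_ctx;
	struct demo_counter_block counters;
};

/* Add "value" to one counter in the calling thread's own block (no lock). */
void demo_counter_increment(struct demo_thread_ctx *ctx,
			    enum demo_counter_id cid, uint64_t value)
{
	assert((int)cid >= 0 && cid < DEMO_NUMBER_OF_COUNTERS);
	ctx->counters.counter[cid].value += value;
}

/* Accumulate counter data from the source block into the merged block. */
void demo_merge_counter_block(struct demo_counter_block *merged,
			      const struct demo_counter_block *src)
{
	size_t k;

	for (k = 0; k < DEMO_NUMBER_OF_COUNTERS; k++)
		merged->counter[k].value += src->counter[k].value;
	merged->is_aggregate = 1;
}

/* Walk the list of thread contexts and sum every block into "merged". */
void demo_aggregate_counter_blocks(const struct demo_thread_ctx *list,
				   struct demo_counter_block *merged)
{
	const struct demo_thread_ctx *ctx;

	for (ctx = list; ctx; ctx = ctx->next_ctx)
		demo_merge_counter_block(merged, &ctx->counters);
}
```

The walk is meant to run once, after worker threads have stopped updating their blocks, which is why the increment path in this sketch takes no lock.
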

* [PATCH v2 1/9] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
  2021-12-28 19:36 ` [PATCH v2 " Jeff Hostetler via GitGitGadget
@ 2021-12-28 19:36   ` Jeff Hostetler via GitGitGadget
  2021-12-29  0:48     ` Ævar Arnfjörð Bjarmason
  2021-12-28 19:36   ` [PATCH v2 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to flex array Jeff Hostetler via GitGitGadget
                     ` (8 subsequent siblings)
  9 siblings, 1 reply; 55+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2021-12-28 19:36 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Matheus Tavares, Johannes Sixt, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Use "size_t" rather than "int" for the "alloc" and "nr_open_regions"
fields in the "tr2tls_thread_ctx".  These are used by ALLOC_GROW().

This was discussed in: https://lore.kernel.org/all/YULF3hoaDxA9ENdO@nand.local/

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2/tr2_tls.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index b1e327a928e..a90bd639d48 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -11,8 +11,8 @@
 struct tr2tls_thread_ctx {
 	struct strbuf thread_name;
 	uint64_t *array_us_start;
-	int alloc;
-	int nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
+	size_t alloc;
+	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
 	int thread_id;
 };
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread
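
As an aside on patch 1/9: the point of size_t here is that the "alloc" and "nr" bookkeeping of a growable array has the same type as the sizes passed to the allocator. The sketch below illustrates that pattern only. It does not reproduce ALLOC_GROW(); the doubling policy, the struct, and the demo_* names are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the start-time stack kept per thread. */
struct demo_stack {
	uint64_t *array_us_start;
	size_t alloc; /* number of slots allocated */
	size_t nr;    /* number of slots in use */
};

/*
 * Push one value, growing the array when it is full.
 * Returns 0 on success and -1 on overflow or allocation failure.
 */
int demo_stack_push(struct demo_stack *s, uint64_t us_now)
{
	if (s->nr == s->alloc) {
		/* Assumed growth policy: start at 4 slots, then double. */
		size_t new_alloc = s->alloc ? s->alloc * 2 : 4;
		uint64_t *p;

		/* Unsigned wraparound and byte-count overflow checks. */
		if (new_alloc < s->alloc || new_alloc > SIZE_MAX / sizeof(*p))
			return -1;

		p = realloc(s->array_us_start, new_alloc * sizeof(*p));
		if (!p)
			return -1;

		s->array_us_start = p;
		s->alloc = new_alloc;
	}

	assert(s->nr < s->alloc);
	s->array_us_start[s->nr++] = us_now;
	return 0;
}

/* Free the array and reset the bookkeeping. */
void demo_stack_release(struct demo_stack *s)
{
	free(s->array_us_start);
	s->array_us_start = NULL;
	s->alloc = 0;
	s->nr = 0;
}
```
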

* [PATCH v2 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to flex array
  2021-12-28 19:36 ` [PATCH v2 " Jeff Hostetler via GitGitGadget
  2021-12-28 19:36   ` [PATCH v2 1/9] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
@ 2021-12-28 19:36   ` Jeff Hostetler via GitGitGadget
  2021-12-29  1:11     ` Ævar Arnfjörð Bjarmason
  2021-12-28 19:36   ` [PATCH v2 3/9] trace2: defer free of thread local storage until program exit Jeff Hostetler via GitGitGadget
                     ` (7 subsequent siblings)
  9 siblings, 1 reply; 55+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2021-12-28 19:36 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Matheus Tavares, Johannes Sixt, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Move the thread name to a flex array at the bottom of the Trace2
thread local storage data and get rid of the strbuf.

Let the flex array have the full computed value of the thread name
without truncation.

Change the PERF target to truncate the thread name so that the columns
still line up.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2/tr2_tgt_event.c |  2 +-
 trace2/tr2_tgt_perf.c  |  7 ++++---
 trace2/tr2_tls.c       | 25 +++++++++++++------------
 trace2/tr2_tls.h       | 10 +---------
 4 files changed, 19 insertions(+), 25 deletions(-)

diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 3a0014417cc..ca48d00aebc 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -88,7 +88,7 @@ static void event_fmt_prepare(const char *event_name, const char *file,
 
 	jw_object_string(jw, "event", event_name);
 	jw_object_string(jw, "sid", tr2_sid_get());
-	jw_object_string(jw, "thread", ctx->thread_name.buf);
+	jw_object_string(jw, "thread", ctx->thread_name);
 
 	/*
 	 * In brief mode, only emit <time> on these 2 event types.
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index e4acca13d64..fd6cce3efe5 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -23,6 +23,7 @@ static int tr2env_perf_be_brief;
 
 #define TR2FMT_PERF_FL_WIDTH (28)
 #define TR2FMT_PERF_MAX_EVENT_NAME (12)
+#define TR2FMT_PERF_MAX_THREAD_NAME (24)
 #define TR2FMT_PERF_REPO_WIDTH (3)
 #define TR2FMT_PERF_CATEGORY_WIDTH (12)
 
@@ -105,9 +106,9 @@ static void perf_fmt_prepare(const char *event_name,
 	}
 
 	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
-	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
-		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
-		    event_name);
+	strbuf_addf(buf, "%-*.*s | %-*s | ", TR2FMT_PERF_MAX_THREAD_NAME,
+		    TR2FMT_PERF_MAX_THREAD_NAME, ctx->thread_name,
+		    TR2FMT_PERF_MAX_EVENT_NAME, event_name);
 
 	len = buf->len + TR2FMT_PERF_REPO_WIDTH;
 	if (repo)
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 7da94aba522..ed99a234b95 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -34,7 +34,18 @@ void tr2tls_start_process_clock(void)
 struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 					     uint64_t us_thread_start)
 {
-	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
+	struct tr2tls_thread_ctx *ctx;
+	struct strbuf buf_name = STRBUF_INIT;
+	int thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
+
+	if (thread_id)
+		strbuf_addf(&buf_name, "th%02d:", thread_id);
+	strbuf_addstr(&buf_name, thread_name);
+
+	FLEX_ALLOC_MEM(ctx, thread_name, buf_name.buf, buf_name.len);
+	strbuf_release(&buf_name);
+
+	ctx->thread_id = thread_id;
 
 	/*
 	 * Implicitly "tr2tls_push_self()" to capture the thread's start
@@ -45,15 +56,6 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 	ctx->array_us_start = (uint64_t *)xcalloc(ctx->alloc, sizeof(uint64_t));
 	ctx->array_us_start[ctx->nr_open_regions++] = us_thread_start;
 
-	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
-
-	strbuf_init(&ctx->thread_name, 0);
-	if (ctx->thread_id)
-		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
-	strbuf_addstr(&ctx->thread_name, thread_name);
-	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
-		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
-
 	pthread_setspecific(tr2tls_key, ctx);
 
 	return ctx;
@@ -95,7 +97,6 @@ void tr2tls_unset_self(void)
 
 	pthread_setspecific(tr2tls_key, NULL);
 
-	strbuf_release(&ctx->thread_name);
 	free(ctx->array_us_start);
 	free(ctx);
 }
@@ -113,7 +114,7 @@ void tr2tls_pop_self(void)
 	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
 
 	if (!ctx->nr_open_regions)
-		BUG("no open regions in thread '%s'", ctx->thread_name.buf);
+		BUG("no open regions in thread '%s'", ctx->thread_name);
 
 	ctx->nr_open_regions--;
 }
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index a90bd639d48..64d97c5ac03 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -3,17 +3,12 @@
 
 #include "strbuf.h"
 
-/*
- * Arbitry limit for thread names for column alignment.
- */
-#define TR2_MAX_THREAD_NAME (24)
-
 struct tr2tls_thread_ctx {
-	struct strbuf thread_name;
 	uint64_t *array_us_start;
 	size_t alloc;
 	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
 	int thread_id;
+	char thread_name[FLEX_ARRAY];
 };
 
 /*
@@ -25,9 +20,6 @@ struct tr2tls_thread_ctx {
  * non-zero thread-ids to help distinguish messages from concurrent
  * threads.
  *
- * Truncate the thread name if necessary to help with column alignment
- * in printf-style messages.
- *
  * In this and all following functions the term "self" refers to the
  * current thread.
  */
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread
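
To illustrate the two ideas in patch 2/9 outside of Git's helpers: the full thread name is stored in a flexible array member at the end of the context, and truncation happens only when a fixed-width column is formatted. The sketch below uses plain calloc() and snprintf() instead of FLEX_ALLOC_MEM() and strbuf. The demo_* names and the width macro are assumptions that mirror, but are not, the patch's identifiers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_THREAD_NAME 24 /* assumed column width, as in the perf target */

struct demo_ctx {
	int thread_id;
	char thread_name[]; /* C99 flexible array member, must be last */
};

/*
 * Allocate a context whose trailing array holds the complete name.
 * Non-zero thread ids get a "thNN:" prefix. Returns NULL on failure.
 */
struct demo_ctx *demo_ctx_create(int thread_id, const char *base_name)
{
	struct demo_ctx *ctx;
	const char *fmt = thread_id ? "th%02d:%s" : "%.0d%s";
	int len;

	assert(base_name);

	/* First pass measures the formatted length without writing. */
	len = snprintf(NULL, 0, fmt, thread_id, base_name);
	if (len < 0)
		return NULL;

	ctx = calloc(1, sizeof(*ctx) + (size_t)len + 1);
	if (!ctx)
		return NULL;

	ctx->thread_id = thread_id;
	snprintf(ctx->thread_name, (size_t)len + 1, fmt, thread_id, base_name);
	return ctx;
}

/*
 * Format the name as a fixed-width column: "%-*.*s" pads short names
 * on the right (width) and cuts long names (precision).
 */
int demo_format_column(char *out, size_t outsz, const struct demo_ctx *ctx)
{
	return snprintf(out, outsz, "%-*.*s | ", DEMO_MAX_THREAD_NAME,
			DEMO_MAX_THREAD_NAME, ctx->thread_name);
}
```

In the zero-id case the "%.0d" conversion consumes the id argument while printing nothing for the value 0, so both branches can share one argument list.
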

* [PATCH v2 3/9] trace2: defer free of thread local storage until program exit.
  2021-12-28 19:36 ` [PATCH v2 " Jeff Hostetler via GitGitGadget
  2021-12-28 19:36   ` [PATCH v2 1/9] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
  2021-12-28 19:36   ` [PATCH v2 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to flex array Jeff Hostetler via GitGitGadget
@ 2021-12-28 19:36   ` Jeff Hostetler via GitGitGadget
  2021-12-28 19:36   ` [PATCH v2 4/9] trace2: add thread-name override to event target Jeff Hostetler via GitGitGadget
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2021-12-28 19:36 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Matheus Tavares, Johannes Sixt, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Defer freeing of the Trace2 per-thread thread local storage until
program exit.  Create a global list to own them.

Trace2 thread local storage data is allocated when a thread is created
and associated with that thread.  Previously, that storage was deleted
when the thread exited.  Now at thread exit, we simply disassociate
the data from the thread and let the global list manage the cleanup.

This will be used by a later commit when we add "counters" and
stopwatch-style "timers" to the Trace2 API.  We will add those fields
to the thread local storage block and allow each thread to efficiently
(without locks) accumulate counter and timer data.  At program exit,
the main thread will run through the global list and compute and report
totals before freeing the list.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2/tr2_tls.c | 38 +++++++++++++++++++++++++++++++-------
 trace2/tr2_tls.h |  3 ++-
 2 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index ed99a234b95..78538d5e522 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -15,6 +15,18 @@ static uint64_t tr2tls_us_start_process;
 static pthread_mutex_t tr2tls_mutex;
 static pthread_key_t tr2tls_key;
 
+/*
+ * This list owns all of the thread-specific CTX data.
+ *
+ * While a thread is alive it is associated with a CTX (owned by this
+ * list) and that CTX is installed in the thread's TLS data area.
+ * When a thread exits, it is disassociated from its CTX, but the (now
+ * dormant) CTX is held in this list until program exit.
+ *
+ * Similarly, `tr2tls_thread_main` points to a CTX contained within
+ * this list.
+ */
+static struct tr2tls_thread_ctx *tr2tls_ctx_list; /* modify under lock */
 static int tr2_next_thread_id; /* modify under lock */
 
 void tr2tls_start_process_clock(void)
@@ -56,6 +68,14 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 	ctx->array_us_start = (uint64_t *)xcalloc(ctx->alloc, sizeof(uint64_t));
 	ctx->array_us_start[ctx->nr_open_regions++] = us_thread_start;
 
+	/*
+	 * Link this CTX into the CTX list and make it the head.
+	 */
+	pthread_mutex_lock(&tr2tls_mutex);
+	ctx->next_ctx = tr2tls_ctx_list;
+	tr2tls_ctx_list = ctx;
+	pthread_mutex_unlock(&tr2tls_mutex);
+
 	pthread_setspecific(tr2tls_key, ctx);
 
 	return ctx;
@@ -91,14 +111,7 @@ int tr2tls_is_main_thread(void)
 
 void tr2tls_unset_self(void)
 {
-	struct tr2tls_thread_ctx *ctx;
-
-	ctx = tr2tls_get_self();
-
 	pthread_setspecific(tr2tls_key, NULL);
-
-	free(ctx->array_us_start);
-	free(ctx);
 }
 
 void tr2tls_push_self(uint64_t us_now)
@@ -162,11 +175,22 @@ void tr2tls_init(void)
 
 void tr2tls_release(void)
 {
+	struct tr2tls_thread_ctx *ctx = tr2tls_ctx_list;
+
 	tr2tls_unset_self();
 	tr2tls_thread_main = NULL;
 
 	pthread_mutex_destroy(&tr2tls_mutex);
 	pthread_key_delete(tr2tls_key);
+
+	while (ctx) {
+		struct tr2tls_thread_ctx *next = ctx->next_ctx;
+
+		free(ctx->array_us_start);
+		free(ctx);
+
+		ctx = next;
+	}
 }
 
 int tr2tls_locked_increment(int *p)
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index 64d97c5ac03..889010ec1ff 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -4,6 +4,7 @@
 #include "strbuf.h"
 
 struct tr2tls_thread_ctx {
+	struct tr2tls_thread_ctx *next_ctx;
 	uint64_t *array_us_start;
 	size_t alloc;
 	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
@@ -37,7 +38,7 @@ struct tr2tls_thread_ctx *tr2tls_get_self(void);
 int tr2tls_is_main_thread(void);
 
 /*
- * Free our TLS data.
+ * Disassociate thread's TLS CTX data from the thread.
  */
 void tr2tls_unset_self(void);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread
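
The ownership change in patch 3/9 can be sketched in isolation: each context is pushed onto a global list under a mutex when it is created, thread exit only clears the thread's own pointer, and a single release step frees everything. This is an illustration under assumptions rather than the patch itself. In particular it uses C11 _Thread_local in place of pthread_setspecific(), and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct demo_ctx {
	struct demo_ctx *next_ctx;
	int thread_id;
};

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct demo_ctx *demo_ctx_list; /* modify under lock */
static int demo_next_thread_id;        /* modify under lock */

/* Assumption: C11 thread-local pointer instead of a pthread key. */
static _Thread_local struct demo_ctx *demo_self;

/* Create the calling thread's context and make it the list head. */
struct demo_ctx *demo_create_self(void)
{
	struct demo_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;

	pthread_mutex_lock(&demo_mutex);
	ctx->thread_id = demo_next_thread_id++;
	ctx->next_ctx = demo_ctx_list;
	demo_ctx_list = ctx;
	pthread_mutex_unlock(&demo_mutex);

	demo_self = ctx;
	return ctx;
}

struct demo_ctx *demo_get_self(void)
{
	return demo_self;
}

/* Disassociate the context from the thread; the list still owns it. */
void demo_unset_self(void)
{
	demo_self = NULL;
}

/* At program exit: free every context and return how many were freed. */
size_t demo_release_all(void)
{
	struct demo_ctx *ctx;
	size_t nr_freed = 0;

	pthread_mutex_lock(&demo_mutex);
	ctx = demo_ctx_list;
	demo_ctx_list = NULL;
	pthread_mutex_unlock(&demo_mutex);

	while (ctx) {
		struct demo_ctx *next = ctx->next_ctx;

		free(ctx);
		nr_freed++;
		ctx = next;
	}
	assert(!demo_ctx_list);
	return nr_freed;
}
```

Because a dormant context stays on the list after its thread exits, any per-thread data stored in it remains readable until the release step, which is the property the later timer and counter patches rely on.
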

* [PATCH v2 4/9] trace2: add thread-name override to event target
  2021-12-28 19:36 ` [PATCH v2 " Jeff Hostetler via GitGitGadget
                     ` (2 preceding siblings ...)
  2021-12-28 19:36   ` [PATCH v2 3/9] trace2: defer free of thread local storage until program exit Jeff Hostetler via GitGitGadget
@ 2021-12-28 19:36   ` Jeff Hostetler via GitGitGadget
  2021-12-28 19:36   ` [PATCH v2 5/9] trace2: add thread-name override to perf target Jeff Hostetler via GitGitGadget
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2021-12-28 19:36 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Matheus Tavares, Johannes Sixt, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach the message formatter in the Trace2 event target to take an
optional thread-name argument.  This overrides the thread name
inherited from the thread local storage data.

This will be used in a future commit for global events that should
not be tied to a particular thread, such as a global stopwatch timer
that aggregates data from all threads.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2/tr2_tgt_event.c | 59 ++++++++++++++++++++++--------------------
 1 file changed, 31 insertions(+), 28 deletions(-)

diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index ca48d00aebc..4ce50944298 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -81,14 +81,17 @@ static void fn_term(void)
  */
 static void event_fmt_prepare(const char *event_name, const char *file,
 			      int line, const struct repository *repo,
-			      struct json_writer *jw)
+			      struct json_writer *jw,
+			      const char *thread_name_override)
 {
-	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
 	struct tr2_tbuf tb_now;
 
 	jw_object_string(jw, "event", event_name);
 	jw_object_string(jw, "sid", tr2_sid_get());
-	jw_object_string(jw, "thread", ctx->thread_name);
+	jw_object_string(jw, "thread",
+			 ((thread_name_override && *thread_name_override)
+			  ? thread_name_override
+			  : tr2tls_get_self()->thread_name));
 
 	/*
 	 * In brief mode, only emit <time> on these 2 event types.
@@ -114,7 +117,7 @@ static void fn_too_many_files_fl(const char *file, int line)
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_end(&jw);
 
 	tr2_dst_write_line(&tr2dst_event, &jw.json);
@@ -127,7 +130,7 @@ static void fn_version_fl(const char *file, int line)
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_string(&jw, "evt", TR2_EVENT_VERSION);
 	jw_object_string(&jw, "exe", git_version_string);
 	jw_end(&jw);
@@ -147,7 +150,7 @@ static void fn_start_fl(const char *file, int line,
 	double t_abs = (double)us_elapsed_absolute / 1000000.0;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_double(&jw, "t_abs", 6, t_abs);
 	jw_object_inline_begin_array(&jw, "argv");
 	jw_array_argv(&jw, argv);
@@ -166,7 +169,7 @@ static void fn_exit_fl(const char *file, int line, uint64_t us_elapsed_absolute,
 	double t_abs = (double)us_elapsed_absolute / 1000000.0;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_double(&jw, "t_abs", 6, t_abs);
 	jw_object_intmax(&jw, "code", code);
 	jw_end(&jw);
@@ -182,7 +185,7 @@ static void fn_signal(uint64_t us_elapsed_absolute, int signo)
 	double t_abs = (double)us_elapsed_absolute / 1000000.0;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw);
+	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, NULL);
 	jw_object_double(&jw, "t_abs", 6, t_abs);
 	jw_object_intmax(&jw, "signo", signo);
 	jw_end(&jw);
@@ -198,7 +201,7 @@ static void fn_atexit(uint64_t us_elapsed_absolute, int code)
 	double t_abs = (double)us_elapsed_absolute / 1000000.0;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw);
+	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, NULL);
 	jw_object_double(&jw, "t_abs", 6, t_abs);
 	jw_object_intmax(&jw, "code", code);
 	jw_end(&jw);
@@ -231,7 +234,7 @@ static void fn_error_va_fl(const char *file, int line, const char *fmt,
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	maybe_add_string_va(&jw, "msg", fmt, ap);
 	/*
 	 * Also emit the format string as a field in case
@@ -253,7 +256,7 @@ static void fn_command_path_fl(const char *file, int line, const char *pathname)
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_string(&jw, "path", pathname);
 	jw_end(&jw);
 
@@ -268,7 +271,7 @@ static void fn_command_ancestry_fl(const char *file, int line, const char **pare
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_inline_begin_array(&jw, "ancestry");
 
 	while ((parent_name = *parent_names++))
@@ -288,7 +291,7 @@ static void fn_command_name_fl(const char *file, int line, const char *name,
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_string(&jw, "name", name);
 	if (hierarchy && *hierarchy)
 		jw_object_string(&jw, "hierarchy", hierarchy);
@@ -304,7 +307,7 @@ static void fn_command_mode_fl(const char *file, int line, const char *mode)
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_string(&jw, "name", mode);
 	jw_end(&jw);
 
@@ -319,7 +322,7 @@ static void fn_alias_fl(const char *file, int line, const char *alias,
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_string(&jw, "alias", alias);
 	jw_object_inline_begin_array(&jw, "argv");
 	jw_array_argv(&jw, argv);
@@ -338,7 +341,7 @@ static void fn_child_start_fl(const char *file, int line,
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_intmax(&jw, "child_id", cmd->trace2_child_id);
 	if (cmd->trace2_hook_name) {
 		jw_object_string(&jw, "child_class", "hook");
@@ -371,7 +374,7 @@ static void fn_child_exit_fl(const char *file, int line,
 	double t_rel = (double)us_elapsed_child / 1000000.0;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_intmax(&jw, "child_id", cid);
 	jw_object_intmax(&jw, "pid", pid);
 	jw_object_intmax(&jw, "code", code);
@@ -392,7 +395,7 @@ static void fn_child_ready_fl(const char *file, int line,
 	double t_rel = (double)us_elapsed_child / 1000000.0;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_intmax(&jw, "child_id", cid);
 	jw_object_intmax(&jw, "pid", pid);
 	jw_object_string(&jw, "ready", ready);
@@ -411,7 +414,7 @@ static void fn_thread_start_fl(const char *file, int line,
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_end(&jw);
 
 	tr2_dst_write_line(&tr2dst_event, &jw.json);
@@ -427,7 +430,7 @@ static void fn_thread_exit_fl(const char *file, int line,
 	double t_rel = (double)us_elapsed_thread / 1000000.0;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_double(&jw, "t_rel", 6, t_rel);
 	jw_end(&jw);
 
@@ -442,7 +445,7 @@ static void fn_exec_fl(const char *file, int line, uint64_t us_elapsed_absolute,
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_intmax(&jw, "exec_id", exec_id);
 	if (exe)
 		jw_object_string(&jw, "exe", exe);
@@ -463,7 +466,7 @@ static void fn_exec_result_fl(const char *file, int line,
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_intmax(&jw, "exec_id", exec_id);
 	jw_object_intmax(&jw, "code", code);
 	jw_end(&jw);
@@ -479,7 +482,7 @@ static void fn_param_fl(const char *file, int line, const char *param,
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, NULL, &jw);
+	event_fmt_prepare(event_name, file, line, NULL, &jw, NULL);
 	jw_object_string(&jw, "param", param);
 	jw_object_string(&jw, "value", value);
 	jw_end(&jw);
@@ -495,7 +498,7 @@ static void fn_repo_fl(const char *file, int line,
 	struct json_writer jw = JSON_WRITER_INIT;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, file, line, repo, &jw);
+	event_fmt_prepare(event_name, file, line, repo, &jw, NULL);
 	jw_object_string(&jw, "worktree", repo->worktree);
 	jw_end(&jw);
 
@@ -516,7 +519,7 @@ static void fn_region_enter_printf_va_fl(const char *file, int line,
 		struct json_writer jw = JSON_WRITER_INIT;
 
 		jw_object_begin(&jw, 0);
-		event_fmt_prepare(event_name, file, line, repo, &jw);
+		event_fmt_prepare(event_name, file, line, repo, &jw, NULL);
 		jw_object_intmax(&jw, "nesting", ctx->nr_open_regions);
 		if (category)
 			jw_object_string(&jw, "category", category);
@@ -542,7 +545,7 @@ static void fn_region_leave_printf_va_fl(
 		double t_rel = (double)us_elapsed_region / 1000000.0;
 
 		jw_object_begin(&jw, 0);
-		event_fmt_prepare(event_name, file, line, repo, &jw);
+		event_fmt_prepare(event_name, file, line, repo, &jw, NULL);
 		jw_object_double(&jw, "t_rel", 6, t_rel);
 		jw_object_intmax(&jw, "nesting", ctx->nr_open_regions);
 		if (category)
@@ -570,7 +573,7 @@ static void fn_data_fl(const char *file, int line, uint64_t us_elapsed_absolute,
 		double t_rel = (double)us_elapsed_region / 1000000.0;
 
 		jw_object_begin(&jw, 0);
-		event_fmt_prepare(event_name, file, line, repo, &jw);
+		event_fmt_prepare(event_name, file, line, repo, &jw, NULL);
 		jw_object_double(&jw, "t_abs", 6, t_abs);
 		jw_object_double(&jw, "t_rel", 6, t_rel);
 		jw_object_intmax(&jw, "nesting", ctx->nr_open_regions);
@@ -598,7 +601,7 @@ static void fn_data_json_fl(const char *file, int line,
 		double t_rel = (double)us_elapsed_region / 1000000.0;
 
 		jw_object_begin(&jw, 0);
-		event_fmt_prepare(event_name, file, line, repo, &jw);
+		event_fmt_prepare(event_name, file, line, repo, &jw, NULL);
 		jw_object_double(&jw, "t_abs", 6, t_abs);
 		jw_object_double(&jw, "t_rel", 6, t_rel);
 		jw_object_intmax(&jw, "nesting", ctx->nr_open_regions);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread
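
The override rule added to event_fmt_prepare() in patch 4/9 is small enough to show on its own: a non-NULL, non-empty override wins, otherwise the name from thread local storage is used. The sketch below is a standalone illustration. The demo_* functions are hypothetical, "summary" is just a placeholder override value, and the formatter performs no JSON escaping, unlike a real JSON writer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Pick the name to report: the override when it is non-NULL and
 * non-empty, otherwise the thread's own name.
 */
const char *demo_pick_thread_name(const char *thread_name_override,
				  const char *tls_thread_name)
{
	assert(tls_thread_name);

	return (thread_name_override && *thread_name_override)
		? thread_name_override
		: tls_thread_name;
}

/*
 * Write a "thread" key/value fragment into "out".
 * Simplification: the name is copied verbatim, with no JSON escaping.
 */
int demo_format_thread_field(char *out, size_t outsz,
			     const char *thread_name_override,
			     const char *tls_thread_name)
{
	return snprintf(out, outsz, "\"thread\":\"%s\"",
			demo_pick_thread_name(thread_name_override,
					      tls_thread_name));
}
```

Treating the empty string like NULL means callers with nothing to override can pass either value and still get the per-thread name.
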

* [PATCH v2 5/9] trace2: add thread-name override to perf target
  2021-12-28 19:36 ` [PATCH v2 " Jeff Hostetler via GitGitGadget
                     ` (3 preceding siblings ...)
  2021-12-28 19:36   ` [PATCH v2 4/9] trace2: add thread-name override to event target Jeff Hostetler via GitGitGadget
@ 2021-12-28 19:36   ` Jeff Hostetler via GitGitGadget
  2021-12-29  1:48     ` Ævar Arnfjörð Bjarmason
  2021-12-28 19:36   ` [PATCH v2 6/9] trace2: add timer events to perf and event target formats Jeff Hostetler via GitGitGadget
                     ` (4 subsequent siblings)
  9 siblings, 1 reply; 55+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2021-12-28 19:36 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Matheus Tavares, Johannes Sixt, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach the message formatter in the Trace2 perf target to accept an
optional thread name argument.  This will override the thread name
inherited from the thread local storage data block.

This will be used in a future commit for global events that should
not be tied to a particular thread, such as a global stopwatch timer.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 trace2/tr2_tgt_perf.c | 64 +++++++++++++++++++++++--------------------
 1 file changed, 35 insertions(+), 29 deletions(-)

diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index fd6cce3efe5..c008fd08ae8 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -65,9 +65,14 @@ static void perf_fmt_prepare(const char *event_name,
 			     int line, const struct repository *repo,
 			     uint64_t *p_us_elapsed_absolute,
 			     uint64_t *p_us_elapsed_relative,
-			     const char *category, struct strbuf *buf)
+			     const char *category, struct strbuf *buf,
+			     const char *thread_name_override)
 {
 	int len;
+	const char *thread_name =
+		((thread_name_override && *thread_name_override)
+		 ? thread_name_override
+		 : ctx->thread_name);
 
 	strbuf_setlen(buf, 0);
 
@@ -107,7 +112,7 @@ static void perf_fmt_prepare(const char *event_name,
 
 	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
 	strbuf_addf(buf, "%-*.*s | %-*s | ", TR2FMT_PERF_MAX_THREAD_NAME,
-		    TR2FMT_PERF_MAX_THREAD_NAME, ctx->thread_name,
+		    TR2FMT_PERF_MAX_THREAD_NAME, thread_name,
 		    TR2FMT_PERF_MAX_EVENT_NAME, event_name);
 
 	len = buf->len + TR2FMT_PERF_REPO_WIDTH;
@@ -141,14 +146,15 @@ static void perf_io_write_fl(const char *file, int line, const char *event_name,
 			     uint64_t *p_us_elapsed_absolute,
 			     uint64_t *p_us_elapsed_relative,
 			     const char *category,
-			     const struct strbuf *buf_payload)
+			     const struct strbuf *buf_payload,
+			     const char *thread_name_override)
 {
 	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
 	struct strbuf buf_line = STRBUF_INIT;
 
 	perf_fmt_prepare(event_name, ctx, file, line, repo,
 			 p_us_elapsed_absolute, p_us_elapsed_relative, category,
-			 &buf_line);
+			 &buf_line, thread_name_override);
 	strbuf_addbuf(&buf_line, buf_payload);
 	tr2_dst_write_line(&tr2dst_perf, &buf_line);
 	strbuf_release(&buf_line);
@@ -162,7 +168,7 @@ static void fn_version_fl(const char *file, int line)
 	strbuf_addstr(&buf_payload, git_version_string);
 
 	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
+			 &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -175,7 +181,7 @@ static void fn_start_fl(const char *file, int line,
 	sq_append_quote_argv_pretty(&buf_payload, argv);
 
 	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 NULL, NULL, &buf_payload);
+			 NULL, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -188,7 +194,7 @@ static void fn_exit_fl(const char *file, int line, uint64_t us_elapsed_absolute,
 	strbuf_addf(&buf_payload, "code:%d", code);
 
 	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 NULL, NULL, &buf_payload);
+			 NULL, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -200,7 +206,7 @@ static void fn_signal(uint64_t us_elapsed_absolute, int signo)
 	strbuf_addf(&buf_payload, "signo:%d", signo);
 
 	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL,
-			 &us_elapsed_absolute, NULL, NULL, &buf_payload);
+			 &us_elapsed_absolute, NULL, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -212,7 +218,7 @@ static void fn_atexit(uint64_t us_elapsed_absolute, int code)
 	strbuf_addf(&buf_payload, "code:%d", code);
 
 	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL,
-			 &us_elapsed_absolute, NULL, NULL, &buf_payload);
+			 &us_elapsed_absolute, NULL, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -238,7 +244,7 @@ static void fn_error_va_fl(const char *file, int line, const char *fmt,
 	maybe_append_string_va(&buf_payload, fmt, ap);
 
 	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
+			 &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -250,7 +256,7 @@ static void fn_command_path_fl(const char *file, int line, const char *pathname)
 	strbuf_addstr(&buf_payload, pathname);
 
 	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
+			 &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -265,7 +271,7 @@ static void fn_command_ancestry_fl(const char *file, int line, const char **pare
 	strbuf_addch(&buf_payload, ']');
 
 	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
+			 &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -280,7 +286,7 @@ static void fn_command_name_fl(const char *file, int line, const char *name,
 		strbuf_addf(&buf_payload, " (%s)", hierarchy);
 
 	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
+			 &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -292,7 +298,7 @@ static void fn_command_mode_fl(const char *file, int line, const char *mode)
 	strbuf_addstr(&buf_payload, mode);
 
 	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
+			 &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -307,7 +313,7 @@ static void fn_alias_fl(const char *file, int line, const char *alias,
 	strbuf_addch(&buf_payload, ']');
 
 	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
+			 &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -343,7 +349,7 @@ static void fn_child_start_fl(const char *file, int line,
 	strbuf_addch(&buf_payload, ']');
 
 	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 NULL, NULL, &buf_payload);
+			 NULL, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -357,7 +363,7 @@ static void fn_child_exit_fl(const char *file, int line,
 	strbuf_addf(&buf_payload, "[ch%d] pid:%d code:%d", cid, pid, code);
 
 	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 &us_elapsed_child, NULL, &buf_payload);
+			 &us_elapsed_child, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -371,7 +377,7 @@ static void fn_child_ready_fl(const char *file, int line,
 	strbuf_addf(&buf_payload, "[ch%d] pid:%d ready:%s", cid, pid, ready);
 
 	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 &us_elapsed_child, NULL, &buf_payload);
+			 &us_elapsed_child, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -382,7 +388,7 @@ static void fn_thread_start_fl(const char *file, int line,
 	struct strbuf buf_payload = STRBUF_INIT;
 
 	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 NULL, NULL, &buf_payload);
+			 NULL, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -394,7 +400,7 @@ static void fn_thread_exit_fl(const char *file, int line,
 	struct strbuf buf_payload = STRBUF_INIT;
 
 	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 &us_elapsed_thread, NULL, &buf_payload);
+			 &us_elapsed_thread, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -415,7 +421,7 @@ static void fn_exec_fl(const char *file, int line, uint64_t us_elapsed_absolute,
 	strbuf_addch(&buf_payload, ']');
 
 	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 NULL, NULL, &buf_payload);
+			 NULL, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -431,7 +437,7 @@ static void fn_exec_result_fl(const char *file, int line,
 		strbuf_addf(&buf_payload, " err:%s", strerror(code));
 
 	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 NULL, NULL, &buf_payload);
+			 NULL, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -444,7 +450,7 @@ static void fn_param_fl(const char *file, int line, const char *param,
 	strbuf_addf(&buf_payload, "%s:%s", param, value);
 
 	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
+			 &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -458,7 +464,7 @@ static void fn_repo_fl(const char *file, int line,
 	sq_quote_buf_pretty(&buf_payload, repo->worktree);
 
 	perf_io_write_fl(file, line, event_name, repo, NULL, NULL, NULL,
-			 &buf_payload);
+			 &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -480,7 +486,7 @@ static void fn_region_enter_printf_va_fl(const char *file, int line,
 	}
 
 	perf_io_write_fl(file, line, event_name, repo, &us_elapsed_absolute,
-			 NULL, category, &buf_payload);
+			 NULL, category, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -500,7 +506,7 @@ static void fn_region_leave_printf_va_fl(
 	}
 
 	perf_io_write_fl(file, line, event_name, repo, &us_elapsed_absolute,
-			 &us_elapsed_region, category, &buf_payload);
+			 &us_elapsed_region, category, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -515,7 +521,7 @@ static void fn_data_fl(const char *file, int line, uint64_t us_elapsed_absolute,
 	strbuf_addf(&buf_payload, "%s:%s", key, value);
 
 	perf_io_write_fl(file, line, event_name, repo, &us_elapsed_absolute,
-			 &us_elapsed_region, category, &buf_payload);
+			 &us_elapsed_region, category, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -531,7 +537,7 @@ static void fn_data_json_fl(const char *file, int line,
 	strbuf_addf(&buf_payload, "%s:%s", key, value->json.buf);
 
 	perf_io_write_fl(file, line, event_name, repo, &us_elapsed_absolute,
-			 &us_elapsed_region, category, &buf_payload);
+			 &us_elapsed_region, category, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
@@ -545,7 +551,7 @@ static void fn_printf_va_fl(const char *file, int line,
 	maybe_append_string_va(&buf_payload, fmt, ap);
 
 	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 NULL, NULL, &buf_payload);
+			 NULL, NULL, &buf_payload, NULL);
 	strbuf_release(&buf_payload);
 }
 
-- 
gitgitgadget



* [PATCH v2 6/9] trace2: add timer events to perf and event target formats
  2021-12-28 19:36 ` [PATCH v2 " Jeff Hostetler via GitGitGadget
                     ` (4 preceding siblings ...)
  2021-12-28 19:36   ` [PATCH v2 5/9] trace2: add thread-name override to perf target Jeff Hostetler via GitGitGadget
@ 2021-12-28 19:36   ` Jeff Hostetler via GitGitGadget
  2021-12-28 19:36   ` [PATCH v2 7/9] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2021-12-28 19:36 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Matheus Tavares, Johannes Sixt, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach Trace2 "perf" and "event" formats to handle "timer" events for
stopwatch timers.  Update API documentation accordingly.

In a future commit, stopwatch timers will be added to the Trace2 API
and it will emit these "timer" events.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
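
Reader's note (not part of the commit): the perf-format payload that
fn_timer() builds with strbuf_addf() can be tried in isolation.  The
sketch below is a minimal stand-in that assumes plain snprintf() in
place of Git's strbuf API; the helper name sketch_format_timer_payload()
is hypothetical and does not exist in the tree.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not in Git): format the payload column of a
 * perf "timer" line into a caller-supplied buffer, using the same
 * key order as the perf target in this patch.  Returns the snprintf()
 * result, i.e. the length the full string needs.
 */
int sketch_format_timer_payload(char *buf, size_t len,
				const char *timer_name,
				uint64_t interval_count,
				uint64_t ns_total,
				uint64_t ns_min,
				uint64_t ns_max)
{
	assert(buf && timer_name);

	return snprintf(buf, len,
			"name:%s"
			" count:%" PRIu64
			" ns_total:%" PRIu64
			" ns_min:%" PRIu64
			" ns_max:%" PRIu64,
			timer_name, interval_count,
			ns_total, ns_min, ns_max);
}
```

Called with ("test1", 4, 393000, 6000, 302000), this produces
"name:test1 count:4 ns_total:393000 ns_min:6000 ns_max:302000".
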
 Documentation/technical/api-trace2.txt | 23 ++++++++++++++++++++
 trace2/tr2_tgt.h                       | 25 ++++++++++++++++++++++
 trace2/tr2_tgt_event.c                 | 29 ++++++++++++++++++++++++++
 trace2/tr2_tgt_normal.c                |  1 +
 trace2/tr2_tgt_perf.c                  | 27 ++++++++++++++++++++++++
 5 files changed, 105 insertions(+)

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index bb13ca3db8b..351d140879e 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -815,6 +815,29 @@ The "value" field may be an integer or a string.
 }
 ------------
 
+`"timer"`::
+	This event is generated at the end of the program and contains
+	statistics for a global stopwatch timer.
++
+------------
+{
+	"event":"timer",
+	...
+	"name":"test",      # timer name
+	"count":42,         # number of start+stop intervals
+	"ns_total":1234,    # sum of all intervals in nanoseconds
+	"ns_min":11,        # shortest interval in nanoseconds
+	"ns_max":789        # longest interval in nanoseconds
+}
+------------
++
+Stopwatch timer data is independently collected by each thread and then
+aggregated for the whole program, so the total time reported here
+may exceed the "atexit" elapsed time of the program.
++
+Timer events may represent an individual thread or a summation across
+the entire program.  Summation events will have a unique thread name.
+
 == Example Trace2 API Usage
 
 Here is a hypothetical usage of the Trace2 API showing the intended
diff --git a/trace2/tr2_tgt.h b/trace2/tr2_tgt.h
index 65f94e15748..a41f91d09b5 100644
--- a/trace2/tr2_tgt.h
+++ b/trace2/tr2_tgt.h
@@ -96,6 +96,30 @@ typedef void(tr2_tgt_evt_printf_va_fl_t)(const char *file, int line,
 					 uint64_t us_elapsed_absolute,
 					 const char *fmt, va_list ap);
 
+/*
+ * Stopwatch timer event.  This function writes the previously accumulated
+ * stopwatch timer values to the event streams.  Unlike other Trace2 API
+ * events, this is decoupled from the data collection.
+ *
+ * This does not take a (file,line) pair because a timer event reports
+ * the cumulative time spent in the timer over a series of intervals
+ * -- it does not represent a single usage (like region or data events
+ * do).
+ *
+ * The thread name is optional.  If non-null it will override the
+ * value inherited from the caller's thread local storage.  This
+ * allows timer data to be aggregated and reported without associating
+ * it to a specific thread.
+ */
+typedef void(tr2_tgt_evt_timer_t)(uint64_t us_elapsed_absolute,
+				  const char *thread_name,
+				  const char *category,
+				  const char *timer_name,
+				  uint64_t interval_count,
+				  uint64_t ns_total_time,
+				  uint64_t ns_min_time,
+				  uint64_t ns_max_time);
+
 /*
  * "vtable" for a TRACE2 target.  Use NULL if a target does not want
  * to emit that message.
@@ -132,6 +156,7 @@ struct tr2_tgt {
 	tr2_tgt_evt_data_fl_t                   *pfn_data_fl;
 	tr2_tgt_evt_data_json_fl_t              *pfn_data_json_fl;
 	tr2_tgt_evt_printf_va_fl_t              *pfn_printf_va_fl;
+	tr2_tgt_evt_timer_t                     *pfn_timer;
 };
 /* clang-format on */
 
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 4ce50944298..fe89e80bb1a 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -615,6 +615,34 @@ static void fn_data_json_fl(const char *file, int line,
 	}
 }
 
+static void fn_timer(uint64_t us_elapsed_absolute,
+		     const char *thread_name,
+		     const char *category,
+		     const char *timer_name,
+		     uint64_t interval_count,
+		     uint64_t ns_total_time,
+		     uint64_t ns_min_time,
+		     uint64_t ns_max_time)
+{
+	const char *event_name = "timer";
+	struct json_writer jw = JSON_WRITER_INIT;
+	double t_abs = (double)us_elapsed_absolute / 1000000.0;
+
+	jw_object_begin(&jw, 0);
+	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, thread_name);
+	jw_object_double(&jw, "t_abs", 6, t_abs);
+	jw_object_string(&jw, "name", timer_name);
+	jw_object_intmax(&jw, "count", interval_count);
+	jw_object_intmax(&jw, "ns_total", ns_total_time);
+	jw_object_intmax(&jw, "ns_min", ns_min_time);
+	jw_object_intmax(&jw, "ns_max", ns_max_time);
+
+	jw_end(&jw);
+
+	tr2_dst_write_line(&tr2dst_event, &jw.json);
+	jw_release(&jw);
+}
+
 struct tr2_tgt tr2_tgt_event = {
 	&tr2dst_event,
 
@@ -646,4 +674,5 @@ struct tr2_tgt tr2_tgt_event = {
 	fn_data_fl,
 	fn_data_json_fl,
 	NULL, /* printf */
+	fn_timer,
 };
diff --git a/trace2/tr2_tgt_normal.c b/trace2/tr2_tgt_normal.c
index 58d9e430f05..23a7e78dcaa 100644
--- a/trace2/tr2_tgt_normal.c
+++ b/trace2/tr2_tgt_normal.c
@@ -355,4 +355,5 @@ struct tr2_tgt tr2_tgt_normal = {
 	NULL, /* data */
 	NULL, /* data_json */
 	fn_printf_va_fl,
+	NULL, /* timer */
 };
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index c008fd08ae8..c07ffad1a32 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -555,6 +555,32 @@ static void fn_printf_va_fl(const char *file, int line,
 	strbuf_release(&buf_payload);
 }
 
+static void fn_timer(uint64_t us_elapsed_absolute,
+		     const char *thread_name,
+		     const char *category,
+		     const char *timer_name,
+		     uint64_t interval_count,
+		     uint64_t ns_total_time,
+		     uint64_t ns_min_time,
+		     uint64_t ns_max_time)
+{
+	const char *event_name = "timer";
+	struct strbuf buf_payload = STRBUF_INIT;
+
+	strbuf_addf(&buf_payload, ("name:%s"
+				   " count:%"PRIu64
+				   " ns_total:%"PRIu64
+				   " ns_min:%"PRIu64
+				   " ns_max:%"PRIu64),
+		    timer_name, interval_count, ns_total_time, ns_min_time,
+		    ns_max_time);
+
+	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL,
+			 &us_elapsed_absolute, NULL,
+			 category, &buf_payload, thread_name);
+	strbuf_release(&buf_payload);
+}
+
 struct tr2_tgt tr2_tgt_perf = {
 	&tr2dst_perf,
 
@@ -586,4 +612,5 @@ struct tr2_tgt tr2_tgt_perf = {
 	fn_data_fl,
 	fn_data_json_fl,
 	fn_printf_va_fl,
+	fn_timer,
 };
-- 
gitgitgadget



* [PATCH v2 7/9] trace2: add stopwatch timers
  2021-12-28 19:36 ` [PATCH v2 " Jeff Hostetler via GitGitGadget
                     ` (5 preceding siblings ...)
  2021-12-28 19:36   ` [PATCH v2 6/9] trace2: add timer events to perf and event target formats Jeff Hostetler via GitGitGadget
@ 2021-12-28 19:36   ` Jeff Hostetler via GitGitGadget
  2021-12-28 19:36   ` [PATCH v2 8/9] trace2: add counter events to perf and event target formats Jeff Hostetler via GitGitGadget
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2021-12-28 19:36 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Matheus Tavares, Johannes Sixt, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Add a stopwatch timer mechanism to Git.

Timers are an alternative to regions.  Timers can capture a series of
intervals, such as calls to a library routine or a span of code.  They
are intended for code that is not necessarily associated with a
particular phase of the command.

Timer data is accumulated throughout the command and a timer "summary"
event is logged (one per timer) at program exit.

Optionally, timer data may also be reported by thread for certain
timers.  (See trace2/tr2_tmr.c:tr2_timer_def_block[].)

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
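
Reader's note (not part of the commit): the accumulation rules used by
tr2_stop_timer() and tr2_merge_timer_block() can be exercised outside
of Git.  The sketch below is a simplified stand-in, assuming that the
caller passes the clock value explicitly (instead of getnanotime()) so
that results are deterministic.  All sketch_* names are hypothetical;
the real struct tr2_timer may carry additional fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one per-thread timer. */
struct sketch_timer {
	uint64_t total_ns;
	uint64_t min_ns;
	uint64_t max_ns;
	uint64_t start_ns;
	uint64_t interval_count;
	unsigned int recursion_count;
};

void sketch_timer_start(struct sketch_timer *t, uint64_t now_ns)
{
	if (t->recursion_count++)
		return; /* nested start: the outer interval is still open */
	t->start_ns = now_ns;
}

void sketch_timer_stop(struct sketch_timer *t, uint64_t now_ns)
{
	uint64_t ns_interval;

	assert(t->recursion_count > 0);
	if (--t->recursion_count)
		return; /* still inside a nested start */

	ns_interval = now_ns - t->start_ns;
	t->total_ns += ns_interval;

	/* State starts zeroed, so seed min/max from the first interval. */
	if (!t->interval_count || ns_interval < t->min_ns)
		t->min_ns = ns_interval;
	if (!t->interval_count || ns_interval > t->max_ns)
		t->max_ns = ns_interval;
	t->interval_count++;
}

/* Fold one thread's timer into the running summary. */
void sketch_timer_merge(struct sketch_timer *merged,
			const struct sketch_timer *src)
{
	if (!src->interval_count)
		return; /* this thread never used the timer */

	merged->total_ns += src->total_ns;
	if (!merged->interval_count || src->min_ns < merged->min_ns)
		merged->min_ns = src->min_ns;
	if (!merged->interval_count || src->max_ns > merged->max_ns)
		merged->max_ns = src->max_ns;
	merged->interval_count += src->interval_count;
}
```

A nested start/stop pair inside an open interval does not add to the
interval count; only the outermost pair is measured.  Merging sums the
totals and counts and keeps the overall min and max, which is why a
summary total can exceed the wall-clock time of a threaded program.
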
 Documentation/technical/api-trace2.txt |  48 +++++++++
 Makefile                               |   1 +
 t/helper/test-trace2.c                 |  97 +++++++++++++++++
 t/t0211-trace2-perf.sh                 |  46 ++++++++
 t/t0212-trace2-event.sh                |  45 ++++++++
 trace2.c                               |  56 ++++++++++
 trace2.h                               |  42 ++++++++
 trace2/tr2_tls.c                       |  29 ++++++
 trace2/tr2_tls.h                       |  18 ++++
 trace2/tr2_tmr.c                       | 136 ++++++++++++++++++++++++
 trace2/tr2_tmr.h                       | 139 +++++++++++++++++++++++++
 11 files changed, 657 insertions(+)
 create mode 100644 trace2/tr2_tmr.c
 create mode 100644 trace2/tr2_tmr.h

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 351d140879e..616001bcbb0 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -1230,6 +1230,54 @@ at offset 508.
 This example also shows that thread names are assigned in a racy manner
 as each thread starts and allocates TLS storage.
 
+Timer Events::
+
+	Trace2 also provides global stopwatch timers as an alternative
+	to regions.  These make it possible to measure the time spent
+	in a span of code or a library routine called from many places
+	and not associated with a single phase of the overall command.
++
+At the end of the program, a single summary timer event is emitted; this
+aggregates timer usage across all threads.  These events have "summary"
+as their thread name.
++
+For some timers, individual (per-thread) timer events are also generated.
+These may be helpful in understanding how work is balanced between threads
+in some circumstances.
++
+Timers are defined in `enum trace2_timer_id` in trace2.h and in
+`trace2/tr2_tmr.c:tr2_timer_def_block[]`.
++
+----------------
+static void *unpack_compressed_entry(struct packed_git *p,
+				    struct pack_window **w_curs,
+				    off_t curpos,
+				    unsigned long size)
+{
+	...
+	trace2_timer_start(TRACE2_TIMER_ID_TEST1);
+	git_inflate_init(&stream);
+	...
+	git_inflate_end(&stream);
+	trace2_timer_stop(TRACE2_TIMER_ID_TEST1);
+	...
+}
+----------------
++
+----------------
+$ export GIT_TRACE2_PERF_BRIEF=1
+$ export GIT_TRACE2_PERF=~/log.perf
+$ git status
+...
+$ cat ~/log.perf
+...
+d0 | summary                  | timer        |     |  0.111026 |           | test         | name:test1 count:4 ns_total:393000 ns_min:6000 ns_max:302000
+d0 | main                     | atexit       |     |  0.111026 |           |              | code:0
+----------------
++
+In this example, the "test1" timer was started 4 times and ran for
+0.000393 seconds.
+
 == Future Work
 
 === Relationship to the Existing Trace Api (api-trace.txt)
diff --git a/Makefile b/Makefile
index ed75ed422b5..8b657f0162a 100644
--- a/Makefile
+++ b/Makefile
@@ -1022,6 +1022,7 @@ LIB_OBJS += trace2/tr2_cfg.o
 LIB_OBJS += trace2/tr2_cmd_name.o
 LIB_OBJS += trace2/tr2_dst.o
 LIB_OBJS += trace2/tr2_sid.o
+LIB_OBJS += trace2/tr2_tmr.o
 LIB_OBJS += trace2/tr2_sysenv.o
 LIB_OBJS += trace2/tr2_tbuf.o
 LIB_OBJS += trace2/tr2_tgt_event.o
diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c
index f93633f895a..51d022422bf 100644
--- a/t/helper/test-trace2.c
+++ b/t/helper/test-trace2.c
@@ -206,6 +206,101 @@ static int ut_007bug(int argc, const char **argv)
 	BUG("the bug message");
 }
 
+/*
+ * Single-threaded timer test.  Create several intervals using the
+ * TEST1 timer.  The test script can verify that an aggregate Trace2
+ * "timer" event is emitted indicating that we started+stopped the
+ * timer the requested number of times.
+ */
+static int ut_008timer(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <count> <ms_delay>";
+
+	int count = 0;
+	int delay = 0;
+	int k;
+
+	if (argc != 2)
+		die("%s", usage_error);
+	if (get_i(&count, argv[0]))
+		die("%s", usage_error);
+	if (get_i(&delay, argv[1]))
+		die("%s", usage_error);
+
+	for (k = 0; k < count; k++) {
+		trace2_timer_start(TRACE2_TIMER_ID_TEST1);
+		sleep_millisec(delay);
+		trace2_timer_stop(TRACE2_TIMER_ID_TEST1);
+	}
+
+	return 0;
+}
+
+struct ut_009_data {
+	int count;
+	int delay;
+};
+
+static void *ut_009timer_thread_proc(void *_ut_009_data)
+{
+	struct ut_009_data *data = _ut_009_data;
+	int k;
+
+	trace2_thread_start("ut_009");
+
+	for (k = 0; k < data->count; k++) {
+		trace2_timer_start(TRACE2_TIMER_ID_TEST2);
+		sleep_millisec(data->delay);
+		trace2_timer_stop(TRACE2_TIMER_ID_TEST2);
+	}
+
+	trace2_thread_exit();
+	return NULL;
+}
+
+/*
+ * Multi-threaded timer test.  Create several threads that each create
+ * several intervals using the TEST2 timer.  The test script can verify
+ * that an individual Trace2 "timer" event for each thread and an
+ * aggregate "timer" event are generated.
+ */
+static int ut_009timer(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <count> <ms_delay> <threads>";
+
+	struct ut_009_data data = { 0, 0 };
+	int nr_threads = 0;
+	int k;
+	pthread_t *pids = NULL;
+
+	if (argc != 3)
+		die("%s", usage_error);
+	if (get_i(&data.count, argv[0]))
+		die("%s", usage_error);
+	if (get_i(&data.delay, argv[1]))
+		die("%s", usage_error);
+	if (get_i(&nr_threads, argv[2]))
+		die("%s", usage_error);
+
+	CALLOC_ARRAY(pids, nr_threads);
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_create(&pids[k], NULL, ut_009timer_thread_proc, &data))
+			die("failed to create thread[%d]", k);
+	}
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_join(pids[k], NULL))
+			die("failed to join thread[%d]", k);
+	}
+
+	free(pids);
+
+	return 0;
+}
+
 /*
  * Usage:
  *     test-tool trace2 <ut_name_1> <ut_usage_1>
@@ -223,6 +318,8 @@ static struct unit_test ut_table[] = {
 	{ ut_005exec,     "005exec",   "<git_command_args>" },
 	{ ut_006data,     "006data",   "[<category> <key> <value>]+" },
 	{ ut_007bug,      "007bug",    "" },
+	{ ut_008timer,    "008timer",  "<count> <ms_delay>" },
+	{ ut_009timer,    "009timer",  "<count> <ms_delay> <threads>" },
 };
 /* clang-format on */
 
diff --git a/t/t0211-trace2-perf.sh b/t/t0211-trace2-perf.sh
index 22d0845544e..381c3eea458 100755
--- a/t/t0211-trace2-perf.sh
+++ b/t/t0211-trace2-perf.sh
@@ -173,4 +173,50 @@ test_expect_success 'using global config, perf stream, return code 0' '
 	test_cmp expect actual
 '
 
+# Exercise the stopwatch timer "test" in a loop and confirm that we
+# have as many start/stop intervals as expected.  We cannot really test
+# the actual (total, min, max) timer values, so we assume they are good,
+# but we can test the keys for them.
+
+have_timer_event () {
+	thread=$1
+	name=$2
+	count=$3
+	file=$4
+
+	pattern="d0|${thread}|timer||_T_ABS_||test"
+	pattern="${pattern}|name:${name}"
+	pattern="${pattern} count:${count}"
+	pattern="${pattern} ns_total:.*"
+	pattern="${pattern} ns_min:.*"
+	pattern="${pattern} ns_max:.*"
+
+	grep "${pattern}" ${file}
+
+	return $?
+}
+
+test_expect_success 'test stopwatch timers - summary only' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+	test-tool trace2 008timer 5 10 &&
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+
+	have_timer_event "summary" "test1" 5 actual
+'
+
+test_expect_success 'test stopwatch timers - summary and threads' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+	test-tool trace2 009timer 5 10 3 &&
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+
+	have_timer_event "th01:ut_009" "test2" 5 actual &&
+	have_timer_event "th02:ut_009" "test2" 5 actual &&
+	have_timer_event "th03:ut_009" "test2" 5 actual &&
+	have_timer_event "summary" "test2" 15 actual
+'
+
 test_done
diff --git a/t/t0212-trace2-event.sh b/t/t0212-trace2-event.sh
index 6d3374ff773..277688fdbc4 100755
--- a/t/t0212-trace2-event.sh
+++ b/t/t0212-trace2-event.sh
@@ -323,4 +323,49 @@ test_expect_success 'discard traces when there are too many files' '
 	head -n2 trace_target_dir/git-trace2-discard | tail -n1 | grep \"event\":\"too_many_files\"
 '
 
+# Exercise the stopwatch timer "test" in a loop and confirm that we
+# have as many start/stop intervals as expected.  We cannot really test
+# the (ns_total, ns_min, ns_max) timer values, so we assume they are good.
+#
+
+have_timer_event () {
+	thread=$1
+	name=$2
+	count=$3
+	file=$4
+
+	pattern="\"event\":\"timer\""
+	pattern="${pattern}.*\"thread\":\"${thread}\""
+	pattern="${pattern}.*\"name\":\"${name}\""
+	pattern="${pattern}.*\"count\":${count}"
+	pattern="${pattern}.*\"ns_total\":[0-9]*"
+	pattern="${pattern}.*\"ns_min\":[0-9]*"
+	pattern="${pattern}.*\"ns_max\":[0-9]*"
+
+	grep "${pattern}" ${file}
+
+	return $?
+}
+
+test_expect_success 'test stopwatch timers - global, single-thread' '
+	test_when_finished "rm trace.event" &&
+	test_config_global trace2.eventBrief 1 &&
+	test_config_global trace2.eventTarget "$(pwd)/trace.event" &&
+	test-tool trace2 008timer 5 10 &&
+
+	have_timer_event "summary" "test1" 5 trace.event
+'
+
+test_expect_success 'test stopwatch timers - global+threads' '
+	test_when_finished "rm trace.event" &&
+	test_config_global trace2.eventBrief 1 &&
+	test_config_global trace2.eventTarget "$(pwd)/trace.event" &&
+	test-tool trace2 009timer 5 10 3 &&
+
+	have_timer_event "th01:ut_009" "test2" 5 trace.event &&
+	have_timer_event "th02:ut_009" "test2" 5 trace.event &&
+	have_timer_event "th03:ut_009" "test2" 5 trace.event &&
+	have_timer_event "summary" "test2" 15 trace.event
+'
+
 test_done
diff --git a/trace2.c b/trace2.c
index b2d471526fd..23289dd6eb4 100644
--- a/trace2.c
+++ b/trace2.c
@@ -13,6 +13,7 @@
 #include "trace2/tr2_sysenv.h"
 #include "trace2/tr2_tgt.h"
 #include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
 
 static int trace2_enabled;
 
@@ -83,6 +84,36 @@ static void tr2_tgt_disable_builtins(void)
 		tgt_j->pfn_term();
 }
 
+static void tr2main_emit_summary_timers(uint64_t us_elapsed_absolute)
+{
+	struct tr2_tgt *tgt_j;
+	int j;
+	struct tr2_timer_block merged = { { { 0 } } };
+
+	tr2_summarize_timers(&merged);
+
+	/*
+	 * Emit "summary" timer events for each composite timer value
+	 * that had activity.
+	 */
+	for_each_wanted_builtin (j, tgt_j)
+		if (tgt_j->pfn_timer)
+			tr2_emit_timer_block(tgt_j->pfn_timer,
+					     us_elapsed_absolute,
+					     &merged, "summary");
+}
+
+static void tr2main_emit_thread_timers(uint64_t us_elapsed_absolute)
+{
+	struct tr2_tgt *tgt_j;
+	int j;
+
+	for_each_wanted_builtin (j, tgt_j)
+		if (tgt_j->pfn_timer)
+			tr2_emit_timers_by_thread(tgt_j->pfn_timer,
+						  us_elapsed_absolute);
+}
+
 static int tr2main_exit_code;
 
 /*
@@ -110,6 +141,9 @@ static void tr2main_atexit_handler(void)
 	 */
 	tr2tls_pop_unwind_self();
 
+	tr2main_emit_thread_timers(us_elapsed_absolute);
+	tr2main_emit_summary_timers(us_elapsed_absolute);
+
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_atexit)
 			tgt_j->pfn_atexit(us_elapsed_absolute,
@@ -841,3 +875,25 @@ const char *trace2_session_id(void)
 {
 	return tr2_sid_get();
 }
+
+void trace2_timer_start(enum trace2_timer_id tid)
+{
+	if (!trace2_enabled)
+		return;
+
+	if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS)
+		BUG("trace2_timer_start: invalid timer id: %d", tid);
+
+	tr2_start_timer(tid);
+}
+
+void trace2_timer_stop(enum trace2_timer_id tid)
+{
+	if (!trace2_enabled)
+		return;
+
+	if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS)
+		BUG("trace2_timer_stop: invalid timer id: %d", tid);
+
+	tr2_stop_timer(tid);
+}
diff --git a/trace2.h b/trace2.h
index 0cc7b5f5312..22da5c5516c 100644
--- a/trace2.h
+++ b/trace2.h
@@ -51,6 +51,7 @@ struct json_writer;
  * [] trace2_region*    -- emit region nesting messages.
  * [] trace2_data*      -- emit region/thread/repo data messages.
  * [] trace2_printf*    -- legacy trace[1] messages.
+ * [] trace2_timer*     -- start/stop stopwatch timer (messages are deferred).
  */
 
 /*
@@ -531,4 +532,45 @@ void trace2_collect_process_info(enum trace2_process_info_reason reason);
 
 const char *trace2_session_id(void);
 
+/*
+ * Define the set of stopwatch timers.
+ *
+ * We can add more at any time, but they must be defined at compile
+ * time (to avoid the need to dynamically allocate and synchronize
+ * them between different threads).
+ *
+ * These must start at 0 and be contiguous (because we use them
+ * elsewhere as array indexes).
+ *
+ * Any values added to this enum must also be added to the timer definitions
+ * array.  See `trace2/tr2_tmr.c:tr2_timer_def_block[]`.
+ */
+enum trace2_timer_id {
+	/*
+	 * Define two timers for testing.  See `t/helper/test-trace2.c`.
+	 * These can be used for ad hoc testing, but should not be used
+	 * for permanent analysis code.
+	 */
+	TRACE2_TIMER_ID_TEST1 = 0, /* emits summary event only */
+	TRACE2_TIMER_ID_TEST2,     /* emits summary and thread events */
+
+
+	/* Add additional timer definitions before here. */
+	TRACE2_NUMBER_OF_TIMERS
+};
+
+/*
+ * Start/Stop a stopwatch timer in the current thread.
+ *
+ * The time spent in each start/stop interval will be accumulated and
+ * a "timer" event will be emitted when the program exits.
+ *
+ * Note: Since the stopwatch API routines do not generate individual
+ * events, they do not take (file, line) arguments.  Similarly, the
+ * category and timer name values are defined at compile-time in the
+ * timer definitions array, so they are not needed here in the API.
+ */
+void trace2_timer_start(enum trace2_timer_id tid);
+void trace2_timer_stop(enum trace2_timer_id tid);
+
 #endif /* TRACE2_H */
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 78538d5e522..675f6aeef31 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "thread-utils.h"
 #include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
 
 /*
  * Initialize size of the thread stack for nested regions.
@@ -204,3 +205,31 @@ int tr2tls_locked_increment(int *p)
 
 	return current_value;
 }
+
+void tr2_summarize_timers(struct tr2_timer_block *merged)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_ctx_list;
+
+	while (ctx) {
+		struct tr2tls_thread_ctx *next = ctx->next_ctx;
+
+		tr2_merge_timer_block(merged, &ctx->timers);
+
+		ctx = next;
+	}
+}
+
+void tr2_emit_timers_by_thread(tr2_tgt_evt_timer_t *pfn,
+			       uint64_t us_elapsed_absolute)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_ctx_list;
+
+	while (ctx) {
+		struct tr2tls_thread_ctx *next = ctx->next_ctx;
+
+		tr2_emit_timer_block(pfn, us_elapsed_absolute, &ctx->timers,
+				     ctx->thread_name);
+
+		ctx = next;
+	}
+}
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index 889010ec1ff..72e37beb1e7 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -2,6 +2,7 @@
 #define TR2_TLS_H
 
 #include "strbuf.h"
+#include "trace2/tr2_tmr.h"
 
 struct tr2tls_thread_ctx {
 	struct tr2tls_thread_ctx *next_ctx;
@@ -9,9 +10,26 @@ struct tr2tls_thread_ctx {
 	size_t alloc;
 	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
 	int thread_id;
+
+	struct tr2_timer_block timers;
+
 	char thread_name[FLEX_ARRAY];
 };
 
+/*
+ * Iterate over the global list of threads and aggregate the timer
+ * data into the given timer block.  The resulting block will contain
+ * the global summary of timer usage.
+ */
+void tr2_summarize_timers(struct tr2_timer_block *merged);
+
+/*
+ * Iterate over the global list of threads and emit "per-thread"
+ * timer data.
+ */
+void tr2_emit_timers_by_thread(tr2_tgt_evt_timer_t *pfn,
+			       uint64_t us_elapsed_absolute);
+
 /*
  * Create TLS data for the current thread.  This gives us a place to
  * put per-thread data, such as thread start time, function nesting
diff --git a/trace2/tr2_tmr.c b/trace2/tr2_tmr.c
new file mode 100644
index 00000000000..c7edcfd55fb
--- /dev/null
+++ b/trace2/tr2_tmr.c
@@ -0,0 +1,136 @@
+#include "cache.h"
+#include "thread-utils.h"
+#include "trace2/tr2_tls.h"
+#include "trace2/tr2_tmr.h"
+
+#define MY_MAX(a, b) ((a) > (b) ? (a) : (b))
+#define MY_MIN(a, b) ((a) < (b) ? (a) : (b))
+
+/*
+ * Define metadata for each stopwatch timer.  This list must match the
+ * set defined in "enum trace2_timer_id".
+ */
+struct tr2_timer_def {
+	const char *category;
+	const char *name;
+
+	unsigned int want_thread_events:1;
+};
+
+static struct tr2_timer_def tr2_timer_def_block[TRACE2_NUMBER_OF_TIMERS] = {
+	[TRACE2_TIMER_ID_TEST1] = { "test", "test1", 0 },
+	[TRACE2_TIMER_ID_TEST2] = { "test", "test2", 1 },
+};
+
+void tr2_start_timer(enum trace2_timer_id tid)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	struct tr2_timer *t = &ctx->timers.timer[tid];
+
+	t->recursion_count++;
+	if (t->recursion_count > 1)
+		return; /* ignore recursive starts */
+
+	t->start_ns = getnanotime();
+}
+
+void tr2_stop_timer(enum trace2_timer_id tid)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	struct tr2_timer *t = &ctx->timers.timer[tid];
+	uint64_t ns_now;
+	uint64_t ns_interval;
+
+	assert(t->recursion_count > 0);
+
+	t->recursion_count--;
+	if (t->recursion_count)
+		return; /* still in recursive call(s) */
+
+	ns_now = getnanotime();
+	ns_interval = ns_now - t->start_ns;
+
+	t->total_ns += ns_interval;
+
+	/*
+	 * min_ns was initialized to zero (in the xcalloc()) rather
+	 * than "(unsigned)-1" when the block of timers was allocated,
+	 * so we should always set both the min_ns and max_ns values
+	 * the first time that the timer is used.
+	 */
+	if (!t->interval_count) {
+		t->min_ns = ns_interval;
+		t->max_ns = ns_interval;
+	} else {
+		t->min_ns = MY_MIN(ns_interval, t->min_ns);
+		t->max_ns = MY_MAX(ns_interval, t->max_ns);
+	}
+
+	t->interval_count++;
+}
+
+void tr2_merge_timer_block(struct tr2_timer_block *merged,
+			   const struct tr2_timer_block *src)
+{
+	enum trace2_timer_id tid;
+
+	for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++) {
+		struct tr2_timer *t_merged = &merged->timer[tid];
+		const struct tr2_timer *t = &src->timer[tid];
+
+		t_merged->is_aggregate = 1;
+
+		if (t->recursion_count) {
+			/*
+			 * A thread exited with a stopwatch running.
+			 *
+			 * NEEDSWORK: should we assert or throw a warning
+			 * for the open interval.  I'm going to ignore it
+			 * and keep going because we may have valid data
+			 * for previously closed intervals on this timer.
+			 *
+			 * That is, I'm going to ignore the value of
+			 * "now - start_ns".
+			 */
+		}
+
+		if (!t->interval_count)
+			continue; /* this timer was not used by this thread. */
+
+		t_merged->total_ns += t->total_ns;
+
+		if (!t_merged->interval_count) {
+			t_merged->min_ns = t->min_ns;
+			t_merged->max_ns = t->max_ns;
+		} else {
+			t_merged->min_ns = MY_MIN(t->min_ns, t_merged->min_ns);
+			t_merged->max_ns = MY_MAX(t->max_ns, t_merged->max_ns);
+		}
+
+		t_merged->interval_count += t->interval_count;
+	}
+
+	merged->is_aggregate = 1;
+}
+
+void tr2_emit_timer_block(tr2_tgt_evt_timer_t *pfn,
+			  uint64_t us_elapsed_absolute,
+			  const struct tr2_timer_block *blk,
+			  const char *thread_name)
+{
+	enum trace2_timer_id tid;
+
+	for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++) {
+		const struct tr2_timer *t = &blk->timer[tid];
+		const struct tr2_timer_def *d = &tr2_timer_def_block[tid];
+
+		if (!t->interval_count)
+			continue; /* timer was not used */
+
+		if (!d->want_thread_events && !t->is_aggregate)
+			continue; /* per-thread events not wanted */
+
+		pfn(us_elapsed_absolute, thread_name, d->category, d->name,
+		    t->interval_count, t->total_ns, t->min_ns, t->max_ns);
+	}
+}
diff --git a/trace2/tr2_tmr.h b/trace2/tr2_tmr.h
new file mode 100644
index 00000000000..1963e6ac475
--- /dev/null
+++ b/trace2/tr2_tmr.h
@@ -0,0 +1,139 @@
+#ifndef TR2_TM_H
+#define TR2_TM_H
+
+#include "trace2.h"
+#include "trace2/tr2_tgt.h"
+
+/*
+ * Define a mechanism to allow "stopwatch" timers.
+ *
+ * Timers can be used to measure "interesting" activity that does not
+ * fit the "region" model, such as code called from many different
+ * regions (like zlib) and/or where data for individual calls are not
+ * interesting or are too numerous to be efficiently logged.
+ *
+ * Timer values are accumulated during program execution and emitted
+ * to the Trace2 logs at program exit.
+ *
+ * To make this model efficient, we define a compile-time fixed set of
+ * timers and timer ids.  This lets us avoid the complexities of
+ * dynamically allocating a timer on demand and sharing that
+ * definition with other threads.
+ *
+ * Timer values are stored in a fixed size "timer block" inside thread
+ * local storage.  This allows data to be collected on a
+ * thread-by-thread basis without locking.
+ *
+ * Using this "timer block" model costs ~48 bytes per timer per thread
+ * (we have about six uint64 fields per timer).  This does increase
+ * the size of the thread local storage block, but it is allocated (at
+ * thread create time) and not on the thread stack, so I'm not worried
+ * about the size.  Using an array of timers in this block gives us
+ * constant time access to each timer within each thread, so we don't
+ * need to do expensive lookups (like hashmaps) to start/stop a timer.
+ *
+ * We define (at compile time) a set of "timer ids" to access the
+ * various timers inside the fixed size "timer block".  See
+ * `trace2_timer_id` in `trace2/trace2.h`.
+ *
+ * Timer definitions also include "category", "name", and similar
+ * fields.  These are defined in a parallel table in `tr2_tmr.c` and
+ * eliminate the need to include those args in the various timer APIs.
+ *
+ * Timer results are summarized and emitted by the main thread at
+ * program exit by iterating over the global list of thread local
+ * storage data blocks.
+ */
+
+/*
+ * The definition of an individual timer as used by an individual
+ * thread.
+ */
+struct tr2_timer {
+	/*
+	 * Total elapsed time for this timer in this thread in nanoseconds.
+	 */
+	uint64_t total_ns;
+
+	/*
+	 * The maximum and minimum interval values observed for this
+	 * timer in this thread.
+	 */
+	uint64_t min_ns;
+	uint64_t max_ns;
+
+	/*
+	 * The value of the clock when this timer was started in this
+	 * thread.  (Undefined when the timer is not active in this
+	 * thread.)
+	 */
+	uint64_t start_ns;
+
+	/*
+	 * Number of times that this timer has been started and stopped
+	 * in this thread.  (Recursive starts are ignored.)
+	 */
+	size_t interval_count;
+
+	/*
+	 * Number of nested starts on the stack in this thread.  (We
+	 * ignore recursive starts and use this to track the recursive
+	 * calls.)
+	 */
+	unsigned int recursion_count;
+
+	/*
+	 * Has data from multiple threads been combined into this object.
+	 */
+	unsigned int is_aggregate:1;
+};
+
+/*
+ * A compile-time fixed-size block of timers to insert into thread
+ * local storage.
+ *
+ * We use this simple wrapper around the array of timer instances to
+ * avoid C syntax quirks and the need to pass around an additional size_t
+ * argument.
+ */
+struct tr2_timer_block {
+	struct tr2_timer timer[TRACE2_NUMBER_OF_TIMERS];
+
+	/*
+	 * Has data from multiple threads been combined into this block.
+	 */
+	unsigned int is_aggregate:1;
+};
+
+/*
+ * Private routines used by trace2.c to actually start/stop an individual
+ * timer in the current thread.
+ */
+void tr2_start_timer(enum trace2_timer_id tid);
+void tr2_stop_timer(enum trace2_timer_id tid);
+
+/*
+ * Accumulate timer data for all of the individual timers in the source
+ * block into the corresponding timers in the merged block.
+ *
+ * This will aggregate data from one block (from an individual thread)
+ * into the merge block.
+ */
+void tr2_merge_timer_block(struct tr2_timer_block *merged,
+			   const struct tr2_timer_block *src);
+
+/*
+ * Send stopwatch data for all of the timers in this block to the
+ * trace target destination.
+ *
+ * This will generate an event record for each timer in the block that
+ * had activity during the program's execution.  (If this is called
+ * with a per-thread block, we emit the per-thread data; if called
+ * with an aggregate block, we emit summary data.)
+ */
+void tr2_emit_timer_block(tr2_tgt_evt_timer_t *pfn,
+			  uint64_t us_elapsed_absolute,
+			  const struct tr2_timer_block *blk,
+			  const char *thread_name);
+
+#endif /* TR2_TM_H */
-- 
gitgitgadget
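
For readers skimming the patch above, the recursion and min/max handling in
tr2_start_timer() and tr2_stop_timer() can be sketched in isolation.  This is
a minimal illustration and not the code from the patch: the `sketch_` names
are hypothetical, and the clock value is passed in by the caller (instead of
calling getnanotime()) so that the behavior is deterministic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct tr2_timer; not Git's actual layout. */
struct sketch_timer {
	uint64_t total_ns;
	uint64_t min_ns;
	uint64_t max_ns;
	uint64_t start_ns;
	uint64_t interval_count;
	unsigned int recursion_count;
};

/* The caller supplies the clock value to keep the sketch deterministic. */
static void sketch_timer_start(struct sketch_timer *t, uint64_t now_ns)
{
	if (t->recursion_count++ > 0)
		return; /* nested start: keep the outermost start_ns */
	t->start_ns = now_ns;
}

static void sketch_timer_stop(struct sketch_timer *t, uint64_t now_ns)
{
	uint64_t interval;

	assert(t->recursion_count > 0);
	if (--t->recursion_count)
		return; /* still inside a nested start */

	interval = now_ns - t->start_ns;
	t->total_ns += interval;

	/* A zero-initialized timer has min_ns == 0, so seed on first use. */
	if (!t->interval_count || interval < t->min_ns)
		t->min_ns = interval;
	if (!t->interval_count || interval > t->max_ns)
		t->max_ns = interval;
	t->interval_count++;
}
```

With a zero-initialized timer, a nested start/stop pair inside an outer pair
is recorded as one interval covering the outer pair only, which matches the
"ignore recursive starts" comments in the patch.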


* [PATCH v2 8/9] trace2: add counter events to perf and event target formats
  2021-12-28 19:36 ` [PATCH v2 " Jeff Hostetler via GitGitGadget
                     ` (6 preceding siblings ...)
  2021-12-28 19:36   ` [PATCH v2 7/9] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
@ 2021-12-28 19:36   ` Jeff Hostetler via GitGitGadget
  2021-12-28 19:36   ` [PATCH v2 9/9] trace2: add global counters Jeff Hostetler via GitGitGadget
  2021-12-29  1:54   ` [PATCH v2 0/9] Trace2 stopwatch timers and " Ævar Arnfjörð Bjarmason
  9 siblings, 0 replies; 55+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2021-12-28 19:36 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Matheus Tavares, Johannes Sixt, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach Trace2 "perf" and "event" formats to handle "counter" events
for global counters.  Update the API documentation accordingly.

In a future commit, global counters will be added to the Trace2 API
and it will emit these "counter" events at program exit.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/api-trace2.txt | 19 +++++++++++++++++++
 trace2/tr2_tgt.h                       | 14 ++++++++++++++
 trace2/tr2_tgt_event.c                 | 23 +++++++++++++++++++++++
 trace2/tr2_tgt_normal.c                |  1 +
 trace2/tr2_tgt_perf.c                  | 18 ++++++++++++++++++
 5 files changed, 75 insertions(+)
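
The target "vtable" convention used by this patch (a NULL slot means that the
target does not emit that event) can be sketched as follows.  This is only an
illustration under stated assumptions: the `sketch_` names are hypothetical,
the handler writes into a caller-supplied buffer and returns int (the real
tr2_tgt_evt_counter_t returns void and writes to the trace destination), and
the payload only loosely resembles the "perf" format.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical handler type; the real tr2_tgt_evt_counter_t returns void. */
typedef int sketch_counter_fn(char *buf, size_t len, const char *category,
			      const char *name, uint64_t value);

struct sketch_target {
	const char *label;
	sketch_counter_fn *pfn_counter; /* NULL: target skips counter events */
};

/* Format a payload loosely modeled on the "perf" target's counter line. */
static int sketch_perf_counter(char *buf, size_t len, const char *category,
			       const char *name, uint64_t value)
{
	return snprintf(buf, len, "%s | name:%s value:%" PRIu64,
			category, name, value);
}

/* Offer one counter event to every target; return how many handled it. */
static int sketch_dispatch_counter(const struct sketch_target *tgts, size_t nr,
				   char *buf, size_t len, const char *category,
				   const char *name, uint64_t value)
{
	int handled = 0;
	size_t k;

	for (k = 0; k < nr; k++) {
		if (!tgts[k].pfn_counter)
			continue; /* like the NULL slot in tr2_tgt_normal */
		tgts[k].pfn_counter(buf, len, category, name, value);
		handled++;
	}
	return handled;
}
```

Checking the function pointer at dispatch time is what lets the "normal"
target opt out of counter events without needing a stub function.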

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 616001bcbb0..bdba0f92280 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -838,6 +838,25 @@ may exceed the "atexit" elapsed time of the program.
 Timer events may represent an individual thread or a summation across
 the entire program.  Summation events will have a unique thread name.
 
+`"counter"`::
+	This event is generated at the end of the program and contains
+	the value of a global counter.
++
+------------
+{
+	"event":"counter",
+	...
+	"name":"test",      # counter name
+	"value":42,         # value of the counter
+}
+------------
++
+A global counter can be incremented throughout the execution of the
+program.  It will be reported in a "counter" event just prior to exit.
++
+Counter events may represent an individual thread or a summation across
+the entire program.  Summation events will have a unique thread name.
+
 == Example Trace2 API Usage
 
 Here is a hypothetical usage of the Trace2 API showing the intended
diff --git a/trace2/tr2_tgt.h b/trace2/tr2_tgt.h
index a41f91d09b5..66f34b9258f 100644
--- a/trace2/tr2_tgt.h
+++ b/trace2/tr2_tgt.h
@@ -120,6 +120,19 @@ typedef void(tr2_tgt_evt_timer_t)(uint64_t us_elapsed_absolute,
 				  uint64_t ns_min_time,
 				  uint64_t ns_max_time);
 
+/*
+ * Item counter event.
+ *
+ * This also does not take a (file,line) pair.
+ *
+ * The thread name is optional.
+ */
+typedef void(tr2_tgt_evt_counter_t)(uint64_t us_elapsed_absolute,
+				    const char *thread_name,
+				    const char *category,
+				    const char *counter_name,
+				    uint64_t value);
+
 /*
  * "vtable" for a TRACE2 target.  Use NULL if a target does not want
  * to emit that message.
@@ -157,6 +170,7 @@ struct tr2_tgt {
 	tr2_tgt_evt_data_json_fl_t              *pfn_data_json_fl;
 	tr2_tgt_evt_printf_va_fl_t              *pfn_printf_va_fl;
 	tr2_tgt_evt_timer_t                     *pfn_timer;
+	tr2_tgt_evt_counter_t                   *pfn_counter;
 };
 /* clang-format on */
 
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index fe89e80bb1a..907bff80827 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -643,6 +643,28 @@ static void fn_timer(uint64_t us_elapsed_absolute,
 	jw_release(&jw);
 }
 
+static void fn_counter(uint64_t us_elapsed_absolute,
+		       const char *thread_name,
+		       const char *category,
+		       const char *counter_name,
+		       uint64_t value)
+{
+	const char *event_name = "counter";
+	struct json_writer jw = JSON_WRITER_INIT;
+	double t_abs = (double)us_elapsed_absolute / 1000000.0;
+
+	jw_object_begin(&jw, 0);
+	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, thread_name);
+	jw_object_double(&jw, "t_abs", 6, t_abs);
+	jw_object_string(&jw, "name", counter_name);
+	jw_object_intmax(&jw, "value", value);
+
+	jw_end(&jw);
+
+	tr2_dst_write_line(&tr2dst_event, &jw.json);
+	jw_release(&jw);
+}
+
 struct tr2_tgt tr2_tgt_event = {
 	&tr2dst_event,
 
@@ -675,4 +697,5 @@ struct tr2_tgt tr2_tgt_event = {
 	fn_data_json_fl,
 	NULL, /* printf */
 	fn_timer,
+	fn_counter,
 };
diff --git a/trace2/tr2_tgt_normal.c b/trace2/tr2_tgt_normal.c
index 23a7e78dcaa..1778232f6e9 100644
--- a/trace2/tr2_tgt_normal.c
+++ b/trace2/tr2_tgt_normal.c
@@ -356,4 +356,5 @@ struct tr2_tgt tr2_tgt_normal = {
 	NULL, /* data_json */
 	fn_printf_va_fl,
 	NULL, /* timer */
+	NULL, /* counter */
 };
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index c07ffad1a32..911cf6e6eab 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -581,6 +581,23 @@ static void fn_timer(uint64_t us_elapsed_absolute,
 	strbuf_release(&buf_payload);
 }
 
+static void fn_counter(uint64_t us_elapsed_absolute,
+		       const char *thread_name,
+		       const char *category,
+		       const char *counter_name,
+		       uint64_t value)
+{
+	const char *event_name = "counter";
+	struct strbuf buf_payload = STRBUF_INIT;
+
+	strbuf_addf(&buf_payload, "name:%s value:%"PRIu64, counter_name, value);
+
+	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL,
+			 &us_elapsed_absolute, NULL,
+			 category, &buf_payload, thread_name);
+	strbuf_release(&buf_payload);
+}
+
 struct tr2_tgt tr2_tgt_perf = {
 	&tr2dst_perf,
 
@@ -613,4 +630,5 @@ struct tr2_tgt tr2_tgt_perf = {
 	fn_data_json_fl,
 	fn_printf_va_fl,
 	fn_timer,
+	fn_counter,
 };
-- 
gitgitgadget


* [PATCH v2 9/9] trace2: add global counters
  2021-12-28 19:36 ` [PATCH v2 " Jeff Hostetler via GitGitGadget
                     ` (7 preceding siblings ...)
  2021-12-28 19:36   ` [PATCH v2 8/9] trace2: add counter events to perf and event target formats Jeff Hostetler via GitGitGadget
@ 2021-12-28 19:36   ` Jeff Hostetler via GitGitGadget
  2021-12-29  1:54   ` [PATCH v2 0/9] Trace2 stopwatch timers and " Ævar Arnfjörð Bjarmason
  9 siblings, 0 replies; 55+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2021-12-28 19:36 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Jeff Hostetler,
	Derrick Stolee, Matheus Tavares, Johannes Sixt, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Add global counters to Trace2.

Create a mechanism in Trace2 to count an activity and emit a single
"counter" event at the end of the program.  This is an alternative
to the existing "data" events that are emitted immediately.

Create an array of counters (indexed by `enum trace2_counter_id`)
to allow various activities to be tracked as desired.

Preload the array with two counters for testing purposes.

Create unit tests to demonstrate and verify.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/api-trace2.txt | 67 ++++++++++++++++++++
 Makefile                               |  1 +
 t/helper/test-trace2.c                 | 88 +++++++++++++++++++++++++-
 t/t0211-trace2-perf.sh                 | 42 ++++++++++++
 t/t0212-trace2-event.sh                | 41 ++++++++++++
 trace2.c                               | 50 +++++++++++++++
 trace2.h                               | 33 ++++++++++
 trace2/tr2_ctr.c                       | 67 ++++++++++++++++++++
 trace2/tr2_ctr.h                       | 79 +++++++++++++++++++++++
 trace2/tr2_tls.c                       | 29 +++++++++
 trace2/tr2_tls.h                       | 16 +++++
 11 files changed, 512 insertions(+), 1 deletion(-)
 create mode 100644 trace2/tr2_ctr.c
 create mode 100644 trace2/tr2_ctr.h
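
The per-thread accumulation and exit-time merge described in the commit
message can be sketched without any threading machinery.  This is a minimal
illustration, not the patch's implementation: the `sketch_` names are
hypothetical, and each array element stands in for the counter block that a
thread would keep in its thread local storage.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ids; they must start at 0 and be contiguous. */
enum sketch_counter_id {
	SKETCH_COUNTER_A = 0,
	SKETCH_COUNTER_B,
	SKETCH_NUMBER_OF_COUNTERS
};

/* One block per thread, so an increment needs no locking. */
struct sketch_counter_block {
	uint64_t value[SKETCH_NUMBER_OF_COUNTERS];
};

/* Add "value" to the calling thread's private copy of the counter. */
static void sketch_counter_add(struct sketch_counter_block *own,
			       enum sketch_counter_id cid, uint64_t value)
{
	own->value[cid] += value;
}

/* At exit, sum every per-thread block into one summary block. */
static void sketch_merge_blocks(struct sketch_counter_block *merged,
				const struct sketch_counter_block *blocks,
				size_t nr_blocks)
{
	size_t k;
	int cid;

	for (k = 0; k < nr_blocks; k++)
		for (cid = 0; cid < SKETCH_NUMBER_OF_COUNTERS; cid++)
			merged->value[cid] += blocks[k].value[cid];
}
```

Three blocks that each add 5 and then 10 to the same counter merge to 45,
which mirrors the "011counter 5 10 3" test in this patch.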

diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index bdba0f92280..a3ea867ff92 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -1297,6 +1297,73 @@ d0 | main                     | atexit       |     |  0.111026 |           |
 In this example, the "test1" timer was started 4 times and ran for
 0.000393 seconds.
 
+Counter Events::
+
+	Trace2 also provides global counters as an alternative to regions
+	and data events.  These make it possible to count an activity of
+	interest, such as a call to a library routine, during the program
+	and get a single counter event at the end.
++
+At the end of the program, a single summary event is emitted; this
+value is aggregated across all threads.  These events have "summary"
+as their thread name.
++
+For some counters, individual (per-thread) counter events are also
+generated.  This may be helpful in understanding how work is balanced
+between threads in some circumstances.
++
+----------------
+static void *load_cache_entries_thread(void *_data)
+{
+	struct load_cache_entries_thread_data *p = _data;
+	int i;
+
+	trace2_thread_start("load_cache_entries");
+	...
+	trace2_thread_exit();
+}
+
+static unsigned long load_cache_entry_block(struct index_state *istate,
+			struct mem_pool *ce_mem_pool, int offset, int nr, const char *mmap,
+			unsigned long start_offset, const struct cache_entry *previous_ce)
+{
+	int i;
+	unsigned long src_offset = start_offset;
+
+	trace2_counter_add(TRACE2_COUNTER_ID_TEST2, nr);
+
+	for (i = offset; i < offset + nr; i++) {
+		...
+	}
+}
+----------------
++
+----------------
+$ export GIT_TRACE2_PERF_BRIEF=1
+$ export GIT_TRACE2_PERF=~/log.perf
+$ git status
+...
+$ cat ~/log.perf
+...
+d0 | main                     | exit         |     | 53.977680 |           |              | code:0
+d0 | th12:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193220
+d0 | th11:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193221
+d0 | th10:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193221
+d0 | th09:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193221
+d0 | th08:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193221
+d0 | th07:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193221
+d0 | th06:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193221
+d0 | th05:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193221
+d0 | th04:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193221
+d0 | th03:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193221
+d0 | th02:load_cache_entries  | counter      |     | 53.977708 |           | test         | name:test2 value:193221
+d0 | summary                  | counter      |     | 53.977708 |           | test         | name:test2 value:2125430
+d0 | main                     | atexit       |     | 53.977708 |           |              | code:0
+----------------
++
+This example shows the value computed by each of the 11
+`load_cache_entries` threads and the total across all threads.
+
 == Future Work
 
 === Relationship to the Existing Trace Api (api-trace.txt)
diff --git a/Makefile b/Makefile
index 8b657f0162a..cc5bd8593f1 100644
--- a/Makefile
+++ b/Makefile
@@ -1020,6 +1020,7 @@ LIB_OBJS += trace.o
 LIB_OBJS += trace2.o
 LIB_OBJS += trace2/tr2_cfg.o
 LIB_OBJS += trace2/tr2_cmd_name.o
+LIB_OBJS += trace2/tr2_ctr.o
 LIB_OBJS += trace2/tr2_dst.o
 LIB_OBJS += trace2/tr2_sid.o
 LIB_OBJS += trace2/tr2_tmr.o
diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c
index 51d022422bf..a7dbecfda9a 100644
--- a/t/helper/test-trace2.c
+++ b/t/helper/test-trace2.c
@@ -270,7 +270,7 @@ static int ut_009timer(int argc, const char **argv)
 	const char *usage_error =
 		"expect <count> <ms_delay> <threads>";
 
-	struct ut_009_data data = { 0, 0 };
+	struct ut_009_data data = { 0 };
 	int nr_threads = 0;
 	int k;
 	pthread_t *pids = NULL;
@@ -301,6 +301,90 @@ static int ut_009timer(int argc, const char **argv)
 	return 0;
 }
 
+/*
+ * Single-threaded counter test.  Add several values to the TEST1 counter.
+ * The test script can verify that an aggregate Trace2 "counter" event is
+ * emitted containing the sum of the values provided.
+ */
+static int ut_010counter(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <v1> [<v2> [...]]";
+	int value;
+	int k;
+
+	if (argc < 1)
+		die("%s", usage_error);
+
+	for (k = 0; k < argc; k++) {
+		if (get_i(&value, argv[k]))
+			die("invalid value[%s] -- %s",
+			    argv[k], usage_error);
+		trace2_counter_add(TRACE2_COUNTER_ID_TEST1, value);
+	}
+
+	return 0;
+}
+
+struct ut_011_data {
+	int v1, v2;
+};
+
+static void *ut_011counter_thread_proc(void *_ut_011_data)
+{
+	struct ut_011_data *data = _ut_011_data;
+
+	trace2_thread_start("ut_011");
+
+	trace2_counter_add(TRACE2_COUNTER_ID_TEST2, data->v1);
+	trace2_counter_add(TRACE2_COUNTER_ID_TEST2, data->v2);
+
+	trace2_thread_exit();
+	return NULL;
+}
+
+/*
+ * Multi-threaded counter test.  Create several threads that each
+ * increment the TEST2 global counter.  The test script can verify
+ * that an individual Trace2 "counter" event for each thread and an
+ * aggregate "counter" event are generated.
+ */
+static int ut_011counter(int argc, const char **argv)
+{
+	const char *usage_error =
+		"expect <v1> <v2> <threads>";
+
+	struct ut_011_data data = { 0, 0 };
+	int nr_threads = 0;
+	int k;
+	pthread_t *pids = NULL;
+
+	if (argc != 3)
+		die("%s", usage_error);
+	if (get_i(&data.v1, argv[0]))
+		die("%s", usage_error);
+	if (get_i(&data.v2, argv[1]))
+		die("%s", usage_error);
+	if (get_i(&nr_threads, argv[2]))
+		die("%s", usage_error);
+
+	CALLOC_ARRAY(pids, nr_threads);
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_create(&pids[k], NULL, ut_011counter_thread_proc, &data))
+			die("failed to create thread[%d]", k);
+	}
+
+	for (k = 0; k < nr_threads; k++) {
+		if (pthread_join(pids[k], NULL))
+			die("failed to join thread[%d]", k);
+	}
+
+	free(pids);
+
+	return 0;
+}
+
 /*
  * Usage:
  *     test-tool trace2 <ut_name_1> <ut_usage_1>
@@ -320,6 +404,8 @@ static struct unit_test ut_table[] = {
 	{ ut_007bug,      "007bug",    "" },
 	{ ut_008timer,    "008timer",  "<count> <ms_delay>" },
 	{ ut_009timer,    "009timer",  "<count> <ms_delay> <threads>" },
+	{ ut_010counter,  "010counter","<v1> [<v2> [<v3> [...]]]" },
+	{ ut_011counter,  "011counter","<v1> <v2> <threads>" },
 };
 /* clang-format on */
 
diff --git a/t/t0211-trace2-perf.sh b/t/t0211-trace2-perf.sh
index 381c3eea458..5f9a3533ce4 100755
--- a/t/t0211-trace2-perf.sh
+++ b/t/t0211-trace2-perf.sh
@@ -219,4 +219,46 @@ test_expect_success 'test stopwatch timers - summary and threads' '
 	have_timer_event "summary" "test2" 15 actual
 '
 
+# Exercise the global counter "test" in a loop and confirm that we get an
+# event with the sum.
+#
+
+have_counter_event () {
+	thread=$1
+	name=$2
+	value=$3
+	file=$4
+
+	pattern="d0|${thread}|counter||_T_ABS_||test"
+	pattern="${pattern}|name:${name}"
+	pattern="${pattern} value:${value}"
+
+	grep "${pattern}" ${file}
+
+	return $?
+}
+
+test_expect_success 'test global counters - global, single-thread' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+	test-tool trace2 010counter 2 3 5 7 11 13  &&
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+
+	have_counter_event "summary" "test1" 41 actual
+'
+
+test_expect_success 'test global counters - global+threads' '
+	test_when_finished "rm trace.perf actual" &&
+	test_config_global trace2.perfBrief 1 &&
+	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
+	test-tool trace2 011counter 5 10 3 &&
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
+
+	have_counter_event "th01:ut_011" "test2" 15 actual &&
+	have_counter_event "th02:ut_011" "test2" 15 actual &&
+	have_counter_event "th03:ut_011" "test2" 15 actual &&
+	have_counter_event "summary" "test2" 45 actual
+'
+
 test_done
diff --git a/t/t0212-trace2-event.sh b/t/t0212-trace2-event.sh
index 277688fdbc4..9e76ef5caa7 100755
--- a/t/t0212-trace2-event.sh
+++ b/t/t0212-trace2-event.sh
@@ -368,4 +368,45 @@ test_expect_success 'test stopwatch timers - global+threads' '
 	have_timer_event "summary" "test2" 15 trace.event
 '
 
+# Exercise the global counter in a loop and confirm that we get the
+# expected sum in an event record.
+#
+
+have_counter_event () {
+	thread=$1
+	name=$2
+	value=$3
+	file=$4
+
+	pattern="\"event\":\"counter\""
+	pattern="${pattern}.*\"thread\":\"${thread}\""
+	pattern="${pattern}.*\"name\":\"${name}\""
+	pattern="${pattern}.*\"value\":${value}"
+
+	grep "${pattern}" ${file}
+
+	return $?
+}
+
+test_expect_success 'test global counter - global, single-thread' '
+	test_when_finished "rm trace.event" &&
+	test_config_global trace2.eventBrief 1 &&
+	test_config_global trace2.eventTarget "$(pwd)/trace.event" &&
+	test-tool trace2 010counter 2 3 5 7 11 13 &&
+
+	have_counter_event "summary" "test1" 41 trace.event
+'
+
+test_expect_success 'test global counter - global+threads' '
+	test_when_finished "rm trace.event" &&
+	test_config_global trace2.eventBrief 1 &&
+	test_config_global trace2.eventTarget "$(pwd)/trace.event" &&
+	test-tool trace2 011counter 5 10 3 &&
+
+	have_counter_event "th01:ut_011" "test2" 15 trace.event &&
+	have_counter_event "th02:ut_011" "test2" 15 trace.event &&
+	have_counter_event "th03:ut_011" "test2" 15 trace.event &&
+	have_counter_event "summary" "test2" 45 trace.event
+'
+
 test_done
diff --git a/trace2.c b/trace2.c
index 23289dd6eb4..aa6ed6dd3ee 100644
--- a/trace2.c
+++ b/trace2.c
@@ -8,6 +8,7 @@
 #include "version.h"
 #include "trace2/tr2_cfg.h"
 #include "trace2/tr2_cmd_name.h"
+#include "trace2/tr2_ctr.h"
 #include "trace2/tr2_dst.h"
 #include "trace2/tr2_sid.h"
 #include "trace2/tr2_sysenv.h"
@@ -114,6 +115,41 @@ static void tr2main_emit_thread_timers(uint64_t us_elapsed_absolute)
 						  us_elapsed_absolute);
 }
 
+static void tr2main_emit_summary_counters(uint64_t us_elapsed_absolute)
+{
+	struct tr2_tgt *tgt_j;
+	int j;
+	struct tr2_counter_block merged = { { { 0 } } };
+
+	/*
+	 * Sum across all of the per-thread counter data into
+	 * a single composite block of counter values.
+	 */
+	tr2tls_aggregate_counter_blocks(&merged);
+
+	/*
+	 * Emit "summary" counter events for each composite counter value
+	 * that had activity.
+	 */
+	for_each_wanted_builtin (j, tgt_j)
+		if (tgt_j->pfn_counter)
+			tr2_emit_counter_block(tgt_j->pfn_counter,
+					       us_elapsed_absolute,
+					       &merged, "summary");
+}
+
+static void tr2main_emit_thread_counters(uint64_t us_elapsed_absolute)
+{
+	struct tr2_tgt *tgt_j;
+	int j;
+
+	for_each_wanted_builtin (j, tgt_j)
+		if (tgt_j->pfn_counter)
+			tr2tls_emit_counter_blocks_by_thread(
+				tgt_j->pfn_counter,
+				us_elapsed_absolute);
+}
+
 static int tr2main_exit_code;
 
 /*
@@ -144,6 +180,9 @@ static void tr2main_atexit_handler(void)
 	tr2main_emit_thread_timers(us_elapsed_absolute);
 	tr2main_emit_summary_timers(us_elapsed_absolute);
 
+	tr2main_emit_thread_counters(us_elapsed_absolute);
+	tr2main_emit_summary_counters(us_elapsed_absolute);
+
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_atexit)
 			tgt_j->pfn_atexit(us_elapsed_absolute,
@@ -897,3 +936,14 @@ void trace2_timer_stop(enum trace2_timer_id tid)
 
 	tr2_stop_timer(tid);
 }
+
+void trace2_counter_add(enum trace2_counter_id cid, uint64_t value)
+{
+	if (!trace2_enabled)
+		return;
+
+	if (cid < 0 || cid >= TRACE2_NUMBER_OF_COUNTERS)
+		BUG("invalid counter id: %d", cid);
+
+	tr2_counter_increment(cid, value);
+}
diff --git a/trace2.h b/trace2.h
index 22da5c5516c..d4ed602c19a 100644
--- a/trace2.h
+++ b/trace2.h
@@ -52,6 +52,7 @@ struct json_writer;
  * [] trace2_data*      -- emit region/thread/repo data messages.
  * [] trace2_printf*    -- legacy trace[1] messages.
  * [] trace2_timer*     -- start/stop stopwatch timer (messages are deferred).
+ * [] trace2_counter*   -- global counters (messages are deferred).
  */
 
 /*
@@ -573,4 +574,36 @@ enum trace2_timer_id {
 void trace2_timer_start(enum trace2_timer_id tid);
 void trace2_timer_stop(enum trace2_timer_id tid);
 
+/*
+ * Define the set of global counters.
+ *
+ * We can add more at any time, but they must be defined at compile
+ * time (to avoid the need to dynamically allocate and synchronize
+ * them between different threads).
+ *
+ * These must start at 0 and be contiguous (because we use them elsewhere
+ * as array indexes).
+ *
+ * Any value added to this enum must also be added to the counter
+ * definitions array.  See `trace2/tr2_ctr.c:tr2_counter_def_block[]`.
+ */
+enum trace2_counter_id {
+	/*
+	 * Define two counters for testing.  See `t/helper/test-trace2.c`.
+	 * These can be used for ad hoc testing, but should not be used
+	 * for permanent analysis code.
+	 */
+	TRACE2_COUNTER_ID_TEST1 = 0, /* emits summary event only */
+	TRACE2_COUNTER_ID_TEST2,     /* emits summary and thread events */
+
+
+	/* Add additional counter definitions before here. */
+	TRACE2_NUMBER_OF_COUNTERS
+};
+
+/*
+ * Increment global counter by value.
+ */
+void trace2_counter_add(enum trace2_counter_id cid, uint64_t value);
+
 #endif /* TRACE2_H */
diff --git a/trace2/tr2_ctr.c b/trace2/tr2_ctr.c
new file mode 100644
index 00000000000..ce80ceb5476
--- /dev/null
+++ b/trace2/tr2_ctr.c
@@ -0,0 +1,67 @@
+#include "cache.h"
+#include "thread-utils.h"
+#include "trace2/tr2_tls.h"
+#include "trace2/tr2_ctr.h"
+
+/*
+ * Define metadata for each global counter.  This list must match the
+ * set defined in "enum trace2_counter_id".
+ */
+struct tr2_counter_def {
+	const char *category;
+	const char *name;
+
+	unsigned int want_thread_events:1;
+};
+
+static struct tr2_counter_def tr2_counter_def_block[TRACE2_NUMBER_OF_COUNTERS] = {
+	[TRACE2_COUNTER_ID_TEST1] = { "test", "test1", 0 },
+	[TRACE2_COUNTER_ID_TEST2] = { "test", "test2", 1 },
+};
+
+void tr2_counter_increment(enum trace2_counter_id cid, uint64_t value)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	struct tr2_counter *c = &ctx->counters.counter[cid];
+
+	c->value += value;
+}
+
+void tr2_merge_counter_block(struct tr2_counter_block *merged,
+			     const struct tr2_counter_block *src)
+{
+	enum trace2_counter_id cid;
+
+	for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++) {
+		struct tr2_counter *c_merged = &merged->counter[cid];
+		const struct tr2_counter *c = &src->counter[cid];
+
+		c_merged->is_aggregate = 1;
+
+		c_merged->value += c->value;
+	}
+
+	merged->is_aggregate = 1;
+}
+
+void tr2_emit_counter_block(tr2_tgt_evt_counter_t *pfn,
+			    uint64_t us_elapsed_absolute,
+			    const struct tr2_counter_block *blk,
+			    const char *thread_name)
+{
+	enum trace2_counter_id cid;
+
+	for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++) {
+		const struct tr2_counter *c = &blk->counter[cid];
+		const struct tr2_counter_def *d = &tr2_counter_def_block[cid];
+
+		if (!c->value)
+			continue; /* counter was not used */
+
+		if (!d->want_thread_events && !c->is_aggregate)
+			continue; /* per-thread events not wanted */
+
+		pfn(us_elapsed_absolute, thread_name, d->category, d->name,
+		    c->value);
+	}
+}
diff --git a/trace2/tr2_ctr.h b/trace2/tr2_ctr.h
new file mode 100644
index 00000000000..fd6fbef89a2
--- /dev/null
+++ b/trace2/tr2_ctr.h
@@ -0,0 +1,79 @@
+#ifndef TR2_CTR_H
+#define TR2_CTR_H
+
+#include "trace2.h"
+#include "trace2/tr2_tgt.h"
+
+/*
+ * Define a mechanism to allow global "counters".
+ *
+ * Counters can be used to count interesting activity that does not fit
+ * the "region and data" model, such as code called from many
+ * different regions and/or where you want to count a number of items,
+ * but don't have control of when the last item will be processed,
+ * such as counting the number of calls to `lstat()`.
+ *
+ * Counters differ from Trace2 "data" events.  Data events are emitted
+ * immediately and are appropriate for documenting loop counters,
+ * etc.  Counter values are accumulated during the program and the final
+ * counter value event is emitted at program exit.
+ *
+ * To make this model efficient, we define a compile-time fixed set
+ * of counters and counter ids.  This lets us avoid the complexities
+ * of dynamically allocating a counter and sharing that definition
+ * with other threads.
+ *
+ * We define (at compile time) a set of "counter ids" to access the
+ * various counters inside of a fixed size "counter block".
+ *
+ * A counter definition table provides the counter category and name
+ * so we can eliminate those arguments from the public counter API.
+ * These are defined in a parallel table in `tr2_ctr.c`.
+ *
+ * Each thread has a private block of counters in its thread local
+ * storage data so no locks are required for a thread to increment
+ * its version of the counter.  At program exit, the counter blocks
+ * from all of the per-thread counters are added together to give the
+ * final summary value for each global counter.
+ */
+
+/*
+ * The definition of an individual counter.
+ */
+struct tr2_counter {
+	uint64_t value;
+
+	unsigned int is_aggregate:1;
+};
+
+/*
+ * Compile time fixed block of all defined counters.
+ */
+struct tr2_counter_block {
+	struct tr2_counter counter[TRACE2_NUMBER_OF_COUNTERS];
+
+	unsigned int is_aggregate:1;
+};
+
+/*
+ * Add "value" to the global counter.
+ */
+void tr2_counter_increment(enum trace2_counter_id cid, uint64_t value);
+
+/*
+ * Accumulate counter data from the source block into the merged block.
+ */
+void tr2_merge_counter_block(struct tr2_counter_block *merged,
+			       const struct tr2_counter_block *src);
+
+/*
+ * Send counter data for all counters in this block to the target.
+ *
+ * This will generate an event record for each counter that had activity.
+ */
+void tr2_emit_counter_block(tr2_tgt_evt_counter_t *pfn,
+			    uint64_t us_elapsed_absolute,
+			    const struct tr2_counter_block *blk,
+			    const char *thread_name);
+
+#endif /* TR2_CTR_H */
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 675f6aeef31..28ea55863d1 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "thread-utils.h"
+#include "trace2/tr2_ctr.h"
 #include "trace2/tr2_tls.h"
 #include "trace2/tr2_tmr.h"
 
@@ -233,3 +234,31 @@ void tr2_emit_timers_by_thread(tr2_tgt_evt_timer_t *pfn,
 		ctx = next;
 	}
 }
+
+void tr2tls_aggregate_counter_blocks(struct tr2_counter_block *merged)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_ctx_list;
+
+	while (ctx) {
+		struct tr2tls_thread_ctx *next = ctx->next_ctx;
+
+		tr2_merge_counter_block(merged, &ctx->counters);
+
+		ctx = next;
+	}
+}
+
+void tr2tls_emit_counter_blocks_by_thread(tr2_tgt_evt_counter_t *pfn,
+					  uint64_t us_elapsed_absolute)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_ctx_list;
+
+	while (ctx) {
+		struct tr2tls_thread_ctx *next = ctx->next_ctx;
+
+		tr2_emit_counter_block(pfn, us_elapsed_absolute, &ctx->counters,
+				       ctx->thread_name);
+
+		ctx = next;
+	}
+}
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index 72e37beb1e7..503829bbd44 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -2,6 +2,7 @@
 #define TR2_TLS_H
 
 #include "strbuf.h"
+#include "trace2/tr2_ctr.h"
 #include "trace2/tr2_tmr.h"
 
 struct tr2tls_thread_ctx {
@@ -12,10 +13,25 @@ struct tr2tls_thread_ctx {
 	int thread_id;
 
 	struct tr2_timer_block timers;
+	struct tr2_counter_block counters;
 
 	char thread_name[FLEX_ARRAY];
 };
 
+/*
+ * Iterate over the global list of threads and aggregate the
+ * counter data into the given counter block.  The resulting block
+ * will contain the global counter sums.
+ */
+void tr2tls_aggregate_counter_blocks(struct tr2_counter_block *merged);
+
+/*
+ * Iterate over the global list of threads and emit "per-thread"
+ * counter data for each.
+ */
+void tr2tls_emit_counter_blocks_by_thread(tr2_tgt_evt_counter_t *pfn,
+					  uint64_t us_elapsed_absolute);
+
 /*
  * Iterate over the global list of threads and aggregate the timer
  * data into the given timer block.  The resulting block will contain
-- 
gitgitgadget
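
For readers following along, the per-thread model described in the
tr2_ctr.h comment above can be sketched in a few lines. This is only an
illustration under stated assumptions, not the code from the patch: the
demo_* names and the two counter ids are hypothetical, and the real
blocks live in each thread's TLS context rather than being passed in
explicitly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ids; the series defines its own enum trace2_counter_id. */
enum demo_counter_id {
	DEMO_COUNTER_LSTAT = 0,
	DEMO_COUNTER_OPEN,
	DEMO_NUMBER_OF_COUNTERS
};

/* Fixed-size block: one slot per compile-time counter id. */
struct demo_counter_block {
	uint64_t value[DEMO_NUMBER_OF_COUNTERS];
};

/*
 * Each thread only ever touches its own block, so no lock is needed
 * on the increment path.
 */
static void demo_counter_increment(struct demo_counter_block *self,
				   enum demo_counter_id cid, uint64_t value)
{
	self->value[cid] += value;
}

/*
 * At program exit, fold one thread's block into the merged block.
 * Calling this once per thread yields the global sums.
 */
static void demo_merge_counter_block(struct demo_counter_block *merged,
				     const struct demo_counter_block *src)
{
	size_t k;

	for (k = 0; k < DEMO_NUMBER_OF_COUNTERS; k++)
		merged->value[k] += src->value[k];
}
```

The point of the design is that the hot path (increment) is a plain
add on thread-private memory, and the only cross-thread work is the
merge loop, which runs once at exit after the worker threads are done.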


* Re: [PATCH v2 1/9] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
  2021-12-28 19:36   ` [PATCH v2 1/9] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
@ 2021-12-29  0:48     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-12-29  0:48 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Jeff Hostetler, Derrick Stolee, Matheus Tavares,
	Johannes Sixt, Jeff Hostetler


On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Use "size_t" rather than "int" for the "alloc" and "nr_open_regions"
> fields in the "tr2tls_thread_ctx".  These are used by ALLOC_GROW().
>
> This was discussed in: https://lore.kernel.org/all/YULF3hoaDxA9ENdO@nand.local/

Let's keep commit messages self-contained when possible. It's fine to
reference on-list discussion (and I often do), but in this case all
that's being referenced just seems to be Taylor saying we might as well
change this while we're at it.

So I'd think a short sentence saying we generally prefer "size_t" for
these these days and we might as well change it here while we're at it
would suffice over the ML link.

> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  trace2/tr2_tls.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
> index b1e327a928e..a90bd639d48 100644
> --- a/trace2/tr2_tls.h
> +++ b/trace2/tr2_tls.h
> @@ -11,8 +11,8 @@
>  struct tr2tls_thread_ctx {
>  	struct strbuf thread_name;
>  	uint64_t *array_us_start;
> -	int alloc;
> -	int nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
> +	size_t alloc;
> +	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
>  	int thread_id;
>  };



* Re: [PATCH v2 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to flex array
  2021-12-28 19:36   ` [PATCH v2 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to flex array Jeff Hostetler via GitGitGadget
@ 2021-12-29  1:11     ` Ævar Arnfjörð Bjarmason
  2021-12-29 16:46       ` Jeff Hostetler
  0 siblings, 1 reply; 55+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-12-29  1:11 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Jeff Hostetler, Derrick Stolee, Matheus Tavares,
	Johannes Sixt, Jeff Hostetler


On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Move the thread name to a flex array at the bottom of the Trace2
> thread local storage data and get rid of the strbuf.
>
> Let the flex array have the full computed value of the thread name
> without truncation.
>
> Change the PERF target to truncate the thread name so that the columns
> still line up.

This commit message really doesn't help in explaining what we're trying
to do here and why it's needed. I'm not saying it's not, but why not a
strbuf, why a flex array? The diff below also shows changes unrelated to
this.

I tried this local fixup on top of this series, which works. So I
wonder: if we're just trying to get rid of the strbuf to signal that
this shouldn't change, why not just strbuf_detach() and keep a "const
char *thread_name"?

diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 28ea55863d1..35d49b27b2e 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -48,7 +48,7 @@ void tr2tls_start_process_clock(void)
 struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 					     uint64_t us_thread_start)
 {
-	struct tr2tls_thread_ctx *ctx;
+	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(struct tr2tls_thread_ctx));
 	struct strbuf buf_name = STRBUF_INIT;
 	int thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
 
@@ -56,8 +56,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 		strbuf_addf(&buf_name, "th%02d:", thread_id);
 	strbuf_addstr(&buf_name, thread_name);
 
-	FLEX_ALLOC_MEM(ctx, thread_name, buf_name.buf, buf_name.len);
-	strbuf_release(&buf_name);
+	ctx->thread_name = strbuf_detach(&buf_name, NULL);
 
 	ctx->thread_id = thread_id;
 
@@ -188,6 +187,7 @@ void tr2tls_release(void)
 	while (ctx) {
 		struct tr2tls_thread_ctx *next = ctx->next_ctx;
 
+		free((char *)ctx->thread_name);
 		free(ctx->array_us_start);
 		free(ctx);
 
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index 503829bbd44..bc6c6f12e38 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -6,6 +6,7 @@
 #include "trace2/tr2_tmr.h"
 
 struct tr2tls_thread_ctx {
+	const char *thread_name;
 	struct tr2tls_thread_ctx *next_ctx;
 	uint64_t *array_us_start;
 	size_t alloc;
@@ -14,8 +15,6 @@ struct tr2tls_thread_ctx {
 
 	struct tr2_timer_block timers;
 	struct tr2_counter_block counters;
-
-	char thread_name[FLEX_ARRAY];
 };
 
 /*

> [...]
> index 7da94aba522..ed99a234b95 100644
> --- a/trace2/tr2_tls.c
> +++ b/trace2/tr2_tls.c
> @@ -34,7 +34,18 @@ void tr2tls_start_process_clock(void)
>  struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>  					     uint64_t us_thread_start)
>  {
> -	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
> +	struct tr2tls_thread_ctx *ctx;
> +	struct strbuf buf_name = STRBUF_INIT;
> +	int thread_id = tr2tls_locked_increment(&tr2_next_thread_id);

Here's the looks-to-be-unrelated to this strbuf conversion code I
mentioned above.

> +
> +	if (thread_id)
> +		strbuf_addf(&buf_name, "th%02d:", thread_id);
> +	strbuf_addstr(&buf_name, thread_name);
> +
> +	FLEX_ALLOC_MEM(ctx, thread_name, buf_name.buf, buf_name.len);
> +	strbuf_release(&buf_name);
> +
> +	ctx->thread_id = thread_id;
>  
>  	/*
> [...]


* Re: [PATCH v2 5/9] trace2: add thread-name override to perf target
  2021-12-28 19:36   ` [PATCH v2 5/9] trace2: add thread-name override to perf target Jeff Hostetler via GitGitGadget
@ 2021-12-29  1:48     ` Ævar Arnfjörð Bjarmason
  2021-12-29 17:15       ` Jeff Hostetler
  0 siblings, 1 reply; 55+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-12-29  1:48 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Jeff Hostetler, Derrick Stolee, Matheus Tavares,
	Johannes Sixt, Jeff Hostetler


On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Teach the message formatter in the Trace2 perf target to accept an
> optional thread name argument.  This will override the thread name
> inherited from the thread local storage data block.
>
> This will be used in a future commit for global events that should
> not be tied to a particular thread, such as a global stopwatch timer.

We already have a "ctx", and that "ctx" has a "thread_name", but here
and in the preceding commit we're adding a "thread_name" to every caller
of these functions in case we'd like to override it.

Wouldn't it make more sense to just pass a "ctx" to these functions? One
of them already takes it. Here's an (obviously incomplete) fixup on top
of your series that makes the one that doesn't take a "ctx" accept one,
and has the only non-NULL users of "thread_name" go through a trivial
helper that passes in a "ctx" with a new "thread_name", then swaps it
back.

It would make for a smaller diffstat for this already large series, or
we could do exactly what we're doing now, but avoid the churn of
adjusting every caller by introducing a new sister function for those
who want this parameter to be non-NULL.

(The below patch is "broken" in that __FILE__ and __LINE__ need to be
passed in as parameters, but this is just a trivial change for
show/commentary)

diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index b9eb2cdb77a..7aaec83dff7 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -82,16 +82,15 @@ static void fn_term(void)
 static void event_fmt_prepare(const char *event_name, const char *file,
 			      int line, const struct repository *repo,
 			      struct json_writer *jw,
-			      const char *thread_name_override)
+			      struct tr2tls_thread_ctx *ctx)
 {
 	struct tr2_tbuf tb_now;
+	if (!ctx)
+		ctx = tr2tls_get_self();
 
 	jw_object_string(jw, "event", event_name);
 	jw_object_string(jw, "sid", tr2_sid_get());
-	jw_object_string(jw, "thread",
-			 ((thread_name_override && *thread_name_override)
-			  ? thread_name_override
-			  : tr2tls_get_self()->thread_name));
+	jw_object_string(jw, "thread", ctx->thread_name);
 
 	/*
 	 * In brief mode, only emit <time> on these 2 event types.
@@ -111,6 +110,20 @@ static void event_fmt_prepare(const char *event_name, const char *file,
 		jw_object_intmax(jw, "repo", repo->trace2_repo_id);
 }
 
+static void event_fmt_prepare_tn(const char *event_name, const char *file,
+				 int line, const struct repository *repo,
+				 struct json_writer *jw,
+				 const char *thread_name)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	const char *tmp;
+
+	tmp = ctx->thread_name;
+	ctx->thread_name = thread_name;
+	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, jw, ctx);
+	ctx->thread_name = tmp;
+}
+
 static void fn_too_many_files_fl(const char *file, int line)
 {
 	const char *event_name = "too_many_files";
@@ -629,7 +642,7 @@ static void fn_timer(uint64_t us_elapsed_absolute,
 	double t_abs = (double)us_elapsed_absolute / 1000000.0;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, thread_name);
+	event_fmt_prepare_tn(event_name, __FILE__, __LINE__, NULL, &jw, thread_name);
 	jw_object_double(&jw, "t_abs", 6, t_abs);
 	jw_object_string(&jw, "name", timer_name);
 	jw_object_intmax(&jw, "count", interval_count);
@@ -654,7 +667,7 @@ static void fn_counter(uint64_t us_elapsed_absolute,
 	double t_abs = (double)us_elapsed_absolute / 1000000.0;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, thread_name);
+	event_fmt_prepare_tn(event_name, __FILE__, __LINE__, NULL, &jw, thread_name);
 	jw_object_double(&jw, "t_abs", 6, t_abs);
 	jw_object_string(&jw, "name", counter_name);
 	jw_object_intmax(&jw, "value", value);


* Re: [PATCH v2 0/9] Trace2 stopwatch timers and global counters
  2021-12-28 19:36 ` [PATCH v2 " Jeff Hostetler via GitGitGadget
                     ` (8 preceding siblings ...)
  2021-12-28 19:36   ` [PATCH v2 9/9] trace2: add global counters Jeff Hostetler via GitGitGadget
@ 2021-12-29  1:54   ` Ævar Arnfjörð Bjarmason
  2021-12-30 16:42     ` Jeff Hostetler
  9 siblings, 1 reply; 55+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-12-29  1:54 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Jeff Hostetler, Derrick Stolee, Matheus Tavares,
	Johannes Sixt, Jeff Hostetler


On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote:

I left some other comments on the series inline, just on the notes in
the CL:

>  * Ævar proposed a large refactor of the "_perf" target to have a "fmt()"
>    varargs function to reduce the amount of copy-n-pasted code in many of
>    the "fn" event handlers. This looks like a good change based on the
>    mockup but is a large refactor.

FWIW what I meant with [1] was not that this series needed to take the
detour of refactoring trace2/tr2_tgt_perf.c to use such a helper, but
that for the function additions in this series it might make sense to
introduce one and use it for the new functions.

For this series I think it's probably not worth it, so I'm fine with
leaving this for some other time. Just pointing out that rather than
your reading of:

 1. We have some refactorable verbosity
 2. Refactor all callers
 3. Change existing code to use that refactoring
 4. Add new code to use the refactoring

It's also perfectly fine to do just:

 1. We have some refactorable verbosity
 2. Introduce a less verbose helper
 3. Add new code to use the helper

And leave the "refactor all callers" for some other time.
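
To make the shape of such a helper concrete, here is a minimal sketch of
a varargs formatter that new event handlers could share. It is an
assumption-laden illustration, not the mockup from [1] and not code from
trace2/tr2_tgt_perf.c: demo_perf_fmt() and its "event | payload" layout
are hypothetical, and it writes into a caller-supplied buffer instead of
a strbuf.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Write "<event_name> | <formatted payload>" into out.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the result did not fit or formatting failed.
 */
static int demo_perf_fmt(char *out, size_t outlen, const char *event_name,
			 const char *fmt, ...)
{
	va_list ap;
	int n, m;

	/* Common prefix that every handler would otherwise repeat. */
	n = snprintf(out, outlen, "%s | ", event_name);
	if (n < 0 || (size_t)n >= outlen)
		return -1;

	/* Handler-specific payload. */
	va_start(ap, fmt);
	m = vsnprintf(out + n, outlen - (size_t)n, fmt, ap);
	va_end(ap);
	if (m < 0 || (size_t)m >= outlen - (size_t)n)
		return -1;

	return n + m;
}
```

A new handler would then reduce to a single call such as
demo_perf_fmt(buf, sizeof(buf), "counter", "name:%s value:%d", name, v),
while the existing handlers stay untouched until someone wants to
convert them.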

Anyway, I think for the two callers just leaving it entirely for this
series is the right thing to do. It was more of a "hrm, that's some odd
and avoidable verbosity..." comment on my read-through of v1.

1. https://lore.kernel.org/git/211220.86czlrurm6.gmgdl@evledraar.gmail.com/

>  * Ævar proposed a new rationale for when/why we change the "_event" version
>    number. That text can be added to the design document independently.

Hrm, no. In [1] I linked to some earlier musings of mine about what we
should do about the TR2_EVENT_VERSION (mainly as an FYI since you added
it, but hadn't commented on that post).

But my main comment there was that the series wasn't progressing as
atomic changes. I.e. we promise to change the TR2_EVENT_VERSION version
every time we change the event format, but v1 first changed the format
and bumped the version, then made some more changes.

I think that's probably fine per-se within a git release cycle, but it
might be a symptom of commits that could be split up to be more atomic (I
don't know, didn't look in detail).

However, in this v2 of the series the TR2_EVENT_VERSION bump is entirely
gone.

Maybe that means that you so vehemently agree with my proposal in [1]
that you'd like to start taking that view for trace2 changes right away
:-)

For me it's fine either way, I think TR2_EVENT_VERSION probably isn't
that important.

But if that's the case it should probably be called out more explicitly
in the CL/commit. I.e. even if our "policy" (such as it is) about
TR2_EVENT_VERSION currently says X we're going to start doing Y here
intentionally.

And in that case I should probably turn that suggestion in [1] into an
actual PATCH sooner than later...

1. https://lore.kernel.org/git/211220.86czlrurm6.gmgdl@evledraar.gmail.com/


* Re: [PATCH v2 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to flex array
  2021-12-29  1:11     ` Ævar Arnfjörð Bjarmason
@ 2021-12-29 16:46       ` Jeff Hostetler
  0 siblings, 0 replies; 55+ messages in thread
From: Jeff Hostetler @ 2021-12-29 16:46 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Jeff Hostetler via GitGitGadget
  Cc: git, Derrick Stolee, Matheus Tavares, Johannes Sixt, Jeff Hostetler



On 12/28/21 8:11 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Move the thread name to a flex array at the bottom of the Trace2
>> thread local storage data and get rid of the strbuf.
>>
>> Let the flex array have the full computed value of the thread name
>> without truncation.
>>
>> Change the PERF target to truncate the thread name so that the columns
>> still line up.
> 
> This commit message really doesn't help in explaining what we're trying
> to do here and why it's needed. I'm not saying it's not, but why not a
> strbuf, why a flex array? The diff below also shows changes unrelated to
> this.
> 
> I tried this local fixup on top of this series, which works. So I
> wonder: if we're just trying to get rid of the strbuf to signal that
> this shouldn't change, why not just strbuf_detach() and keep a "const
> char *thread_name"?
> 
> diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
> index 28ea55863d1..35d49b27b2e 100644
> --- a/trace2/tr2_tls.c
> +++ b/trace2/tr2_tls.c
> @@ -48,7 +48,7 @@ void tr2tls_start_process_clock(void)
>   struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>   					     uint64_t us_thread_start)
>   {
> -	struct tr2tls_thread_ctx *ctx;
> +	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(struct tr2tls_thread_ctx));
>   	struct strbuf buf_name = STRBUF_INIT;
>   	int thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
>   
> @@ -56,8 +56,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>   		strbuf_addf(&buf_name, "th%02d:", thread_id);
>   	strbuf_addstr(&buf_name, thread_name);
>   
> -	FLEX_ALLOC_MEM(ctx, thread_name, buf_name.buf, buf_name.len);
> -	strbuf_release(&buf_name);
> +	ctx->thread_name = strbuf_detach(&buf_name, NULL);
>   
>   	ctx->thread_id = thread_id;
>   
> @@ -188,6 +187,7 @@ void tr2tls_release(void)
>   	while (ctx) {
>   		struct tr2tls_thread_ctx *next = ctx->next_ctx;
>   
> +		free((char *)ctx->thread_name);
>   		free(ctx->array_us_start);
>   		free(ctx);
>   
> diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
> index 503829bbd44..bc6c6f12e38 100644
> --- a/trace2/tr2_tls.h
> +++ b/trace2/tr2_tls.h
> @@ -6,6 +6,7 @@
>   #include "trace2/tr2_tmr.h"
>   
>   struct tr2tls_thread_ctx {
> +	const char *thread_name;
>   	struct tr2tls_thread_ctx *next_ctx;
>   	uint64_t *array_us_start;
>   	size_t alloc;
> @@ -14,8 +15,6 @@ struct tr2tls_thread_ctx {
>   
>   	struct tr2_timer_block timers;
>   	struct tr2_counter_block counters;
> -
> -	char thread_name[FLEX_ARRAY];
>   };
>   
>   /*

I have to admit that I really don't know how to please you.

In V1 I converted the "strbuf" to a "char *" inside the structure
because there was concern that one might assume that the thread
name could be changed after the thread was created.  You complained
that I made it a "char *" rather than a "const char *".  I explained
pointer ownership and you completely ignored that.  You explained
that I should just "cast away the const during the free" because
other places in the code use that "anti-pattern".  You also complained
that I didn't use a callback to get the thread name dynamically rather
than having a string field in the thread's TLS.  I explained that it
was faster to compute it once than to generate it on every logging
call.  You ignored that and hinted that the message formatting in
each of the target destinations would make that cost irrelevant.
I convert the field to a flex-array to avoid all of the allocation and
ownership issues and now you send me a "fixup" patch that undoes
the flex-array change and makes it look mostly like my previous
version -- but WITH the "const" and the "cast" (that I've already
talked about in this paragraph).

So, where does this leave us?  I'm really trying to "assume good
intentions" here, but we've spent way toooooooo long discussing
this thread_name field.  It's starting to feel like you're going
to just keep nagging me about this field until I make it look
exactly like you would have written it.

So, sorry to rant, but I don't know what else to say about this
field.  It is especially troubling that this "issue" has taken
so much time -- time that would be better spent actually looking
at the new timers and counters feature.


> 
>> [...]
>> index 7da94aba522..ed99a234b95 100644
>> --- a/trace2/tr2_tls.c
>> +++ b/trace2/tr2_tls.c
>> @@ -34,7 +34,18 @@ void tr2tls_start_process_clock(void)
>>   struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>>   					     uint64_t us_thread_start)
>>   {
>> -	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
>> +	struct tr2tls_thread_ctx *ctx;
>> +	struct strbuf buf_name = STRBUF_INIT;
>> +	int thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
> 
> Here's the looks-to-be-unrelated to this strbuf conversion code I
> mentioned above.

In the flex-array version, we defer the alloc of "ctx" until
after we have computed the thread name -- we need to do that so that
we know the length of the thread name (and thus the size of the
flex-array).  To do that we need to know the thread id that we
will be formatting into the thread name.  And to do that we need
to reserve a thread id -- which is a global and requires a lock.

So the call to tr2tls_locked_increment() (as well as the formatting
of the name itself) was moved up to the top of the function rather
than after the "ctx" was allocated.
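
As a rough illustration of that ordering (compute the name first, then
allocate the struct with room for it), here is a standalone sketch. It
is not the Git code: it uses plain calloc() and a C99 flexible array
member instead of FLEX_ALLOC_MEM(), takes the thread id as a parameter
rather than reserving it under a lock, and the demo_* names are
hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct demo_thread_ctx {
	int thread_id;
	char thread_name[]; /* flexible array member; must be last */
};

/*
 * Build "thNN:<base_name>" (or just "<base_name>" for thread id 0)
 * and allocate the context with exactly enough room for it.
 * Returns NULL on allocation failure; the caller frees with free().
 */
static struct demo_thread_ctx *demo_create_ctx(int thread_id,
					       const char *base_name)
{
	char prefix[32] = "";
	size_t prefix_len, base_len;
	struct demo_thread_ctx *ctx;

	/* The name (and so its length) must be known before allocating. */
	if (thread_id)
		snprintf(prefix, sizeof(prefix), "th%02d:", thread_id);
	prefix_len = strlen(prefix);
	base_len = strlen(base_name);

	ctx = calloc(1, sizeof(*ctx) + prefix_len + base_len + 1);
	if (!ctx)
		return NULL;

	ctx->thread_id = thread_id;
	memcpy(ctx->thread_name, prefix, prefix_len);
	memcpy(ctx->thread_name + prefix_len, base_name, base_len + 1);
	return ctx;
}
```

With this layout there is a single allocation and a single free(), and
nothing outside the struct owns the name.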

> 
>> +
>> +	if (thread_id)
>> +		strbuf_addf(&buf_name, "th%02d:", thread_id);
>> +	strbuf_addstr(&buf_name, thread_name);
>> +
>> +	FLEX_ALLOC_MEM(ctx, thread_name, buf_name.buf, buf_name.len);
>> +	strbuf_release(&buf_name);
>> +
>> +	ctx->thread_id = thread_id;
>>   
>>   	/*
>> [...]

Jeff



* Re: [PATCH v2 5/9] trace2: add thread-name override to perf target
  2021-12-29  1:48     ` Ævar Arnfjörð Bjarmason
@ 2021-12-29 17:15       ` Jeff Hostetler
  0 siblings, 0 replies; 55+ messages in thread
From: Jeff Hostetler @ 2021-12-29 17:15 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Jeff Hostetler via GitGitGadget
  Cc: git, Derrick Stolee, Matheus Tavares, Johannes Sixt, Jeff Hostetler



On 12/28/21 8:48 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Teach the message formatter in the Trace2 perf target to accept an
>> optional thread name argument.  This will override the thread name
>> inherited from the thread local storage data block.
>>
>> This will be used in a future commit for global events that should
>> not be tied to a particular thread, such as a global stopwatch timer.
> 
> We already have a "ctx", and that "ctx" has a "thread_name", but here
> and in the preceding commit we're adding a "thread_name" to every caller
> of these functions in case we'd like to override it.
> 
> Wouldn't it make more sense to just pass a "ctx" to these functions? One
> of them already takes it. Here's an (obviously incomplete) fixup on top
> of your series that makes the one that doesn't take a "ctx" accept one,
> and has the only non-NULL users of "thread_name" go through a trivial
> helper that passes in a "ctx" with a new "thread_name", then swaps it
> back.
> 
> It would make for a smaller diffstat for this already large series, or
> we could do exactly what we're doing now, but avoid the churn of
> adjusting every caller by introducing a new sister function for those
> who want this parameter to be non-NULL.

I suppose it is possible to have a helper version of
`event_fmt_prepare()` that takes the extra argument and
fixup the existing function to call it with NULL.

I'll see if that makes sense.


[...]
>   
> +static void event_fmt_prepare_tn(const char *event_name, const char *file,
> +				 int line, const struct repository *repo,
> +				 struct json_writer *jw,
> +				 const char *thread_name)
> +{
> +	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
> +	const char *tmp;
> +
> +	tmp = ctx->thread_name;
> +	ctx->thread_name = thread_name;
> +	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, jw, ctx);
> +	ctx->thread_name = tmp;
> +}
[...]

This only works if we agree that thread name is a pointer inside
the structure and not a flex-array.

Personally, I think this is trying to do things backwards by
temporarily changing the ctx->thread_name field.  I think it
would be better to have `event_fmt_prepare_tn()` do the actual
work with the supplied thread name and have the existing
`event_fmt_prepare()` just call it with ctx->thread_name.
Then we don't need to hack up the ctx.

I'll see if this makes the diffs a little cleaner.
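
Roughly, the layering I have in mind looks like the sketch below. It is
a simplified stand-in, not the real trace2/tr2_tgt_event.c code: the
demo_* names are hypothetical, the JSON writer is replaced by
snprintf(), and the file/line/repo arguments are left out.

```c
#include <stdio.h>

struct demo_ctx {
	const char *thread_name;
};

/* Stand-in for the calling thread's TLS context. */
static struct demo_ctx demo_self = { "main" };

static struct demo_ctx *demo_get_self(void)
{
	return &demo_self;
}

/*
 * The "_tn" variant does the actual work using whatever thread name
 * it is given.  It never reads or writes the ctx.
 */
static int demo_fmt_prepare_tn(char *out, size_t outlen,
			       const char *event_name,
			       const char *thread_name)
{
	return snprintf(out, outlen, "{\"event\":\"%s\",\"thread\":\"%s\"}",
			event_name, thread_name);
}

/*
 * The existing entry point becomes a thin wrapper that supplies the
 * current thread's own name, so existing callers do not change.
 */
static int demo_fmt_prepare(char *out, size_t outlen, const char *event_name)
{
	return demo_fmt_prepare_tn(out, outlen, event_name,
				   demo_get_self()->thread_name);
}
```

Only the timer and counter handlers would call the "_tn" form directly,
and the ctx is never modified, so it works whether thread_name is a
pointer or a flex-array.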

Jeff



* Re: [PATCH v2 0/9] Trace2 stopwatch timers and global counters
  2021-12-29  1:54   ` [PATCH v2 0/9] Trace2 stopwatch timers and " Ævar Arnfjörð Bjarmason
@ 2021-12-30 16:42     ` Jeff Hostetler
  0 siblings, 0 replies; 55+ messages in thread
From: Jeff Hostetler @ 2021-12-30 16:42 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Jeff Hostetler via GitGitGadget
  Cc: git, Derrick Stolee, Matheus Tavares, Johannes Sixt, Jeff Hostetler



On 12/28/21 8:54 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote:
> 
> I left some other comments on the series inline, just on the notes in
> the CL:
> 
>>   * Ævar proposed a large refactor of the "_perf" target to have a "fmt()"
>>     varargs function to reduce the amount of copy-n-pasted code in many of
>>     the "fn" event handlers. This looks like a good change based on the
>>     mockup but is a large refactor.
> 
> FWIW what I meant with [1] was not that this series needed to take the
> detour of refactoring trace2/tr2_tgt_perf.c to use such a helper, but
> that for the function additions in this series it might make sense to
> introduce one and use it for the new functions.
> 
> For this series I think it's probably not worth it, so I'm fine with
> leaving this for some other time. Just pointing out that rather than
> your reading of:
> 
>   1. We have some refactorable verbosity
>   2. Refactor all callers
>   3. Change existing code to use that refactoring
>   4. Add new code to use the refactoring
> 
> It's also perfectly fine to do just:
> 
>   1. We have some refactorable verbosity
>   2. Introduce a less verbose helper
>   3. Add new code to use the helper
> 
> And leave the "refactor all callers" for some other time.
> 
> Anyway, I think for the two callers just leaving it entirely for this
> series is the right thing to do. It was more of a "hrm, that's some odd
> and avoidable verbosity..." comment on my read-through of v1.
> 
> 1. https://lore.kernel.org/git/211220.86czlrurm6.gmgdl@evledraar.gmail.com/

Sorry, but I'm going to call BS on this.  You sent a ~200 line diff
showing how we could refactor and reduce some of the duplicated code.
You have a history of introducing unnecessary refactorings in the middle
of other topics, and this looks like another example of that.  Another
example of distracting everyone from reviewing the actual new code.

And when I say that it should be an independent topic in its own
series, you fall back to your "oh, it was just a drive-by comment."
and/or "i didn't mean for you to actually do it." and/or "you just
read my email incorrectly."

Drive-by comments don't usually have ~200 line diffs attached....

A drive-by comment would just say that "there is an opportunity to
create a varargs version of the existing io function and reduce
some duplication in the bodies of the existing callers" and be done.
I don't need a 200 line diff to see how you spell that.

Again, sorry to rant, but I'm tired of looking like the stupid half
in these conversations.

> 
>>   * Ævar proposed a new rationale for when/why we change the "_event" version
>>     number. That text can be added to the design document independently.
> 
> Hrm, no. In [1] I linked to some earlier musings of mine about what we
> should do about the TR2_EVENT_VERSION (mainly as an FYI since you added
> it, but hadn't commented on that post).
> 
> But my main comment there was that the series wasn't progressing as
> atomic changes. I.e. we promise to change the TR2_EVENT_VERSION version
> every time we change the event format, but v1 first changed the format
> and bumped the version, then made some more changes.

Did you really expect me to change it twice within a single 9 commit
patch series?

This series creates both "timers" and "counters" and both will appear
together if/when they are merged.  From an external point of view,
users would see version 4 added two new event types.  So I either
increment it for "timers" or I increment it for "counters" or I squash
the two commits together and increment it then.

I didn't want to squash them, so I chose the former.

> 
> I think that's probably fine per-se within a git release cycle, but it
> might be a symptom of commits that could be split up to be more atomic (I
> don't know, didn't look in detail).
> 
> However, in this v2 of the series the TR2_EVENT_VERSION bump is entirely
> gone.

You complained when/how I bumped it in V1.  So I removed it.

And I suggested that you commit your "earlier musings".  With
that in place, there would be no need for me to change the
version number (which is what you wanted all along, right?)


> 
> Maybe that means that you so vehemently agree with my proposal in [1]
> that you'd like to start taking that view for trace2 changes right away
> :-)

s/so vehemently agree with/are tired of debating/

> 
> For me it's fine either way, I think TR2_EVENT_VERSION probably isn't
> that important.
> 
> But if that's the case it should probably be called out more explicitly
> in the CL/commit. I.e. even if our "policy" (such as it is) about
> TR2_EVENT_VERSION currently says X we're going to start doing Y here
> intentionally.
> 
> And in that case I should probably turn that suggestion in [1] into an
> actual PATCH sooner than later...
> 
> 1. https://lore.kernel.org/git/211220.86czlrurm6.gmgdl@evledraar.gmail.com/
> 

Right, I'll add a note to V3 stating that I did not update
the version number.

Jeff


end of thread, other threads:[~2021-12-30 16:42 UTC | newest]

Thread overview: 55+ messages
2021-12-20 15:01 [PATCH 0/9] Trace2 stopwatch timers and global counters Jeff Hostetler via GitGitGadget
2021-12-20 15:01 ` [PATCH 1/9] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
2021-12-20 15:01 ` [PATCH 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to char* Jeff Hostetler via GitGitGadget
2021-12-20 16:31   ` Ævar Arnfjörð Bjarmason
2021-12-20 19:07     ` Jeff Hostetler
2021-12-20 19:35       ` Ævar Arnfjörð Bjarmason
2021-12-22 16:32         ` Jeff Hostetler
2021-12-21  7:33     ` Junio C Hamano
2021-12-21  7:22   ` Junio C Hamano
2021-12-22 16:28     ` Jeff Hostetler
2021-12-22 19:57       ` Junio C Hamano
2021-12-20 15:01 ` [PATCH 3/9] trace2: defer free of TLS CTX until program exit Jeff Hostetler via GitGitGadget
2021-12-21  7:30   ` Junio C Hamano
2021-12-22 21:59     ` Jeff Hostetler
2021-12-22 22:56       ` Junio C Hamano
2021-12-22 23:04         ` Jeff Hostetler
2021-12-23  7:38         ` Johannes Sixt
2021-12-23 18:18           ` Junio C Hamano
2021-12-27 18:51             ` Jeff Hostetler
2021-12-20 15:01 ` [PATCH 4/9] trace2: add thread-name override to event target Jeff Hostetler via GitGitGadget
2021-12-20 15:01 ` [PATCH 5/9] trace2: add thread-name override to perf target Jeff Hostetler via GitGitGadget
2021-12-20 15:01 ` [PATCH 6/9] trace2: add timer events to perf and event target formats Jeff Hostetler via GitGitGadget
2021-12-20 16:39   ` Ævar Arnfjörð Bjarmason
2021-12-20 19:44     ` Jeff Hostetler
2021-12-21 14:20   ` Derrick Stolee
2021-12-20 15:01 ` [PATCH 7/9] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
2021-12-20 16:42   ` Ævar Arnfjörð Bjarmason
2021-12-22 21:38     ` Jeff Hostetler
2021-12-21 14:45   ` Derrick Stolee
2021-12-22 21:57     ` Jeff Hostetler
2021-12-20 15:01 ` [PATCH 8/9] trace2: add counter events to perf and event target formats Jeff Hostetler via GitGitGadget
2021-12-20 16:51   ` Ævar Arnfjörð Bjarmason
2021-12-22 22:56     ` Jeff Hostetler
2021-12-20 15:01 ` [PATCH 9/9] trace2: add global counters Jeff Hostetler via GitGitGadget
2021-12-20 17:14   ` Ævar Arnfjörð Bjarmason
2021-12-22 22:18     ` Jeff Hostetler
2021-12-21 14:51 ` [PATCH 0/9] Trace2 stopwatch timers and " Derrick Stolee
2021-12-21 23:27   ` Matheus Tavares
2021-12-28 19:36 ` [PATCH v2 " Jeff Hostetler via GitGitGadget
2021-12-28 19:36   ` [PATCH v2 1/9] trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Jeff Hostetler via GitGitGadget
2021-12-29  0:48     ` Ævar Arnfjörð Bjarmason
2021-12-28 19:36   ` [PATCH v2 2/9] trace2: convert tr2tls_thread_ctx.thread_name from strbuf to flex array Jeff Hostetler via GitGitGadget
2021-12-29  1:11     ` Ævar Arnfjörð Bjarmason
2021-12-29 16:46       ` Jeff Hostetler
2021-12-28 19:36   ` [PATCH v2 3/9] trace2: defer free of thread local storage until program exit Jeff Hostetler via GitGitGadget
2021-12-28 19:36   ` [PATCH v2 4/9] trace2: add thread-name override to event target Jeff Hostetler via GitGitGadget
2021-12-28 19:36   ` [PATCH v2 5/9] trace2: add thread-name override to perf target Jeff Hostetler via GitGitGadget
2021-12-29  1:48     ` Ævar Arnfjörð Bjarmason
2021-12-29 17:15       ` Jeff Hostetler
2021-12-28 19:36   ` [PATCH v2 6/9] trace2: add timer events to perf and event target formats Jeff Hostetler via GitGitGadget
2021-12-28 19:36   ` [PATCH v2 7/9] trace2: add stopwatch timers Jeff Hostetler via GitGitGadget
2021-12-28 19:36   ` [PATCH v2 8/9] trace2: add counter events to perf and event target formats Jeff Hostetler via GitGitGadget
2021-12-28 19:36   ` [PATCH v2 9/9] trace2: add global counters Jeff Hostetler via GitGitGadget
2021-12-29  1:54   ` [PATCH v2 0/9] Trace2 stopwatch timers and " Ævar Arnfjörð Bjarmason
2021-12-30 16:42     ` Jeff Hostetler
