* [PATCH 0/5] Support metric group constraint
@ 2020-02-19 19:08 kan.liang
2020-02-19 19:08 ` [PATCH 1/5] perf jevents: Support metric constraint kan.liang
` (5 more replies)
0 siblings, 6 replies; 17+ messages in thread
From: kan.liang @ 2020-02-19 19:08 UTC (permalink / raw)
To: acme, jolsa, mingo, peterz, linux-kernel
Cc: mark.rutland, namhyung, ravi.bangoria, yao.jin, ak, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
Some metric groups, e.g. Page_Walks_Utilization, will never count when
NMI watchdog is enabled.
$echo 1 > /proc/sys/kernel/nmi_watchdog
$perf stat -M Page_Walks_Utilization
Performance counter stats for 'system wide':
<not counted> itlb_misses.walk_pending (0.00%)
<not counted> dtlb_load_misses.walk_pending (0.00%)
<not counted> dtlb_store_misses.walk_pending (0.00%)
<not counted> ept.walk_pending (0.00%)
<not counted> cycles (0.00%)
2.343460588 seconds time elapsed
Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
The events in group usually have to be from the same PMU. Try
reorganizing the group.
A metric group is a weak group, which relies on group validation
code in the kernel to determine whether to be opened as a group or
a non-group. However, group validation code may return false-positives,
especially when NMI watchdog is enabled. (The metric group is allowed
as a group but will never be scheduled.)
The attempt to fix the group validation code has been rejected.
https://lore.kernel.org/lkml/20200117091341.GX2827@hirez.programming.kicks-ass.net/
Because we cannot accurately predict whether the group can be scheduled
as a group, only by checking current status.
This patch set provides another solution to mitigate the issue.
Add "MetricConstraint" in event list, which provides a hint for perf tool,
e.g. "MetricConstraint": "NO_NMI_WATCHDOG". Perf tool can change the
metric group to non-group (standalone metrics) if NMI watchdog is enabled.
After applying the patch,
$echo 1 > /proc/sys/kernel/nmi_watchdog
$perf stat -M Page_Walks_Utilization
Splitting metric group Page_Walks_Utilization into standalone metrics.
Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
Performance counter stats for 'system wide':
18,253,454 itlb_misses.walk_pending # 0.0
Page_Walks_Utilization (50.55%)
78,051,525 dtlb_load_misses.walk_pending (50.55%)
29,213,063 dtlb_store_misses.walk_pending (50.55%)
0 ept.walk_pending (50.55%)
2,542,132,364 cycles (49.92%)
1.037095993 seconds time elapsed
Kan Liang (5):
perf jevents: Support metric constraint
perf metricgroup: Factor out metricgroup__add_metric_weak_group()
perf util: Factor out sysctl__nmi_watchdog_enabled()
perf metricgroup: Support metric constraint
perf vendor events: Add NO_NMI_WATCHDOG metric constraint
.../arch/x86/cascadelakex/clx-metrics.json | 3 +-
.../pmu-events/arch/x86/skylake/skl-metrics.json | 3 +-
.../pmu-events/arch/x86/skylakex/skx-metrics.json | 3 +-
tools/perf/pmu-events/jevents.c | 19 +++--
tools/perf/pmu-events/jevents.h | 2 +-
tools/perf/pmu-events/pmu-events.h | 1 +
tools/perf/util/metricgroup.c | 97 ++++++++++++++++------
tools/perf/util/stat-display.c | 6 +-
tools/perf/util/util.c | 18 ++++
tools/perf/util/util.h | 2 +
10 files changed, 116 insertions(+), 38 deletions(-)
--
2.7.4
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/5] perf jevents: Support metric constraint
2020-02-19 19:08 [PATCH 0/5] Support metric group constraint kan.liang
@ 2020-02-19 19:08 ` kan.liang
2020-02-19 19:08 ` [PATCH 2/5] perf metricgroup: Factor out metricgroup__add_metric_weak_group() kan.liang
` (4 subsequent siblings)
5 siblings, 0 replies; 17+ messages in thread
From: kan.liang @ 2020-02-19 19:08 UTC (permalink / raw)
To: acme, jolsa, mingo, peterz, linux-kernel
Cc: mark.rutland, namhyung, ravi.bangoria, yao.jin, ak, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
A new field "MetricConstraint" is introduced in JSON event list.
Extend jevents to parse the field and save the value in
metric_constraint.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
tools/perf/pmu-events/jevents.c | 19 +++++++++++++------
tools/perf/pmu-events/jevents.h | 2 +-
tools/perf/pmu-events/pmu-events.h | 1 +
3 files changed, 15 insertions(+), 7 deletions(-)
diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c
index 079c77b..6d0f61f 100644
--- a/tools/perf/pmu-events/jevents.c
+++ b/tools/perf/pmu-events/jevents.c
@@ -323,7 +323,7 @@ static int print_events_table_entry(void *data, char *name, char *event,
char *pmu, char *unit, char *perpkg,
char *metric_expr,
char *metric_name, char *metric_group,
- char *deprecated)
+ char *deprecated, char *metric_constraint)
{
struct perf_entry_data *pd = data;
FILE *outfp = pd->outfp;
@@ -357,6 +357,8 @@ static int print_events_table_entry(void *data, char *name, char *event,
fprintf(outfp, "\t.metric_group = \"%s\",\n", metric_group);
if (deprecated)
fprintf(outfp, "\t.deprecated = \"%s\",\n", deprecated);
+ if (metric_constraint)
+ fprintf(outfp, "\t.metric_constraint = \"%s\",\n", metric_constraint);
fprintf(outfp, "},\n");
return 0;
@@ -375,6 +377,7 @@ struct event_struct {
char *metric_name;
char *metric_group;
char *deprecated;
+ char *metric_constraint;
};
#define ADD_EVENT_FIELD(field) do { if (field) { \
@@ -422,7 +425,7 @@ static int save_arch_std_events(void *data, char *name, char *event,
char *desc, char *long_desc, char *pmu,
char *unit, char *perpkg, char *metric_expr,
char *metric_name, char *metric_group,
- char *deprecated)
+ char *deprecated, char *metric_constraint)
{
struct event_struct *es;
@@ -486,7 +489,7 @@ try_fixup(const char *fn, char *arch_std, char **event, char **desc,
char **name, char **long_desc, char **pmu, char **filter,
char **perpkg, char **unit, char **metric_expr, char **metric_name,
char **metric_group, unsigned long long eventcode,
- char **deprecated)
+ char **deprecated, char **metric_constraint)
{
/* try to find matching event from arch standard values */
struct event_struct *es;
@@ -515,7 +518,7 @@ int json_events(const char *fn,
char *pmu, char *unit, char *perpkg,
char *metric_expr,
char *metric_name, char *metric_group,
- char *deprecated),
+ char *deprecated, char *metric_constraint),
void *data)
{
int err;
@@ -545,6 +548,7 @@ int json_events(const char *fn,
char *metric_name = NULL;
char *metric_group = NULL;
char *deprecated = NULL;
+ char *metric_constraint = NULL;
char *arch_std = NULL;
unsigned long long eventcode = 0;
struct msrmap *msr = NULL;
@@ -629,6 +633,8 @@ int json_events(const char *fn,
addfield(map, &metric_name, "", "", val);
} else if (json_streq(map, field, "MetricGroup")) {
addfield(map, &metric_group, "", "", val);
+ } else if (json_streq(map, field, "MetricConstraint")) {
+ addfield(map, &metric_constraint, "", "", val);
} else if (json_streq(map, field, "MetricExpr")) {
addfield(map, &metric_expr, "", "", val);
for (s = metric_expr; *s; s++)
@@ -670,13 +676,13 @@ int json_events(const char *fn,
&long_desc, &pmu, &filter, &perpkg,
&unit, &metric_expr, &metric_name,
&metric_group, eventcode,
- &deprecated);
+ &deprecated, &metric_constraint);
if (err)
goto free_strings;
}
err = func(data, name, real_event(name, event), desc, long_desc,
pmu, unit, perpkg, metric_expr, metric_name,
- metric_group, deprecated);
+ metric_group, deprecated, metric_constraint);
free_strings:
free(event);
free(desc);
@@ -691,6 +697,7 @@ int json_events(const char *fn,
free(metric_expr);
free(metric_name);
free(metric_group);
+ free(metric_constraint);
free(arch_std);
if (err)
diff --git a/tools/perf/pmu-events/jevents.h b/tools/perf/pmu-events/jevents.h
index 5cda49a4..2afc830 100644
--- a/tools/perf/pmu-events/jevents.h
+++ b/tools/perf/pmu-events/jevents.h
@@ -8,7 +8,7 @@ int json_events(const char *fn,
char *pmu,
char *unit, char *perpkg, char *metric_expr,
char *metric_name, char *metric_group,
- char *deprecated),
+ char *deprecated, char *metric_constraint),
void *data);
char *get_cpu_str(void);
diff --git a/tools/perf/pmu-events/pmu-events.h b/tools/perf/pmu-events/pmu-events.h
index caeb577..53e76d5 100644
--- a/tools/perf/pmu-events/pmu-events.h
+++ b/tools/perf/pmu-events/pmu-events.h
@@ -18,6 +18,7 @@ struct pmu_event {
const char *metric_name;
const char *metric_group;
const char *deprecated;
+ const char *metric_constraint;
};
/*
--
2.7.4
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 2/5] perf metricgroup: Factor out metricgroup__add_metric_weak_group()
2020-02-19 19:08 [PATCH 0/5] Support metric group constraint kan.liang
2020-02-19 19:08 ` [PATCH 1/5] perf jevents: Support metric constraint kan.liang
@ 2020-02-19 19:08 ` kan.liang
2020-02-19 19:08 ` [PATCH 3/5] perf util: Factor out sysctl__nmi_watchdog_enabled() kan.liang
` (3 subsequent siblings)
5 siblings, 0 replies; 17+ messages in thread
From: kan.liang @ 2020-02-19 19:08 UTC (permalink / raw)
To: acme, jolsa, mingo, peterz, linux-kernel
Cc: mark.rutland, namhyung, ravi.bangoria, yao.jin, ak, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
Factor out metricgroup__add_metric_weak_group() which add metrics into a
weak group. The change can improve code readability. Because following
patch will introduce a function which add standalone metrics.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
tools/perf/util/metricgroup.c | 57 +++++++++++++++++++++++++------------------
1 file changed, 33 insertions(+), 24 deletions(-)
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 02aee94..1cd042c 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -399,13 +399,42 @@ void metricgroup__print(bool metrics, bool metricgroups, char *filter,
strlist__delete(metriclist);
}
+static void metricgroup__add_metric_weak_group(struct strbuf *events,
+ const char **ids,
+ int idnum)
+{
+ bool no_group = false;
+ int i;
+
+ for (i = 0; i < idnum; i++) {
+ pr_debug("found event %s\n", ids[i]);
+ /*
+ * Duration time maps to a software event and can make
+ * groups not count. Always use it outside a
+ * group.
+ */
+ if (!strcmp(ids[i], "duration_time")) {
+ if (i > 0)
+ strbuf_addf(events, "}:W,");
+ strbuf_addf(events, "duration_time");
+ no_group = true;
+ continue;
+ }
+ strbuf_addf(events, "%s%s",
+ i == 0 || no_group ? "{" : ",",
+ ids[i]);
+ no_group = false;
+ }
+ if (!no_group)
+ strbuf_addf(events, "}:W");
+}
+
static int metricgroup__add_metric(const char *metric, struct strbuf *events,
struct list_head *group_list)
{
struct pmu_events_map *map = perf_pmu__find_map(NULL);
struct pmu_event *pe;
- int ret = -EINVAL;
- int i, j;
+ int i, ret = -EINVAL;
if (!map)
return 0;
@@ -422,7 +451,6 @@ static int metricgroup__add_metric(const char *metric, struct strbuf *events,
const char **ids;
int idnum;
struct egroup *eg;
- bool no_group = false;
pr_debug("metric expr %s for %s\n", pe->metric_expr, pe->metric_name);
@@ -431,27 +459,8 @@ static int metricgroup__add_metric(const char *metric, struct strbuf *events,
continue;
if (events->len > 0)
strbuf_addf(events, ",");
- for (j = 0; j < idnum; j++) {
- pr_debug("found event %s\n", ids[j]);
- /*
- * Duration time maps to a software event and can make
- * groups not count. Always use it outside a
- * group.
- */
- if (!strcmp(ids[j], "duration_time")) {
- if (j > 0)
- strbuf_addf(events, "}:W,");
- strbuf_addf(events, "duration_time");
- no_group = true;
- continue;
- }
- strbuf_addf(events, "%s%s",
- j == 0 || no_group ? "{" : ",",
- ids[j]);
- no_group = false;
- }
- if (!no_group)
- strbuf_addf(events, "}:W");
+
+ metricgroup__add_metric_weak_group(events, ids, idnum);
eg = malloc(sizeof(struct egroup));
if (!eg) {
--
2.7.4
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 3/5] perf util: Factor out sysctl__nmi_watchdog_enabled()
2020-02-19 19:08 [PATCH 0/5] Support metric group constraint kan.liang
2020-02-19 19:08 ` [PATCH 1/5] perf jevents: Support metric constraint kan.liang
2020-02-19 19:08 ` [PATCH 2/5] perf metricgroup: Factor out metricgroup__add_metric_weak_group() kan.liang
@ 2020-02-19 19:08 ` kan.liang
2020-02-19 19:08 ` [PATCH 4/5] perf metricgroup: Support metric constraint kan.liang
` (2 subsequent siblings)
5 siblings, 0 replies; 17+ messages in thread
From: kan.liang @ 2020-02-19 19:08 UTC (permalink / raw)
To: acme, jolsa, mingo, peterz, linux-kernel
Cc: mark.rutland, namhyung, ravi.bangoria, yao.jin, ak, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
NMI watchdog status is required for metric group constraint examination.
Factor out sysctl__nmi_watchdog_enabled() to retrieve the NMI watchdog
status.
Users may count more than one metric groups each time. If so, the NMI
watchdog status may be retrieved several times. To reduce the overhead,
cache the NMI watchdog status.
Replace the NMI watchdog status checking in print_footer() by
sysctl__nmi_watchdog_enabled().
Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
tools/perf/util/stat-display.c | 6 ++----
tools/perf/util/util.c | 18 ++++++++++++++++++
tools/perf/util/util.h | 2 ++
3 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index bc31fcc..16efdba 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -16,6 +16,7 @@
#include <linux/ctype.h>
#include "cgroup.h"
#include <api/fs/fs.h>
+#include "util.h"
#define CNTR_NOT_SUPPORTED "<not supported>"
#define CNTR_NOT_COUNTED "<not counted>"
@@ -1097,7 +1098,6 @@ static void print_footer(struct perf_stat_config *config)
{
double avg = avg_stats(config->walltime_nsecs_stats) / NSEC_PER_SEC;
FILE *output = config->output;
- int n;
if (!config->null_run)
fprintf(output, "\n");
@@ -1131,9 +1131,7 @@ static void print_footer(struct perf_stat_config *config)
}
fprintf(output, "\n\n");
- if (config->print_free_counters_hint &&
- sysctl__read_int("kernel/nmi_watchdog", &n) >= 0 &&
- n > 0)
+ if (config->print_free_counters_hint && sysctl__nmi_watchdog_enabled())
fprintf(output,
"Some events weren't counted. Try disabling the NMI watchdog:\n"
" echo 0 > /proc/sys/kernel/nmi_watchdog\n"
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 969ae56..d707c96 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -55,6 +55,24 @@ int sysctl__max_stack(void)
return sysctl_perf_event_max_stack;
}
+bool sysctl__nmi_watchdog_enabled(void)
+{
+ static bool cached;
+ static bool nmi_watchdog;
+ int value;
+
+ if (cached)
+ return nmi_watchdog;
+
+ if (sysctl__read_int("kernel/nmi_watchdog", &value) < 0)
+ return false;
+
+ nmi_watchdog = (value > 0) ? true : false;
+ cached = true;
+
+ return nmi_watchdog;
+}
+
bool test_attr__enabled;
bool perf_host = true;
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 9969b8b..f486fdd 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -29,6 +29,8 @@ size_t hex_width(u64 v);
int sysctl__max_stack(void);
+bool sysctl__nmi_watchdog_enabled(void);
+
int fetch_kernel_version(unsigned int *puint,
char *str, size_t str_sz);
#define KVER_VERSION(x) (((x) >> 16) & 0xff)
--
2.7.4
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 4/5] perf metricgroup: Support metric constraint
2020-02-19 19:08 [PATCH 0/5] Support metric group constraint kan.liang
` (2 preceding siblings ...)
2020-02-19 19:08 ` [PATCH 3/5] perf util: Factor out sysctl__nmi_watchdog_enabled() kan.liang
@ 2020-02-19 19:08 ` kan.liang
2020-02-20 11:35 ` Jiri Olsa
2020-02-19 19:08 ` [PATCH 5/5] perf vendor events: Add NO_NMI_WATCHDOG " kan.liang
2020-02-20 11:39 ` [PATCH 0/5] Support metric group constraint Jiri Olsa
5 siblings, 1 reply; 17+ messages in thread
From: kan.liang @ 2020-02-19 19:08 UTC (permalink / raw)
To: acme, jolsa, mingo, peterz, linux-kernel
Cc: mark.rutland, namhyung, ravi.bangoria, yao.jin, ak, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
Some metric groups have metric constraints. A metric group can be
scheduled as a group only when some constraints are applied.
For example, Page_Walks_Utilization has a metric constraint,
"NO_NMI_WATCHDOG".
When NMI watchdog is disabled, the metric group can be scheduled as a
group. Otherwise, splitting the metric group into standalone metrics.
Add a new function, metricgroup__has_constraint(), to check whether all
constraints are applied. If not, splitting the metric group into
standalone metrics.
Currently, only one constraint, "NO_NMI_WATCHDOG", is checked. Print a
warning for the metric group with the constraint, when NMI WATCHDOG is
enabled.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
tools/perf/util/metricgroup.c | 42 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 41 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 1cd042c..f9a9b50 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -22,6 +22,8 @@
#include <linux/string.h>
#include <linux/zalloc.h>
#include <subcmd/parse-options.h>
+#include <api/fs/fs.h>
+#include "util.h"
struct metric_event *metricgroup__lookup(struct rblist *metric_events,
struct evsel *evsel,
@@ -429,6 +431,34 @@ static void metricgroup__add_metric_weak_group(struct strbuf *events,
strbuf_addf(events, "}:W");
}
+static void metricgroup__add_metric_non_group(struct strbuf *events,
+ const char **ids,
+ int idnum)
+{
+ int i;
+
+ for (i = 0; i < idnum; i++)
+ strbuf_addf(events, ",%s", ids[i]);
+}
+
+static bool violate_nmi_constraint;
+
+static bool metricgroup__has_constraint(struct pmu_event *pe)
+{
+ if (!pe->metric_constraint)
+ return false;
+
+ if (!strcmp(pe->metric_constraint, "NO_NMI_WATCHDOG") &&
+ sysctl__nmi_watchdog_enabled()) {
+ pr_warning("Splitting metric group %s into standalone metrics.\n",
+ pe->metric_name);
+ violate_nmi_constraint = true;
+ return true;
+ }
+
+ return false;
+}
+
static int metricgroup__add_metric(const char *metric, struct strbuf *events,
struct list_head *group_list)
{
@@ -460,7 +490,10 @@ static int metricgroup__add_metric(const char *metric, struct strbuf *events,
if (events->len > 0)
strbuf_addf(events, ",");
- metricgroup__add_metric_weak_group(events, ids, idnum);
+ if (metricgroup__has_constraint(pe))
+ metricgroup__add_metric_non_group(events, ids, idnum);
+ else
+ metricgroup__add_metric_weak_group(events, ids, idnum);
eg = malloc(sizeof(struct egroup));
if (!eg) {
@@ -544,6 +577,13 @@ int metricgroup__parse_groups(const struct option *opt,
strbuf_release(&extra_events);
ret = metricgroup__setup_events(&group_list, perf_evlist,
metric_events);
+
+ if (violate_nmi_constraint) {
+ pr_warning("Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:\n"
+ " echo 0 > /proc/sys/kernel/nmi_watchdog\n"
+ " perf stat ...\n"
+ " echo 1 > /proc/sys/kernel/nmi_watchdog\n");
+ }
out:
metricgroup__free_egroups(&group_list);
return ret;
--
2.7.4
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 5/5] perf vendor events: Add NO_NMI_WATCHDOG metric constraint
2020-02-19 19:08 [PATCH 0/5] Support metric group constraint kan.liang
` (3 preceding siblings ...)
2020-02-19 19:08 ` [PATCH 4/5] perf metricgroup: Support metric constraint kan.liang
@ 2020-02-19 19:08 ` kan.liang
2020-02-20 11:39 ` [PATCH 0/5] Support metric group constraint Jiri Olsa
5 siblings, 0 replies; 17+ messages in thread
From: kan.liang @ 2020-02-19 19:08 UTC (permalink / raw)
To: acme, jolsa, mingo, peterz, linux-kernel
Cc: mark.rutland, namhyung, ravi.bangoria, yao.jin, ak, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
Add NO_NMI_WATCHDOG metric constraint to Page_Walks_Utilization for
Sky Lake and Cascade Lake.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json | 3 ++-
tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json | 3 ++-
tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json | 3 ++-
3 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json b/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
index f946532..a728c6e 100644
--- a/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
@@ -215,7 +215,8 @@
"BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses",
"MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + EPT.WALK_PENDING ) / ( 2 * cycles )",
"MetricGroup": "TLB",
- "MetricName": "Page_Walks_Utilization"
+ "MetricName": "Page_Walks_Utilization",
+ "MetricConstraint": "NO_NMI_WATCHDOG"
},
{
"BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses",
diff --git a/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json b/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json
index e7feb60..f97e831 100644
--- a/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json
@@ -215,7 +215,8 @@
"BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses",
"MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + EPT.WALK_PENDING ) / ( 2 * cycles )",
"MetricGroup": "TLB",
- "MetricName": "Page_Walks_Utilization"
+ "MetricName": "Page_Walks_Utilization",
+ "MetricConstraint": "NO_NMI_WATCHDOG"
},
{
"BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses",
diff --git a/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json b/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json
index 21d7a0c..35f5db1 100644
--- a/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json
@@ -215,7 +215,8 @@
"BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses",
"MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + EPT.WALK_PENDING ) / ( 2 * cycles )",
"MetricGroup": "TLB",
- "MetricName": "Page_Walks_Utilization"
+ "MetricName": "Page_Walks_Utilization",
+ "MetricConstraint": "NO_NMI_WATCHDOG"
},
{
"BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses",
--
2.7.4
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 4/5] perf metricgroup: Support metric constraint
2020-02-19 19:08 ` [PATCH 4/5] perf metricgroup: Support metric constraint kan.liang
@ 2020-02-20 11:35 ` Jiri Olsa
2020-02-20 16:14 ` Liang, Kan
0 siblings, 1 reply; 17+ messages in thread
From: Jiri Olsa @ 2020-02-20 11:35 UTC (permalink / raw)
To: kan.liang
Cc: acme, mingo, peterz, linux-kernel, mark.rutland, namhyung,
ravi.bangoria, yao.jin, ak
On Wed, Feb 19, 2020 at 11:08:39AM -0800, kan.liang@linux.intel.com wrote:
SNIP
> +static bool violate_nmi_constraint;
> +
> +static bool metricgroup__has_constraint(struct pmu_event *pe)
> +{
> + if (!pe->metric_constraint)
> + return false;
> +
> + if (!strcmp(pe->metric_constraint, "NO_NMI_WATCHDOG") &&
> + sysctl__nmi_watchdog_enabled()) {
> + pr_warning("Splitting metric group %s into standalone metrics.\n",
> + pe->metric_name);
> + violate_nmi_constraint = true;
no static flags plz.. can't you just print that rest of the warning in here?
jirka
> + return true;
> + }
> +
> + return false;
> +}
> +
> static int metricgroup__add_metric(const char *metric, struct strbuf *events,
> struct list_head *group_list)
> {
> @@ -460,7 +490,10 @@ static int metricgroup__add_metric(const char *metric, struct strbuf *events,
> if (events->len > 0)
> strbuf_addf(events, ",");
>
> - metricgroup__add_metric_weak_group(events, ids, idnum);
> + if (metricgroup__has_constraint(pe))
> + metricgroup__add_metric_non_group(events, ids, idnum);
> + else
> + metricgroup__add_metric_weak_group(events, ids, idnum);
>
> eg = malloc(sizeof(struct egroup));
> if (!eg) {
> @@ -544,6 +577,13 @@ int metricgroup__parse_groups(const struct option *opt,
> strbuf_release(&extra_events);
> ret = metricgroup__setup_events(&group_list, perf_evlist,
> metric_events);
> +
> + if (violate_nmi_constraint) {
> + pr_warning("Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:\n"
> + " echo 0 > /proc/sys/kernel/nmi_watchdog\n"
> + " perf stat ...\n"
> + " echo 1 > /proc/sys/kernel/nmi_watchdog\n");
> + }
> out:
> metricgroup__free_egroups(&group_list);
> return ret;
> --
> 2.7.4
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/5] Support metric group constraint
2020-02-19 19:08 [PATCH 0/5] Support metric group constraint kan.liang
` (4 preceding siblings ...)
2020-02-19 19:08 ` [PATCH 5/5] perf vendor events: Add NO_NMI_WATCHDOG " kan.liang
@ 2020-02-20 11:39 ` Jiri Olsa
2020-02-20 16:03 ` Liang, Kan
5 siblings, 1 reply; 17+ messages in thread
From: Jiri Olsa @ 2020-02-20 11:39 UTC (permalink / raw)
To: kan.liang
Cc: acme, mingo, peterz, linux-kernel, mark.rutland, namhyung,
ravi.bangoria, yao.jin, ak
On Wed, Feb 19, 2020 at 11:08:35AM -0800, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
>
> Some metric groups, e.g. Page_Walks_Utilization, will never count when
> NMI watchdog is enabled.
>
> $echo 1 > /proc/sys/kernel/nmi_watchdog
> $perf stat -M Page_Walks_Utilization
>
> Performance counter stats for 'system wide':
>
> <not counted> itlb_misses.walk_pending (0.00%)
> <not counted> dtlb_load_misses.walk_pending (0.00%)
> <not counted> dtlb_store_misses.walk_pending (0.00%)
> <not counted> ept.walk_pending (0.00%)
> <not counted> cycles (0.00%)
>
> 2.343460588 seconds time elapsed
>
> Some events weren't counted. Try disabling the NMI watchdog:
> echo 0 > /proc/sys/kernel/nmi_watchdog
> perf stat ...
> echo 1 > /proc/sys/kernel/nmi_watchdog
> The events in group usually have to be from the same PMU. Try
> reorganizing the group.
>
> A metric group is a weak group, which relies on group validation
> code in the kernel to determine whether to be opened as a group or
> a non-group. However, group validation code may return false-positives,
> especially when NMI watchdog is enabled. (The metric group is allowed
> as a group but will never be scheduled.)
>
> The attempt to fix the group validation code has been rejected.
> https://lore.kernel.org/lkml/20200117091341.GX2827@hirez.programming.kicks-ass.net/
> Because we cannot accurately predict whether the group can be scheduled
> as a group, only by checking current status.
>
> This patch set provides another solution to mitigate the issue.
> Add "MetricConstraint" in event list, which provides a hint for perf tool,
> e.g. "MetricConstraint": "NO_NMI_WATCHDOG". Perf tool can change the
> metric group to non-group (standalone metrics) if NMI watchdog is enabled.
the problem is in the missing counter, that's taken by NMI watchdog, right?
and it's problem for any metric that won't fit to the available
counters.. shouldn't we rather do this workaround for any metric
that wouldn't fit in available counters? not just for chosen
ones..?
thanks,
jirka
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/5] Support metric group constraint
2020-02-20 11:39 ` [PATCH 0/5] Support metric group constraint Jiri Olsa
@ 2020-02-20 16:03 ` Liang, Kan
2020-02-20 16:43 ` Andi Kleen
2020-02-21 13:18 ` Jiri Olsa
0 siblings, 2 replies; 17+ messages in thread
From: Liang, Kan @ 2020-02-20 16:03 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, mingo, peterz, linux-kernel, mark.rutland, namhyung,
ravi.bangoria, yao.jin, ak
On 2/20/2020 6:39 AM, Jiri Olsa wrote:
> On Wed, Feb 19, 2020 at 11:08:35AM -0800, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> Some metric groups, e.g. Page_Walks_Utilization, will never count when
>> NMI watchdog is enabled.
>>
>> $echo 1 > /proc/sys/kernel/nmi_watchdog
>> $perf stat -M Page_Walks_Utilization
>>
>> Performance counter stats for 'system wide':
>>
>> <not counted> itlb_misses.walk_pending (0.00%)
>> <not counted> dtlb_load_misses.walk_pending (0.00%)
>> <not counted> dtlb_store_misses.walk_pending (0.00%)
>> <not counted> ept.walk_pending (0.00%)
>> <not counted> cycles (0.00%)
>>
>> 2.343460588 seconds time elapsed
>>
>> Some events weren't counted. Try disabling the NMI watchdog:
>> echo 0 > /proc/sys/kernel/nmi_watchdog
>> perf stat ...
>> echo 1 > /proc/sys/kernel/nmi_watchdog
>> The events in group usually have to be from the same PMU. Try
>> reorganizing the group.
>>
>> A metric group is a weak group, which relies on group validation
>> code in the kernel to determine whether to be opened as a group or
>> a non-group. However, group validation code may return false-positives,
>> especially when NMI watchdog is enabled. (The metric group is allowed
>> as a group but will never be scheduled.)
>>
>> The attempt to fix the group validation code has been rejected.
>> https://lore.kernel.org/lkml/20200117091341.GX2827@hirez.programming.kicks-ass.net/
>> Because we cannot accurately predict whether the group can be scheduled
>> as a group, only by checking current status.
>>
>> This patch set provides another solution to mitigate the issue.
>> Add "MetricConstraint" in event list, which provides a hint for perf tool,
>> e.g. "MetricConstraint": "NO_NMI_WATCHDOG". Perf tool can change the
>> metric group to non-group (standalone metrics) if NMI watchdog is enabled.
>
> the problem is in the missing counter, that's taken by NMI watchdog, right?
>
> and it's problem for any metric that won't fit to the available
> counters.. shouldn't we rather do this workaround for any metric
> that wouldn't fit in available counters?
I think current perf already did this.
All metric groups are weak group. Kernel (validate_group()) tells perf
tool whether a metric group fit to available counters.
If yes, the metric group will be scheduled as a group.
If no, perf tool will not using a group and re-try. The code is as below.
try_again:
if (create_perf_stat_counter(counter, &stat_config, &target) < 0) {
/* Weak group failed. Reset the group. */
if ((errno == EINVAL || errno == EBADF) &&
counter->leader != counter &&
counter->weak_group) {
counter = perf_evlist__reset_weak_group(evsel_list, counter);
goto try_again;
}
This patch set is to workaroud the false-positives from the kernel.
Kernel only validate the group itself. It assumes that all counters are
available. So, when any counter is permanently occupied by others, e.g.
watchdog, the validate_group() may return false-positives.
? not just for chosen
> ones..?
For now, I think we only need to workaround the Page_Walks_Utilization
metric group. Because it has 5 events, and one of the events is cycles.
The cycles has to be scheduled on fixed counter 2. But it's already
occupied by watchdog.
The false-positives of validate_group() will trigger the issue (metric
group never be scheduled.).
For other metric groups, even they have cycles, the issue should not be
triggered.
For example, if they have 4 or less events, the cycles can be scheduled
to GP counter instead.
If they have 6 or more events, the weak group will be reject anyway.
Perf tool will open it as non-group (standalone metrics).
I think we only need to apply the constraint for the
Page_Walks_Utilization metric group for now.
Thanks,
Kan
>
> thanks,
> jirka
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 4/5] perf metricgroup: Support metric constraint
2020-02-20 11:35 ` Jiri Olsa
@ 2020-02-20 16:14 ` Liang, Kan
2020-02-21 13:09 ` Jiri Olsa
0 siblings, 1 reply; 17+ messages in thread
From: Liang, Kan @ 2020-02-20 16:14 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, mingo, peterz, linux-kernel, mark.rutland, namhyung,
ravi.bangoria, yao.jin, ak
On 2/20/2020 6:35 AM, Jiri Olsa wrote:
> On Wed, Feb 19, 2020 at 11:08:39AM -0800, kan.liang@linux.intel.com wrote:
>
> SNIP
>
>> +static bool violate_nmi_constraint;
>> +
>> +static bool metricgroup__has_constraint(struct pmu_event *pe)
>> +{
>> + if (!pe->metric_constraint)
>> + return false;
>> +
>> + if (!strcmp(pe->metric_constraint, "NO_NMI_WATCHDOG") &&
>> + sysctl__nmi_watchdog_enabled()) {
>> + pr_warning("Splitting metric group %s into standalone metrics.\n",
>> + pe->metric_name);
>> + violate_nmi_constraint = true;
>
> no static flags plz.. can't you just print that rest of the warning in here?
>
Because we only want to print the NMI watchdog warning once.
If there are more than one metric groups with constraint, the warning
may be printed several times. For example,
$ perf stat -M Page_Walks_Utilization,Page_Walks_Utilization
Splitting metric group Page_Walks_Utilization into standalone metrics.
Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric
constraint:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
Splitting metric group Page_Walks_Utilization into standalone metrics.
Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric
constraint:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
Is it OK?
If it's OK, I think we can remove the flag.
Thanks,
Kan
> jirka
>
>> + return true;
>> + }
>> +
>> + return false;
>> +}
>> +
>> static int metricgroup__add_metric(const char *metric, struct strbuf *events,
>> struct list_head *group_list)
>> {
>> @@ -460,7 +490,10 @@ static int metricgroup__add_metric(const char *metric, struct strbuf *events,
>> if (events->len > 0)
>> strbuf_addf(events, ",");
>>
>> - metricgroup__add_metric_weak_group(events, ids, idnum);
>> + if (metricgroup__has_constraint(pe))
>> + metricgroup__add_metric_non_group(events, ids, idnum);
>> + else
>> + metricgroup__add_metric_weak_group(events, ids, idnum);
>>
>> eg = malloc(sizeof(struct egroup));
>> if (!eg) {
>> @@ -544,6 +577,13 @@ int metricgroup__parse_groups(const struct option *opt,
>> strbuf_release(&extra_events);
>> ret = metricgroup__setup_events(&group_list, perf_evlist,
>> metric_events);
>> +
>> + if (violate_nmi_constraint) {
>> + pr_warning("Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:\n"
>> + " echo 0 > /proc/sys/kernel/nmi_watchdog\n"
>> + " perf stat ...\n"
>> + " echo 1 > /proc/sys/kernel/nmi_watchdog\n");
>> + }
>> out:
>> metricgroup__free_egroups(&group_list);
>> return ret;
>> --
>> 2.7.4
>>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/5] Support metric group constraint
2020-02-20 16:03 ` Liang, Kan
@ 2020-02-20 16:43 ` Andi Kleen
2020-02-20 19:25 ` Liang, Kan
2020-02-21 13:18 ` Jiri Olsa
1 sibling, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2020-02-20 16:43 UTC (permalink / raw)
To: Liang, Kan
Cc: Jiri Olsa, acme, mingo, peterz, linux-kernel, mark.rutland,
namhyung, ravi.bangoria, yao.jin
> For other metric groups, even they have cycles, the issue should not be
> triggered.
> For example, if they have 4 or less events, the cycles can be scheduled to
> GP counter instead.
> If they have 6 or more events, the weak group will be reject anyway.
> Perf tool will open it as non-group (standalone metrics).
Technically it can also happen for 9 events with Hyper Threading off or
on Icelake (8 generic counters)
I didn't think we had any of those, but please double check.
-Andi
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/5] Support metric group constraint
2020-02-20 16:43 ` Andi Kleen
@ 2020-02-20 19:25 ` Liang, Kan
0 siblings, 0 replies; 17+ messages in thread
From: Liang, Kan @ 2020-02-20 19:25 UTC (permalink / raw)
To: Andi Kleen
Cc: Jiri Olsa, acme, mingo, peterz, linux-kernel, mark.rutland,
namhyung, ravi.bangoria, yao.jin
On 2/20/2020 11:43 AM, Andi Kleen wrote:
>> For other metric groups, even they have cycles, the issue should not be
>> triggered.
>> For example, if they have 4 or less events, the cycles can be scheduled to
>> GP counter instead.
>> If they have 6 or more events, the weak group will be reject anyway.
>> Perf tool will open it as non-group (standalone metrics).
>
> Technically it can also happen for 9 events with Hyper Threading off or
> on Icelake (8 generic counters)
>
> I didn't think we had any of those, but please double check.
>
I checked all public metrics groups. Right, we don't have such metrics
group with 8 GP events + 1 cycles.
We only need to add watchdog constraint for Page_Walks_Utilization for now.
Thanks,
Kan
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 4/5] perf metricgroup: Support metric constraint
2020-02-20 16:14 ` Liang, Kan
@ 2020-02-21 13:09 ` Jiri Olsa
2020-02-21 14:30 ` Liang, Kan
0 siblings, 1 reply; 17+ messages in thread
From: Jiri Olsa @ 2020-02-21 13:09 UTC (permalink / raw)
To: Liang, Kan
Cc: acme, mingo, peterz, linux-kernel, mark.rutland, namhyung,
ravi.bangoria, yao.jin, ak
On Thu, Feb 20, 2020 at 11:14:09AM -0500, Liang, Kan wrote:
>
>
> On 2/20/2020 6:35 AM, Jiri Olsa wrote:
> > On Wed, Feb 19, 2020 at 11:08:39AM -0800, kan.liang@linux.intel.com wrote:
> >
> > SNIP
> >
> > > +static bool violate_nmi_constraint;
> > > +
> > > +static bool metricgroup__has_constraint(struct pmu_event *pe)
> > > +{
> > > + if (!pe->metric_constraint)
> > > + return false;
> > > +
> > > + if (!strcmp(pe->metric_constraint, "NO_NMI_WATCHDOG") &&
> > > + sysctl__nmi_watchdog_enabled()) {
> > > + pr_warning("Splitting metric group %s into standalone metrics.\n",
> > > + pe->metric_name);
> > > + violate_nmi_constraint = true;
> >
> > no static flags plz.. can't you just print that rest of the warning in here?
> >
>
> Because we only want to print the NMI watchdog warning once.
> If there are more than one metric groups with constraint, the warning may be
> printed several times. For example,
> $ perf stat -M Page_Walks_Utilization,Page_Walks_Utilization
> Splitting metric group Page_Walks_Utilization into standalone metrics.
> Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric
> constraint:
> echo 0 > /proc/sys/kernel/nmi_watchdog
> perf stat ...
> echo 1 > /proc/sys/kernel/nmi_watchdog
> Splitting metric group Page_Walks_Utilization into standalone metrics.
> Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric
> constraint:
> echo 0 > /proc/sys/kernel/nmi_watchdog
> perf stat ...
> echo 1 > /proc/sys/kernel/nmi_watchdog
> Is it OK?
>
> If it's OK, I think we can remove the flag.
we use the 'print once' static flags in functions,
so plz keep it inside like WARN_ONCE, or use it directly
jirka
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/5] Support metric group constraint
2020-02-20 16:03 ` Liang, Kan
2020-02-20 16:43 ` Andi Kleen
@ 2020-02-21 13:18 ` Jiri Olsa
1 sibling, 0 replies; 17+ messages in thread
From: Jiri Olsa @ 2020-02-21 13:18 UTC (permalink / raw)
To: Liang, Kan
Cc: acme, mingo, peterz, linux-kernel, mark.rutland, namhyung,
ravi.bangoria, yao.jin, ak
On Thu, Feb 20, 2020 at 11:03:35AM -0500, Liang, Kan wrote:
>
>
> On 2/20/2020 6:39 AM, Jiri Olsa wrote:
> > On Wed, Feb 19, 2020 at 11:08:35AM -0800, kan.liang@linux.intel.com wrote:
> > > From: Kan Liang <kan.liang@linux.intel.com>
> > >
> > > Some metric groups, e.g. Page_Walks_Utilization, will never count when
> > > NMI watchdog is enabled.
> > >
> > > $echo 1 > /proc/sys/kernel/nmi_watchdog
> > > $perf stat -M Page_Walks_Utilization
> > >
> > > Performance counter stats for 'system wide':
> > >
> > > <not counted> itlb_misses.walk_pending (0.00%)
> > > <not counted> dtlb_load_misses.walk_pending (0.00%)
> > > <not counted> dtlb_store_misses.walk_pending (0.00%)
> > > <not counted> ept.walk_pending (0.00%)
> > > <not counted> cycles (0.00%)
> > >
> > > 2.343460588 seconds time elapsed
> > >
> > > Some events weren't counted. Try disabling the NMI watchdog:
> > > echo 0 > /proc/sys/kernel/nmi_watchdog
> > > perf stat ...
> > > echo 1 > /proc/sys/kernel/nmi_watchdog
> > > The events in group usually have to be from the same PMU. Try
> > > reorganizing the group.
> > >
> > > A metric group is a weak group, which relies on group validation
> > > code in the kernel to determine whether to be opened as a group or
> > > a non-group. However, group validation code may return false-positives,
> > > especially when NMI watchdog is enabled. (The metric group is allowed
> > > as a group but will never be scheduled.)
> > >
> > > The attempt to fix the group validation code has been rejected.
> > > https://lore.kernel.org/lkml/20200117091341.GX2827@hirez.programming.kicks-ass.net/
> > > Because we cannot accurately predict whether the group can be scheduled
> > > as a group, only by checking current status.
> > >
> > > This patch set provides another solution to mitigate the issue.
> > > Add "MetricConstraint" in event list, which provides a hint for perf tool,
> > > e.g. "MetricConstraint": "NO_NMI_WATCHDOG". Perf tool can change the
> > > metric group to non-group (standalone metrics) if NMI watchdog is enabled.
> >
> > the problem is in the missing counter, that's taken by NMI watchdog, right?
> >
> > and it's problem for any metric that won't fit to the available
> > counters.. shouldn't we rather do this workaround for any metric
> > that wouldn't fit in available counters?
>
> I think current perf already did this.
> All metric groups are weak group. Kernel (validate_group()) tells perf tool
> whether a metric group fit to available counters.
> If yes, the metric group will be scheduled as a group.
> If no, perf tool will not using a group and re-try. The code is as below.
>
> try_again:
> if (create_perf_stat_counter(counter, &stat_config, &target) < 0) {
>
> /* Weak group failed. Reset the group. */
> if ((errno == EINVAL || errno == EBADF) &&
> counter->leader != counter &&
> counter->weak_group) {
> counter = perf_evlist__reset_weak_group(evsel_list, counter);
> goto try_again;
> }
>
>
> This patch set is to workaroud the false-positives from the kernel.
ah so validate_group will say the group can go on, but then
the pmu add will fail because the countters are occupied
thanks for explanation, looks good
jirka
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 4/5] perf metricgroup: Support metric constraint
2020-02-21 13:09 ` Jiri Olsa
@ 2020-02-21 14:30 ` Liang, Kan
2020-02-21 14:48 ` Jiri Olsa
0 siblings, 1 reply; 17+ messages in thread
From: Liang, Kan @ 2020-02-21 14:30 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, mingo, peterz, linux-kernel, mark.rutland, namhyung,
ravi.bangoria, yao.jin, ak
On 2/21/2020 8:09 AM, Jiri Olsa wrote:
> On Thu, Feb 20, 2020 at 11:14:09AM -0500, Liang, Kan wrote:
>>
>>
>> On 2/20/2020 6:35 AM, Jiri Olsa wrote:
>>> On Wed, Feb 19, 2020 at 11:08:39AM -0800, kan.liang@linux.intel.com wrote:
>>>
>>> SNIP
>>>
>>>> +static bool violate_nmi_constraint;
>>>> +
>>>> +static bool metricgroup__has_constraint(struct pmu_event *pe)
>>>> +{
>>>> + if (!pe->metric_constraint)
>>>> + return false;
>>>> +
>>>> + if (!strcmp(pe->metric_constraint, "NO_NMI_WATCHDOG") &&
>>>> + sysctl__nmi_watchdog_enabled()) {
>>>> + pr_warning("Splitting metric group %s into standalone metrics.\n",
>>>> + pe->metric_name);
>>>> + violate_nmi_constraint = true;
>>>
>>> no static flags plz.. can't you just print that rest of the warning in here?
>>>
>>
>> Because we only want to print the NMI watchdog warning once.
>> If there are more than one metric groups with constraint, the warning may be
>> printed several times. For example,
>> $ perf stat -M Page_Walks_Utilization,Page_Walks_Utilization
>> Splitting metric group Page_Walks_Utilization into standalone metrics.
>> Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric
>> constraint:
>> echo 0 > /proc/sys/kernel/nmi_watchdog
>> perf stat ...
>> echo 1 > /proc/sys/kernel/nmi_watchdog
>> Splitting metric group Page_Walks_Utilization into standalone metrics.
>> Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric
>> constraint:
>> echo 0 > /proc/sys/kernel/nmi_watchdog
>> perf stat ...
>> echo 1 > /proc/sys/kernel/nmi_watchdog
>> Is it OK?
>>
>> If it's OK, I think we can remove the flag.
>
> we use the 'print once' static flags in functions,
> so plz keep it inside like WARN_ONCE, or use it directly
>
If using WARN_ONCE, the warning is always printed for the first
violation. For example,
#perf stat -M Page_Walks_Utilization,Page_Walks_Utilization
Splitting metric group Page_Walks_Utilization into standalone metrics.
Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric
constraint:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
Splitting metric group Page_Walks_Utilization into standalone metrics.
The output of current patch is as below.
#perf stat -M Page_Walks_Utilization,Page_Walks_Utilization
Splitting metric group Page_Walks_Utilization into standalone metrics.
Splitting metric group Page_Walks_Utilization into standalone metrics.
Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric
constraint:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
Personally, I think the output of current patch looks better.
But there is nothing wrong with the output of WARN_ONCE.
Should I use WARN_ONCE in next V2?
Thanks,
Kan
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 4/5] perf metricgroup: Support metric constraint
2020-02-21 14:30 ` Liang, Kan
@ 2020-02-21 14:48 ` Jiri Olsa
2020-02-21 15:42 ` Liang, Kan
0 siblings, 1 reply; 17+ messages in thread
From: Jiri Olsa @ 2020-02-21 14:48 UTC (permalink / raw)
To: Liang, Kan
Cc: acme, mingo, peterz, linux-kernel, mark.rutland, namhyung,
ravi.bangoria, yao.jin, ak
On Fri, Feb 21, 2020 at 09:30:15AM -0500, Liang, Kan wrote:
>
>
> On 2/21/2020 8:09 AM, Jiri Olsa wrote:
> > On Thu, Feb 20, 2020 at 11:14:09AM -0500, Liang, Kan wrote:
> > >
> > >
> > > On 2/20/2020 6:35 AM, Jiri Olsa wrote:
> > > > On Wed, Feb 19, 2020 at 11:08:39AM -0800, kan.liang@linux.intel.com wrote:
> > > >
> > > > SNIP
> > > >
> > > > > +static bool violate_nmi_constraint;
> > > > > +
> > > > > +static bool metricgroup__has_constraint(struct pmu_event *pe)
> > > > > +{
> > > > > + if (!pe->metric_constraint)
> > > > > + return false;
> > > > > +
> > > > > + if (!strcmp(pe->metric_constraint, "NO_NMI_WATCHDOG") &&
> > > > > + sysctl__nmi_watchdog_enabled()) {
> > > > > + pr_warning("Splitting metric group %s into standalone metrics.\n",
> > > > > + pe->metric_name);
> > > > > + violate_nmi_constraint = true;
> > > >
> > > > no static flags plz.. can't you just print that rest of the warning in here?
> > > >
> > >
> > > Because we only want to print the NMI watchdog warning once.
> > > If there are more than one metric groups with constraint, the warning may be
> > > printed several times. For example,
> > > $ perf stat -M Page_Walks_Utilization,Page_Walks_Utilization
> > > Splitting metric group Page_Walks_Utilization into standalone metrics.
> > > Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric
> > > constraint:
> > > echo 0 > /proc/sys/kernel/nmi_watchdog
> > > perf stat ...
> > > echo 1 > /proc/sys/kernel/nmi_watchdog
> > > Splitting metric group Page_Walks_Utilization into standalone metrics.
> > > Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric
> > > constraint:
> > > echo 0 > /proc/sys/kernel/nmi_watchdog
> > > perf stat ...
> > > echo 1 > /proc/sys/kernel/nmi_watchdog
> > > Is it OK?
> > >
> > > If it's OK, I think we can remove the flag.
> >
> > we use the 'print once' static flags in functions,
> > so plz keep it inside like WARN_ONCE, or use it directly
> >
>
> If using WARN_ONCE, the warning is always printed for the first violation.
> For example,
>
> #perf stat -M Page_Walks_Utilization,Page_Walks_Utilization
> Splitting metric group Page_Walks_Utilization into standalone metrics.
> Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
> echo 0 > /proc/sys/kernel/nmi_watchdog
> perf stat ...
> echo 1 > /proc/sys/kernel/nmi_watchdog
> Splitting metric group Page_Walks_Utilization into standalone metrics.
>
>
> The output of current patch is as below.
> #perf stat -M Page_Walks_Utilization,Page_Walks_Utilization
> Splitting metric group Page_Walks_Utilization into standalone metrics.
> Splitting metric group Page_Walks_Utilization into standalone metrics.
> Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
> echo 0 > /proc/sys/kernel/nmi_watchdog
> perf stat ...
> echo 1 > /proc/sys/kernel/nmi_watchdog
>
>
> Personally, I think the output of current patch looks better.
> But there is nothing wrong with the output of WARN_ONCE.
>
> Should I use WARN_ONCE in next V2?
I just wanted you to keep that static flag inside the function,
so we don't have another static variable used across the code
if the WARN_ONCE does not fit, just use your own flag inside
the function
jirka
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 4/5] perf metricgroup: Support metric constraint
2020-02-21 14:48 ` Jiri Olsa
@ 2020-02-21 15:42 ` Liang, Kan
0 siblings, 0 replies; 17+ messages in thread
From: Liang, Kan @ 2020-02-21 15:42 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, mingo, peterz, linux-kernel, mark.rutland, namhyung,
ravi.bangoria, yao.jin, ak
On 2/21/2020 9:48 AM, Jiri Olsa wrote:
> On Fri, Feb 21, 2020 at 09:30:15AM -0500, Liang, Kan wrote:
>>
>>
>> On 2/21/2020 8:09 AM, Jiri Olsa wrote:
>>> On Thu, Feb 20, 2020 at 11:14:09AM -0500, Liang, Kan wrote:
>>>>
>>>>
>>>> On 2/20/2020 6:35 AM, Jiri Olsa wrote:
>>>>> On Wed, Feb 19, 2020 at 11:08:39AM -0800, kan.liang@linux.intel.com wrote:
>>>>>
>>>>> SNIP
>>>>>
>>>>>> +static bool violate_nmi_constraint;
>>>>>> +
>>>>>> +static bool metricgroup__has_constraint(struct pmu_event *pe)
>>>>>> +{
>>>>>> + if (!pe->metric_constraint)
>>>>>> + return false;
>>>>>> +
>>>>>> + if (!strcmp(pe->metric_constraint, "NO_NMI_WATCHDOG") &&
>>>>>> + sysctl__nmi_watchdog_enabled()) {
>>>>>> + pr_warning("Splitting metric group %s into standalone metrics.\n",
>>>>>> + pe->metric_name);
>>>>>> + violate_nmi_constraint = true;
>>>>>
>>>>> no static flags plz.. can't you just print that rest of the warning in here?
>>>>>
>>>>
>>>> Because we only want to print the NMI watchdog warning once.
>>>> If there are more than one metric groups with constraint, the warning may be
>>>> printed several times. For example,
>>>> $ perf stat -M Page_Walks_Utilization,Page_Walks_Utilization
>>>> Splitting metric group Page_Walks_Utilization into standalone metrics.
>>>> Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric
>>>> constraint:
>>>> echo 0 > /proc/sys/kernel/nmi_watchdog
>>>> perf stat ...
>>>> echo 1 > /proc/sys/kernel/nmi_watchdog
>>>> Splitting metric group Page_Walks_Utilization into standalone metrics.
>>>> Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric
>>>> constraint:
>>>> echo 0 > /proc/sys/kernel/nmi_watchdog
>>>> perf stat ...
>>>> echo 1 > /proc/sys/kernel/nmi_watchdog
>>>> Is it OK?
>>>>
>>>> If it's OK, I think we can remove the flag.
>>>
>>> we use the 'print once' static flags in functions,
>>> so plz keep it inside like WARN_ONCE, or use it directly
>>>
>>
>> If using WARN_ONCE, the warning is always printed for the first violation.
>> For example,
>>
>> #perf stat -M Page_Walks_Utilization,Page_Walks_Utilization
>> Splitting metric group Page_Walks_Utilization into standalone metrics.
>> Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
>> echo 0 > /proc/sys/kernel/nmi_watchdog
>> perf stat ...
>> echo 1 > /proc/sys/kernel/nmi_watchdog
>> Splitting metric group Page_Walks_Utilization into standalone metrics.
>>
>>
>> The output of current patch is as below.
>> #perf stat -M Page_Walks_Utilization,Page_Walks_Utilization
>> Splitting metric group Page_Walks_Utilization into standalone metrics.
>> Splitting metric group Page_Walks_Utilization into standalone metrics.
>> Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
>> echo 0 > /proc/sys/kernel/nmi_watchdog
>> perf stat ...
>> echo 1 > /proc/sys/kernel/nmi_watchdog
>>
>>
>> Personally, I think the output of current patch looks better.
>> But there is nothing wrong with the output of WARN_ONCE.
>>
>> Should I use WARN_ONCE in next V2?
>
> I just wanted you to keep that static flag inside the function,
> so we don't have another static variable used across the code
>
> if the WARN_ONCE does not fit, just use your own flag inside
> the function
Here is the current code flow. The metric groups are processed one by
one. When perf tool processes the metric group in
metricgroup__has_constraint(), we have no idea whether the following
groups break the constraint.
metricgroup__parse_groups():
metricgroup__add_metric_list():
while ((p = strsep(&llist, ",")) != NULL) {
metricgroup__add_metric():
metricgroup__has_constraint()
}
The watchdog hint has to be printed after all metric groups are processed.
What about the patch as below?
A dedicated function for warnings is introduced. But it has to be
invoked twice. One is in the middle of processing. The other is after
all metric groups are processed.
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index f9a9b50..c3a8c70 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -441,7 +441,24 @@ static void
metricgroup__add_metric_non_group(struct strbuf *events,
strbuf_addf(events, ",%s", ids[i]);
}
-static bool violate_nmi_constraint;
+static void metricgroup___watchdog_constraint_hint(const char *name,
bool foot)
+{
+ static bool violate_nmi_constraint;
+
+ if (!foot) {
+ pr_warning("Splitting metric group %s into standalone metrics.\n", name);
+ violate_nmi_constraint = true;
+ return;
+ }
+
+ if (!violate_nmi_constraint)
+ return;
+
+ pr_warning("Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG
metric constraint:\n"
+ " echo 0 > /proc/sys/kernel/nmi_watchdog\n"
+ " perf stat ...\n"
+ " echo 1 > /proc/sys/kernel/nmi_watchdog\n");
+}
static bool metricgroup__has_constraint(struct pmu_event *pe)
{
@@ -450,9 +467,7 @@ static bool metricgroup__has_constraint(struct
pmu_event *pe)
if (!strcmp(pe->metric_constraint, "NO_NMI_WATCHDOG") &&
sysctl__nmi_watchdog_enabled()) {
- pr_warning("Splitting metric group %s into standalone metrics.\n",
- pe->metric_name);
- violate_nmi_constraint = true;
+ metricgroup___watchdog_constraint_hint(pe->metric_name, false);
return true;
}
@@ -535,6 +550,10 @@ static int metricgroup__add_metric_list(const char
*list, struct strbuf *events,
}
}
free(nlist);
+
+ if (!ret)
+ metricgroup___watchdog_constraint_hint(NULL, true);
+
return ret;
}
@@ -577,13 +596,6 @@ int metricgroup__parse_groups(const struct option *opt,
strbuf_release(&extra_events);
ret = metricgroup__setup_events(&group_list, perf_evlist,
metric_events);
-
- if (violate_nmi_constraint) {
- pr_warning("Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG
metric constraint:\n"
- " echo 0 > /proc/sys/kernel/nmi_watchdog\n"
- " perf stat ...\n"
- " echo 1 > /proc/sys/kernel/nmi_watchdog\n");
- }
out:
metricgroup__free_egroups(&group_list);
return ret;
Thanks,
Kan
^ permalink raw reply related [flat|nested] 17+ messages in thread
end of thread, other threads:[~2020-02-21 15:42 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-19 19:08 [PATCH 0/5] Support metric group constraint kan.liang
2020-02-19 19:08 ` [PATCH 1/5] perf jevents: Support metric constraint kan.liang
2020-02-19 19:08 ` [PATCH 2/5] perf metricgroup: Factor out metricgroup__add_metric_weak_group() kan.liang
2020-02-19 19:08 ` [PATCH 3/5] perf util: Factor out sysctl__nmi_watchdog_enabled() kan.liang
2020-02-19 19:08 ` [PATCH 4/5] perf metricgroup: Support metric constraint kan.liang
2020-02-20 11:35 ` Jiri Olsa
2020-02-20 16:14 ` Liang, Kan
2020-02-21 13:09 ` Jiri Olsa
2020-02-21 14:30 ` Liang, Kan
2020-02-21 14:48 ` Jiri Olsa
2020-02-21 15:42 ` Liang, Kan
2020-02-19 19:08 ` [PATCH 5/5] perf vendor events: Add NO_NMI_WATCHDOG " kan.liang
2020-02-20 11:39 ` [PATCH 0/5] Support metric group constraint Jiri Olsa
2020-02-20 16:03 ` Liang, Kan
2020-02-20 16:43 ` Andi Kleen
2020-02-20 19:25 ` Liang, Kan
2020-02-21 13:18 ` Jiri Olsa
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).