All of lore.kernel.org
* [PATCH v7 0/8] perf vendor events: Add JSON metrics for Arm CMN
@ 2023-08-21  8:36 ` Jing Zhang
  0 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-08-21  8:36 UTC (permalink / raw)
  To: John Garry, Ian Rogers
  Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Jing Zhang, Shuai Xue

Changes since v6:
- Supplement the omitted EventCode;
- Keep the original way of handling ConfigCode;
- Supplement the test in empty-pmu-events.c, so that the pmu-events test
  can pass when built with NO_JEVENT=1.
- Link: https://lore.kernel.org/all/1691394685-61240-1-git-send-email-renyu.zj@linux.alibaba.com/

Jing Zhang (8):
  perf pmu: "Compat" supports matching multiple identifiers
  perf metric: "Compat" supports matching multiple identifiers
  perf vendor events: Supplement the omitted EventCode
  perf jevents: Support more event fields
  perf test: Make matching_pmu effective
  perf test: Add pmu-event test for "Compat" and new event_field.
  perf jevents: Add support for Arm CMN PMU aliasing
  perf vendor events: Add JSON metrics for Arm CMN

 .../pmu-events/arch/arm64/arm/cmn/sys/cmn.json     | 266 +++++++++++++++++++++
 .../pmu-events/arch/arm64/arm/cmn/sys/metric.json  |  74 ++++++
 .../pmu-events/arch/test/test_soc/sys/uncore.json  |   8 +
 .../pmu-events/arch/x86/alderlake/pipeline.json    |   9 +
 .../pmu-events/arch/x86/alderlaken/pipeline.json   |   3 +
 .../pmu-events/arch/x86/broadwell/pipeline.json    |   4 +
 .../pmu-events/arch/x86/broadwellde/pipeline.json  |   4 +
 .../arch/x86/broadwellde/uncore-cache.json         |   2 +
 .../arch/x86/broadwellde/uncore-interconnect.json  |   1 +
 .../arch/x86/broadwellde/uncore-memory.json        |   1 +
 .../arch/x86/broadwellde/uncore-power.json         |   1 +
 .../pmu-events/arch/x86/broadwellx/pipeline.json   |   4 +
 .../arch/x86/broadwellx/uncore-cache.json          |   2 +
 .../arch/x86/broadwellx/uncore-interconnect.json   |  13 +
 .../arch/x86/broadwellx/uncore-memory.json         |   2 +
 .../arch/x86/broadwellx/uncore-power.json          |   1 +
 .../pmu-events/arch/x86/cascadelakex/pipeline.json |   4 +
 .../arch/x86/cascadelakex/uncore-cache.json        |   2 +
 .../arch/x86/cascadelakex/uncore-interconnect.json |   1 +
 .../arch/x86/cascadelakex/uncore-io.json           |   1 +
 .../arch/x86/cascadelakex/uncore-memory.json       |   1 +
 .../arch/x86/cascadelakex/uncore-power.json        |   1 +
 .../pmu-events/arch/x86/elkhartlake/pipeline.json  |   2 +
 .../pmu-events/arch/x86/goldmont/pipeline.json     |   3 +
 .../pmu-events/arch/x86/goldmontplus/pipeline.json |   3 +
 .../pmu-events/arch/x86/grandridge/pipeline.json   |   3 +
 .../arch/x86/graniterapids/pipeline.json           |   4 +
 .../perf/pmu-events/arch/x86/haswell/pipeline.json |   4 +
 .../pmu-events/arch/x86/haswellx/pipeline.json     |   4 +
 .../pmu-events/arch/x86/haswellx/uncore-cache.json |   2 +
 .../arch/x86/haswellx/uncore-interconnect.json     |  14 ++
 .../arch/x86/haswellx/uncore-memory.json           |   2 +
 .../pmu-events/arch/x86/haswellx/uncore-power.json |   1 +
 .../perf/pmu-events/arch/x86/icelake/pipeline.json |   4 +
 .../pmu-events/arch/x86/icelakex/pipeline.json     |   4 +
 .../pmu-events/arch/x86/icelakex/uncore-cache.json |   1 +
 .../arch/x86/icelakex/uncore-interconnect.json     |   1 +
 .../arch/x86/icelakex/uncore-memory.json           |   1 +
 .../pmu-events/arch/x86/icelakex/uncore-power.json |   1 +
 .../pmu-events/arch/x86/ivybridge/pipeline.json    |   3 +
 .../perf/pmu-events/arch/x86/ivytown/pipeline.json |   4 +
 .../pmu-events/arch/x86/ivytown/uncore-cache.json  |   2 +
 .../arch/x86/ivytown/uncore-interconnect.json      |  11 +
 .../pmu-events/arch/x86/ivytown/uncore-memory.json |   1 +
 .../pmu-events/arch/x86/ivytown/uncore-power.json  |   1 +
 .../pmu-events/arch/x86/jaketown/pipeline.json     |   4 +
 .../pmu-events/arch/x86/jaketown/uncore-cache.json |   2 +
 .../arch/x86/jaketown/uncore-interconnect.json     |  12 +
 .../arch/x86/jaketown/uncore-memory.json           |   1 +
 .../pmu-events/arch/x86/jaketown/uncore-power.json |   2 +
 .../arch/x86/knightslanding/pipeline.json          |   3 +
 .../arch/x86/knightslanding/uncore-cache.json      |   1 +
 .../arch/x86/knightslanding/uncore-memory.json     |   4 +
 .../pmu-events/arch/x86/meteorlake/pipeline.json   |   8 +
 .../pmu-events/arch/x86/sandybridge/pipeline.json  |   4 +
 .../arch/x86/sapphirerapids/pipeline.json          |   5 +
 .../pmu-events/arch/x86/sierraforest/pipeline.json |   4 +
 .../pmu-events/arch/x86/silvermont/pipeline.json   |   3 +
 .../perf/pmu-events/arch/x86/skylake/pipeline.json |   4 +
 .../pmu-events/arch/x86/skylakex/pipeline.json     |   4 +
 .../pmu-events/arch/x86/skylakex/uncore-cache.json |   2 +
 .../arch/x86/skylakex/uncore-interconnect.json     |   1 +
 .../pmu-events/arch/x86/skylakex/uncore-io.json    |   1 +
 .../arch/x86/skylakex/uncore-memory.json           |   1 +
 .../pmu-events/arch/x86/skylakex/uncore-power.json |   1 +
 .../pmu-events/arch/x86/snowridgex/pipeline.json   |   2 +
 .../arch/x86/snowridgex/uncore-cache.json          |   1 +
 .../arch/x86/snowridgex/uncore-interconnect.json   |   1 +
 .../arch/x86/snowridgex/uncore-memory.json         |   1 +
 .../arch/x86/snowridgex/uncore-power.json          |   1 +
 .../pmu-events/arch/x86/tigerlake/pipeline.json    |   5 +
 tools/perf/pmu-events/empty-pmu-events.c           |   8 +
 tools/perf/pmu-events/jevents.py                   |  21 +-
 tools/perf/tests/pmu-events.c                      |  64 ++++-
 tools/perf/util/metricgroup.c                      |   2 +-
 tools/perf/util/pmu.c                              |  33 ++-
 tools/perf/util/pmu.h                              |   1 +
 77 files changed, 679 insertions(+), 9 deletions(-)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v7 1/8] perf pmu: "Compat" supports matching multiple identifiers
  2023-08-21  8:36 ` Jing Zhang
@ 2023-08-21  8:36   ` Jing Zhang
  -1 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-08-21  8:36 UTC (permalink / raw)
  To: John Garry, Ian Rogers
  Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Jing Zhang, Shuai Xue

The jevents "Compat" field is used for uncore PMU alias or metric
definitions.

The same PMU driver can have different PMU identifiers due to different
hardware versions and types, but those PMUs may share common events.
Since a Compat value can only match one identifier, adding the same
event alias to PMUs with different identifiers requires defining the
alias once per identifier, which is needlessly repetitive.

So let "Compat" support matching multiple identifiers for uncore PMU
aliases. For example, the Compat value "43401;436*" matches both the
PMU identifier "43401", i.e. CMN600_r0p0, and any PMU identifier with
the prefix "436", i.e. all CMN650, where "*" is a wildcard. Tokens in
the Compat field are delimited by ';' with no spaces.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
---
 tools/perf/util/pmu.c | 33 +++++++++++++++++++++++++++++++--
 tools/perf/util/pmu.h |  1 +
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index ad209c8..6402423 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -776,6 +776,35 @@ static bool pmu_uncore_alias_match(const char *pmu_name, const char *name)
 	return res;
 }
 
+bool pmu_uncore_identifier_match(const char *id, const char *compat)
+{
+	char *tmp = NULL, *tok, *str;
+	bool res;
+	int n;
+
+	/*
+	 * The strdup() call is necessary here because "compat" is a const
+	 * char * and cannot be passed to strtok_r(), which modifies it.
+	 */
+	str = strdup(compat);
+	if (!str)
+		return false;
+
+	tok = strtok_r(str, ";", &tmp);
+	for (; tok; tok = strtok_r(NULL, ";", &tmp)) {
+		n = strlen(tok);
+		if ((tok[n - 1] == '*' && !strncmp(id, tok, n - 1)) ||
+		    !strcmp(id, tok)) {
+			res = true;
+			goto out;
+		}
+	}
+	res = false;
+out:
+	free(str);
+	return res;
+}
+
 struct pmu_add_cpu_aliases_map_data {
 	struct list_head *head;
 	const char *name;
@@ -847,8 +876,8 @@ static int pmu_add_sys_aliases_iter_fn(const struct pmu_event *pe,
 	if (!pe->compat || !pe->pmu)
 		return 0;
 
-	if (!strcmp(pmu->id, pe->compat) &&
-	    pmu_uncore_alias_match(pe->pmu, pmu->name)) {
+	if (pmu_uncore_alias_match(pe->pmu, pmu->name) &&
+	    pmu_uncore_identifier_match(pmu->id, pe->compat)) {
 		__perf_pmu__new_alias(idata->head, -1,
 				      (char *)pe->name,
 				      (char *)pe->desc,
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index b9a02de..9d4385d 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -241,6 +241,7 @@ void pmu_add_cpu_aliases_table(struct list_head *head, struct perf_pmu *pmu,
 char *perf_pmu__getcpuid(struct perf_pmu *pmu);
 const struct pmu_events_table *pmu_events_table__find(void);
 const struct pmu_metrics_table *pmu_metrics_table__find(void);
+bool pmu_uncore_identifier_match(const char *id, const char *compat);
 void perf_pmu_free_alias(struct perf_pmu_alias *alias);
 
 int perf_pmu__convert_scale(const char *scale, char **end, double *sval);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v7 2/8] perf metric: "Compat" supports matching multiple identifiers
  2023-08-21  8:36 ` Jing Zhang
@ 2023-08-21  8:36   ` Jing Zhang
  -1 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-08-21  8:36 UTC (permalink / raw)
  To: John Garry, Ian Rogers
  Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Jing Zhang, Shuai Xue

The jevents "Compat" field is used for uncore PMU alias or metric
definitions.

The same PMU driver can have different PMU identifiers due to different
hardware versions and types, but those PMUs may share common metrics.
Since a Compat value can only match one identifier, adding the same
metric to PMUs with different identifiers requires defining the metric
once per identifier, which is needlessly repetitive.

So let "Compat" support matching multiple identifiers for uncore PMU
metrics, using the same ';'-delimited, '*'-wildcard syntax as for
aliases.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
---
 tools/perf/util/metricgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 5e9c657..ff81bc5 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -477,7 +477,7 @@ static int metricgroup__sys_event_iter(const struct pmu_metric *pm,
 
 	while ((pmu = perf_pmu__scan(pmu))) {
 
-		if (!pmu->id || strcmp(pmu->id, pm->compat))
+		if (!pmu->id || !pmu_uncore_identifier_match(pmu->id, pm->compat))
 			continue;
 
 		return d->fn(pm, table, d->data);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v7 3/8] perf vendor events: Supplement the omitted EventCode
  2023-08-21  8:36 ` Jing Zhang
                   ` (2 preceding siblings ...)
  (?)
@ 2023-08-21  8:36 ` Jing Zhang
  2023-08-25  4:24   ` Ian Rogers
  -1 siblings, 1 reply; 57+ messages in thread
From: Jing Zhang @ 2023-08-21  8:36 UTC (permalink / raw)
  To: John Garry, Ian Rogers
  Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Jing Zhang, Shuai Xue

If an event's description contains "event=0", the EventCode can be
omitted in the JSON file, and jevents.py will automatically fill in
"event=0" during parsing.

However, for some events that lack both EventCode and ConfigCode, it is
wrong to fill in "event=0" automatically; for example, the CMN event
description is typically "type=xxx, eventid=xxx".

Therefore, before modifying jevents.py to stop it from adding "event=0"
by default, all the omitted EventCodes need to be filled in first.
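
Concretely, each fixed-counter event touched by this patch gains an
explicit "EventCode": "0x0" entry, as in the alderlake hunk below
(abridged here to the relevant fields):

```json
{
    "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
    "EventCode": "0x0",
    "EventName": "CPU_CLK_UNHALTED.CORE"
}
```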

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 tools/perf/pmu-events/arch/x86/alderlake/pipeline.json     |  9 +++++++++
 tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json    |  3 +++
 tools/perf/pmu-events/arch/x86/broadwell/pipeline.json     |  4 ++++
 tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json   |  4 ++++
 .../perf/pmu-events/arch/x86/broadwellde/uncore-cache.json |  2 ++
 .../arch/x86/broadwellde/uncore-interconnect.json          |  1 +
 .../pmu-events/arch/x86/broadwellde/uncore-memory.json     |  1 +
 .../perf/pmu-events/arch/x86/broadwellde/uncore-power.json |  1 +
 tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json    |  4 ++++
 .../perf/pmu-events/arch/x86/broadwellx/uncore-cache.json  |  2 ++
 .../arch/x86/broadwellx/uncore-interconnect.json           | 13 +++++++++++++
 .../perf/pmu-events/arch/x86/broadwellx/uncore-memory.json |  2 ++
 .../perf/pmu-events/arch/x86/broadwellx/uncore-power.json  |  1 +
 tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json  |  4 ++++
 .../pmu-events/arch/x86/cascadelakex/uncore-cache.json     |  2 ++
 .../arch/x86/cascadelakex/uncore-interconnect.json         |  1 +
 tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json |  1 +
 .../pmu-events/arch/x86/cascadelakex/uncore-memory.json    |  1 +
 .../pmu-events/arch/x86/cascadelakex/uncore-power.json     |  1 +
 tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json   |  2 ++
 tools/perf/pmu-events/arch/x86/goldmont/pipeline.json      |  3 +++
 tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json  |  3 +++
 tools/perf/pmu-events/arch/x86/grandridge/pipeline.json    |  3 +++
 tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json |  4 ++++
 tools/perf/pmu-events/arch/x86/haswell/pipeline.json       |  4 ++++
 tools/perf/pmu-events/arch/x86/haswellx/pipeline.json      |  4 ++++
 tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json  |  2 ++
 .../pmu-events/arch/x86/haswellx/uncore-interconnect.json  | 14 ++++++++++++++
 tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json |  2 ++
 tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json  |  1 +
 tools/perf/pmu-events/arch/x86/icelake/pipeline.json       |  4 ++++
 tools/perf/pmu-events/arch/x86/icelakex/pipeline.json      |  4 ++++
 tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json  |  1 +
 .../pmu-events/arch/x86/icelakex/uncore-interconnect.json  |  1 +
 tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json |  1 +
 tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json  |  1 +
 tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json     |  3 +++
 tools/perf/pmu-events/arch/x86/ivytown/pipeline.json       |  4 ++++
 tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json   |  2 ++
 .../pmu-events/arch/x86/ivytown/uncore-interconnect.json   | 11 +++++++++++
 tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json  |  1 +
 tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json   |  1 +
 tools/perf/pmu-events/arch/x86/jaketown/pipeline.json      |  4 ++++
 tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json  |  2 ++
 .../pmu-events/arch/x86/jaketown/uncore-interconnect.json  | 12 ++++++++++++
 tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json |  1 +
 tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json  |  2 ++
 .../perf/pmu-events/arch/x86/knightslanding/pipeline.json  |  3 +++
 .../pmu-events/arch/x86/knightslanding/uncore-cache.json   |  1 +
 .../pmu-events/arch/x86/knightslanding/uncore-memory.json  |  4 ++++
 tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json    |  8 ++++++++
 tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json   |  4 ++++
 .../perf/pmu-events/arch/x86/sapphirerapids/pipeline.json  |  5 +++++
 tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json  |  4 ++++
 tools/perf/pmu-events/arch/x86/silvermont/pipeline.json    |  3 +++
 tools/perf/pmu-events/arch/x86/skylake/pipeline.json       |  4 ++++
 tools/perf/pmu-events/arch/x86/skylakex/pipeline.json      |  4 ++++
 tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json  |  2 ++
 .../pmu-events/arch/x86/skylakex/uncore-interconnect.json  |  1 +
 tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json     |  1 +
 tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json |  1 +
 tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json  |  1 +
 tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json    |  2 ++
 .../perf/pmu-events/arch/x86/snowridgex/uncore-cache.json  |  1 +
 .../arch/x86/snowridgex/uncore-interconnect.json           |  1 +
 .../perf/pmu-events/arch/x86/snowridgex/uncore-memory.json |  1 +
 .../perf/pmu-events/arch/x86/snowridgex/uncore-power.json  |  1 +
 tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json     |  5 +++++
 68 files changed, 211 insertions(+)

diff --git a/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json b/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json
index cb5b861..7054426 100644
--- a/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json
@@ -489,6 +489,7 @@
     },
     {
         "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.CORE",
         "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. This event uses fixed counter 1.",
         "SampleAfterValue": "2000003",
@@ -550,6 +551,7 @@
     },
     {
         "BriefDescription": "Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is not affected by core frequency changes and increments at a fixed frequency that is also used for the Time Stamp Counter (TSC). This event uses fixed counter 2.",
         "SampleAfterValue": "2000003",
@@ -558,6 +560,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately 
 so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -584,6 +587,7 @@
     },
     {
         "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "Counts the number of core cycles while the core is not in a halt state.  The core enters the halt state when it is running the HLT instruction. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time.  This event uses fixed counter 1.",
         "SampleAfterValue": "2000003",
@@ -592,6 +596,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -743,6 +748,7 @@
     },
     {
         "BriefDescription": "Counts the total number of instructions retired. (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PEBS": "1",
         "PublicDescription": "Counts the total number of instructions that retired. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. This event continues counting during hardware interrupts, traps, and inside interrupt handlers. This event uses fixed counter 0.",
@@ -752,6 +758,7 @@
     },
     {
         "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PEBS": "1",
         "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
@@ -796,6 +803,7 @@
     },
     {
         "BriefDescription": "Precise instruction retired with PEBS precise-distribution",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.PREC_DIST",
         "PEBS": "1",
         "PublicDescription": "A version of INST_RETIRED that allows for a precise distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR++) feature to fix bias in how retired instructions get sampled. Use on Fixed Counter 0.",
@@ -1160,6 +1168,7 @@
     },
     {
         "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
+        "EventCode": "0x0",
         "EventName": "TOPDOWN.SLOTS",
         "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
         "SampleAfterValue": "10000003",
diff --git a/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json b/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json
index fa53ff1..345d1c8 100644
--- a/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json
@@ -211,6 +211,7 @@
     },
     {
         "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.CORE",
         "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. This event uses fixed counter 1.",
         "SampleAfterValue": "2000003",
@@ -225,6 +226,7 @@
     },
     {
         "BriefDescription": "Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is not affected by core frequency changes and increments at a fixed frequency that is also used for the Time Stamp Counter (TSC). This event uses fixed counter 2.",
         "SampleAfterValue": "2000003",
@@ -240,6 +242,7 @@
     },
     {
         "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "Counts the number of core cycles while the core is not in a halt state.  The core enters the halt state when it is running the HLT instruction. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time.  This event uses fixed counter 1.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/broadwell/pipeline.json b/tools/perf/pmu-events/arch/x86/broadwell/pipeline.json
index 9a902d2..b114d0d 100644
--- a/tools/perf/pmu-events/arch/x86/broadwell/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/broadwell/pipeline.json
@@ -336,6 +336,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. \nNote: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  This event is clocked by base clock (100 Mhz) on Sandy Bridge. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appea
 r 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -359,6 +360,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -366,6 +368,7 @@
     },
     {
         "AnyThread": "1",
+        "EventCode": "0x0",
         "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
         "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
         "SampleAfterValue": "2000003",
@@ -514,6 +517,7 @@
     },
     {
         "BriefDescription": "Instructions retired from execution.",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. \nNotes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. \nCounting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json b/tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json
index 9a902d2..ce90d058 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json
@@ -336,6 +336,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. \nNote: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  This event is clocked by base clock (100 Mhz) on Sandy Bridge. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appea
 r 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -359,6 +360,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -367,6 +369,7 @@
     {
         "AnyThread": "1",
         "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -514,6 +517,7 @@
     },
     {
         "BriefDescription": "Instructions retired from execution.",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. \nNotes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. \nCounting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.json b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.json
index 56bba6d..117be19 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.json
@@ -8,6 +8,7 @@
     },
     {
         "BriefDescription": "Uncore Clocks",
+        "EventCode": "0x0",
         "EventName": "UNC_C_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "CBOX"
@@ -1501,6 +1502,7 @@
     },
     {
         "BriefDescription": "uclks",
+        "EventCode": "0x0",
         "EventName": "UNC_H_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of uclks in the HA.  This will be slightly different than the count in the Ubox because of enable/freeze delays.  The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the QPI Agent.",
diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-interconnect.json
index 8a327e0..ce54bd3 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-interconnect.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-interconnect.json
@@ -19,6 +19,7 @@
     },
     {
         "BriefDescription": "Clocks in the IRP",
+        "EventCode": "0x0",
         "EventName": "UNC_I_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Number of clocks in the IRP.",
diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.json b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.json
index a764234..32c46bd 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.json
@@ -131,6 +131,7 @@
     },
     {
         "BriefDescription": "DRAM Clockticks",
+        "EventCode": "0x0",
         "EventName": "UNC_M_DCLOCKTICKS",
         "PerPkg": "1",
         "Unit": "iMC"
diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.json b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.json
index 83d2013..f57eb8e 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "pclk Cycles",
+        "EventCode": "0x0",
         "EventName": "UNC_P_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "The PCU runs off a fixed 1 GHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json b/tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json
index 9a902d2..ce90d058 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json
@@ -336,6 +336,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. \nNote: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  This event is clocked by base clock (100 Mhz) on Sandy Bridge. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appea
 r 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -359,6 +360,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -367,6 +369,7 @@
     {
         "AnyThread": "1",
         "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -514,6 +517,7 @@
     },
     {
         "BriefDescription": "Instructions retired from execution.",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. \nNotes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. \nCounting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.json b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.json
index 400d784..346f5cf 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.json
@@ -183,6 +183,7 @@
     },
     {
         "BriefDescription": "Uncore Clocks",
+        "EventCode": "0x0",
         "EventName": "UNC_C_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "CBOX"
@@ -1689,6 +1690,7 @@
     },
     {
         "BriefDescription": "uclks",
+        "EventCode": "0x0",
         "EventName": "UNC_H_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of uclks in the HA.  This will be slightly different than the count in the Ubox because of enable/freeze delays.  The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the QPI Agent.",
diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-interconnect.json
index e61a23f..df96e41 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-interconnect.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-interconnect.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "Number of non data (control) flits transmitted . Derived from unc_q_txl_flits_g0.non_data",
+        "EventCode": "0x0",
         "EventName": "QPI_CTL_BANDWIDTH_TX",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calc
 ulate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI.  This basically tracks the protocol overhead on the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This includes the header flits for data packets.",
@@ -10,6 +11,7 @@
     },
     {
         "BriefDescription": "Number of data flits transmitted . Derived from unc_q_txl_flits_g0.data",
+        "EventCode": "0x0",
         "EventName": "QPI_DATA_BANDWIDTH_TX",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calc
 ulate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI.  Each flit contains 64b of data.  This includes both DRS and NCB data flits (coherent and non-coherent).  This can be used to calculate the data bandwidth of the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This does not include the header flits that go in data packets.",
@@ -37,6 +39,7 @@
     },
     {
         "BriefDescription": "Clocks in the IRP",
+        "EventCode": "0x0",
         "EventName": "UNC_I_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Number of clocks in the IRP.",
@@ -1400,6 +1403,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 0; Data Tx Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G0.DATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calc
 ulate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI.  Each flit contains 64b of data.  This includes both DRS and NCB data flits (coherent and non-coherent).  This can be used to calculate the data bandwidth of the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This does not include the header flits that go in data packets.",
@@ -1408,6 +1412,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 0; Non-Data protocol Tx Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G0.NON_DATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calc
 ulate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI.  This basically tracks the protocol overhead on the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This includes the header flits for data packets.",
@@ -1416,6 +1421,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; DRS Flits (both Header and Data)",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.DRS",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual
  data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.",
@@ -1424,6 +1430,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; DRS Data Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.DRS_DATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual
  data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of data flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.  This does not count data flits transmitted over the NCB channel which transmits non-coherent data.  This includes only the data flits (not the header).",
@@ -1432,6 +1439,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; DRS Header Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.DRS_NONDATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual
  data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of protocol flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.  This does not count data flits transmitted over the NCB channel which transmits non-coherent data.  This includes only the header flits (not the data).  This includes extended headers.",
@@ -1440,6 +1448,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; HOM Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.HOM",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual
  data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of flits transmitted over QPI on the home channel.",
@@ -1448,6 +1457,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; HOM Non-Request Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.HOM_NONREQ",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual
  data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of non-request flits transmitted over QPI on the home channel.  These are most commonly snoop responses, and this event can be used as a proxy for that.",
@@ -1456,6 +1466,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; HOM Request Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.HOM_REQ",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual
  data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of data request transmitted over QPI on the home channel.  This basically counts the number of remote memory requests transmitted over QPI.  In conjunction with the local read count in the Home Agent, one can calculate the number of LLC Misses.",
@@ -1464,6 +1475,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; SNP Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.SNP",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual
  data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of snoop request flits transmitted over QPI.  These requests are contained in the snoop channel.  This does not include snoop responses, which are transmitted on the home channel.",
@@ -3162,6 +3174,7 @@
     },
     {
         "BriefDescription": "Uncore Clocks",
+        "EventCode": "0x0",
         "EventName": "UNC_S_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "SBOX"
diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json
index b5a33e7a..0c5888d 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json
@@ -158,12 +158,14 @@
     },
     {
         "BriefDescription": "Clockticks in the Memory Controller using one of the programmable counters",
+        "EventCode": "0x0",
         "EventName": "UNC_M_CLOCKTICKS_P",
         "PerPkg": "1",
         "Unit": "iMC"
     },
     {
         "BriefDescription": "This event is deprecated. Refer to new event UNC_M_CLOCKTICKS_P",
+        "EventCode": "0x0",
         "EventName": "UNC_M_DCLOCKTICKS",
         "PerPkg": "1",
         "Unit": "iMC"
diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json
index 83d2013..f57eb8e 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "pclk Cycles",
+        "EventCode": "0x0",
         "EventName": "UNC_P_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "The PCU runs off a fixed 1 GHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json b/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json
index 0f06e31..99346e1 100644
--- a/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json
@@ -191,6 +191,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overfl
 ow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -222,6 +223,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -230,6 +232,7 @@
     {
         "AnyThread": "1",
         "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -369,6 +372,7 @@
     },
     {
         "BriefDescription": "Instructions retired from execution.",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "Counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, Counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json
index 2c88053..ba7a6f6 100644
--- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json
+++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json
@@ -512,6 +512,7 @@
     },
     {
         "BriefDescription": "Uncore cache clock ticks",
+        "EventCode": "0x0",
         "EventName": "UNC_CHA_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Counts clockticks of the clock controlling the uncore caching and home agent (CHA).",
@@ -5792,6 +5793,7 @@
     },
     {
         "BriefDescription": "This event is deprecated. Refer to new event UNC_CHA_CLOCKTICKS",
         "Deprecated": "1",
+        "EventCode": "0x0",
         "EventName": "UNC_C_CLOCKTICKS",
         "PerPkg": "1",
diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json
index 725780f..43d7b24 100644
--- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json
+++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json
@@ -1090,6 +1090,7 @@
     },
     {
         "BriefDescription": "Cycles - at UCLK",
+        "EventCode": "0x0",
         "EventName": "UNC_M2M_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "M2M"
diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json
index 743c91f..377d54f 100644
--- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json
+++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json
@@ -1271,6 +1271,7 @@
     },
     {
         "BriefDescription": "Counting disabled",
+        "EventCode": "0x0",
         "EventName": "UNC_IIO_NOTHING",
         "PerPkg": "1",
         "Unit": "IIO"
diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json
index f761856..77bb0ea 100644
--- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json
+++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json
@@ -167,6 +167,7 @@
     },
     {
         "BriefDescription": "Memory controller clock ticks",
+        "EventCode": "0x0",
         "EventName": "UNC_M_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Counts clockticks of the fixed frequency clock of the memory controller using one of the programmable counters.",
diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-power.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-power.json
index c6254af..a01b279 100644
--- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-power.json
+++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-power.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "pclk Cycles",
+        "EventCode": "0x0",
         "EventName": "UNC_P_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "The PCU runs off a fixed 1 GHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
diff --git a/tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json b/tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json
index 9dd8c90..3388cd5 100644
--- a/tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json
@@ -150,6 +150,7 @@
     },
     {
         "BriefDescription": "Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is not affected by core frequency changes and increments at a fixed frequency that is also used for the Time Stamp Counter (TSC). This event uses fixed counter 2.",
         "SampleAfterValue": "2000003",
@@ -179,6 +180,7 @@
     },
     {
         "BriefDescription": "Counts the total number of instructions retired. (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PEBS": "1",
         "PublicDescription": "Counts the total number of instructions that retired. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. This event continues counting during hardware interrupts, traps, and inside interrupt handlers. This event uses fixed counter 0.",
diff --git a/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json b/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json
index acb8974..79806e7 100644
--- a/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json
@@ -143,6 +143,7 @@
     },
     {
         "BriefDescription": "Core cycles when core is not halted  (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.CORE",
         "PublicDescription": "Counts the number of core cycles while the core is not in a halt state.  The core enters the halt state when it is running the HLT instruction. In mobile systems the core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time.  This event uses fixed counter 1.  You cannot collect a PEBs record for this event.",
         "SampleAfterValue": "2000003",
@@ -165,6 +166,7 @@
     },
     {
         "BriefDescription": "Reference cycles when core is not halted  (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction.  In mobile systems the core frequency may change from time.  This event is not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time.  This event uses fixed counter 2.  You cannot collect a PEBs record for this event.",
         "SampleAfterValue": "2000003",
@@ -187,6 +189,7 @@
     },
     {
         "BriefDescription": "Instructions retired (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "Counts the number of instructions that retire execution. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. The counter continues counting during hardware interrupts, traps, and inside interrupt handlers.  This event uses fixed counter 0.  You cannot collect a PEBs record for this event.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json b/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json
index 33ef331..1be1b50 100644
--- a/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json
@@ -143,6 +143,7 @@
     },
     {
         "BriefDescription": "Core cycles when core is not halted  (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.CORE",
         "PublicDescription": "Counts the number of core cycles while the core is not in a halt state.  The core enters the halt state when it is running the HLT instruction. In mobile systems the core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time.  This event uses fixed counter 1.  You cannot collect a PEBs record for this event.",
         "SampleAfterValue": "2000003",
@@ -165,6 +166,7 @@
     },
     {
         "BriefDescription": "Reference cycles when core is not halted  (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction.  In mobile systems the core frequency may change from time.  This event is not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time.  This event uses fixed counter 2.  You cannot collect a PEBs record for this event.",
         "SampleAfterValue": "2000003",
@@ -187,6 +189,7 @@
     },
     {
         "BriefDescription": "Instructions retired (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PEBS": "2",
         "PublicDescription": "Counts the number of instructions that retire execution. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. The counter continues counting during hardware interrupts, traps, and inside interrupt handlers.  This event uses fixed counter 0.  You cannot collect a PEBs record for this event.",
diff --git a/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json b/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json
index 4121295..5335a7b 100644
--- a/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json
@@ -29,6 +29,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "SampleAfterValue": "2000003",
         "UMask": "0x3"
@@ -43,6 +44,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -55,6 +57,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PEBS": "1",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json b/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
index 764c043..6ca34b9 100644
--- a/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
@@ -17,6 +17,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately 
 so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -32,6 +33,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -46,6 +48,7 @@
     },
     {
         "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PEBS": "1",
         "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
@@ -78,6 +81,7 @@
     },
     {
         "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
+        "EventCode": "0x0",
         "EventName": "TOPDOWN.SLOTS",
         "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
         "SampleAfterValue": "10000003",
diff --git a/tools/perf/pmu-events/arch/x86/haswell/pipeline.json b/tools/perf/pmu-events/arch/x86/haswell/pipeline.json
index 540f437..0d5eafd 100644
--- a/tools/perf/pmu-events/arch/x86/haswell/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/haswell/pipeline.json
@@ -303,6 +303,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state.",
         "SampleAfterValue": "2000003",
@@ -327,6 +328,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "This event counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling.",
         "SampleAfterValue": "2000003",
@@ -335,6 +337,7 @@
     {
         "AnyThread": "1",
         "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -436,6 +439,7 @@
     },
     {
         "BriefDescription": "Instructions retired from execution.",
         "Errata": "HSD140, HSD143",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. INST_RETIRED.ANY is counted by a designated fixed counter, leaving the programmable counters available for other events. Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
diff --git a/tools/perf/pmu-events/arch/x86/haswellx/pipeline.json b/tools/perf/pmu-events/arch/x86/haswellx/pipeline.json
index 540f437..0d5eafd 100644
--- a/tools/perf/pmu-events/arch/x86/haswellx/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/haswellx/pipeline.json
@@ -303,6 +303,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state.",
         "SampleAfterValue": "2000003",
@@ -327,6 +328,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "This event counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling.",
         "SampleAfterValue": "2000003",
@@ -335,6 +337,7 @@
     {
         "AnyThread": "1",
         "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -436,6 +439,7 @@
     },
     {
         "BriefDescription": "Instructions retired from execution.",
+        "EventCode": "0x0",
         "Errata": "HSD140, HSD143",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. INST_RETIRED.ANY is counted by a designated fixed counter, leaving the programmable counters available for other events. Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
diff --git a/tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json b/tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json
index 9227cc2..64e2fb4 100644
--- a/tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json
+++ b/tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json
@@ -183,6 +183,7 @@
     },
     {
         "BriefDescription": "Uncore Clocks",
+        "EventCode": "0x0",
         "EventName": "UNC_C_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "CBOX"
@@ -1698,6 +1699,7 @@
     },
     {
         "BriefDescription": "uclks",
+        "EventCode": "0x0",
         "EventName": "UNC_H_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of uclks in the HA.  This will be slightly different than the count in the Ubox because of enable/freeze delays.  The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the QPI Agent.",
diff --git a/tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json
index 954e8198..7c4fc13 100644
--- a/tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json
+++ b/tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "Number of non data (control) flits transmitted . Derived from unc_q_txl_flits_g0.non_data",
+        "EventCode": "0x0",
         "EventName": "QPI_CTL_BANDWIDTH_TX",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI.  This basically tracks the protocol overhead on the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This includes the header flits for data packets.",
@@ -10,6 +11,7 @@
     },
     {
         "BriefDescription": "Number of data flits transmitted . Derived from unc_q_txl_flits_g0.data",
+        "EventCode": "0x0",
         "EventName": "QPI_DATA_BANDWIDTH_TX",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI.  Each flit contains 64b of data.  This includes both DRS and NCB data flits (coherent and non-coherent).  This can be used to calculate the data bandwidth of the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This does not include the header flits that go in data packets.",
@@ -37,6 +39,7 @@
     },
     {
         "BriefDescription": "Clocks in the IRP",
+        "EventCode": "0x0",
         "EventName": "UNC_I_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Number of clocks in the IRP.",
@@ -1401,6 +1404,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 0; Data Tx Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G0.DATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI.  Each flit contains 64b of data.  This includes both DRS and NCB data flits (coherent and non-coherent).  This can be used to calculate the data bandwidth of the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This does not include the header flits that go in data packets.",
@@ -1409,6 +1413,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 0; Non-Data protocol Tx Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G0.NON_DATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI.  This basically tracks the protocol overhead on the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This includes the header flits for data packets.",
@@ -1417,6 +1422,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; DRS Flits (both Header and Data)",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.DRS",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.",
@@ -1425,6 +1431,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; DRS Data Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.DRS_DATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of data flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.  This does not count data flits transmitted over the NCB channel which transmits non-coherent data.  This includes only the data flits (not the header).",
@@ -1433,6 +1440,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; DRS Header Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.DRS_NONDATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of protocol flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.  This does not count data flits transmitted over the NCB channel which transmits non-coherent data.  This includes only the header flits (not the data).  This includes extended headers.",
@@ -1441,6 +1449,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; HOM Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.HOM",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of flits transmitted over QPI on the home channel.",
@@ -1449,6 +1458,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; HOM Non-Request Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.HOM_NONREQ",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of non-request flits transmitted over QPI on the home channel.  These are most commonly snoop responses, and this event can be used as a proxy for that.",
@@ -1457,6 +1467,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; HOM Request Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.HOM_REQ",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of data request transmitted over QPI on the home channel.  This basically counts the number of remote memory requests transmitted over QPI.  In conjunction with the local read count in the Home Agent, one can calculate the number of LLC Misses.",
@@ -1465,6 +1476,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; SNP Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.SNP",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of snoop request flits transmitted over QPI.  These requests are contained in the snoop channel.  This does not include snoop responses, which are transmitted on the home channel.",
@@ -3136,6 +3148,7 @@
     },
     {
         "BriefDescription": "Uncore Clocks",
+        "EventCode": "0x0",
         "EventName": "UNC_S_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "SBOX"
@@ -3823,6 +3836,7 @@
     },
     {
         "BriefDescription": "UNC_U_CLOCKTICKS",
+        "EventCode": "0x0",
         "EventName": "UNC_U_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "UBOX"
diff --git a/tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json b/tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json
index c005f51..124c3ae 100644
--- a/tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json
+++ b/tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json
@@ -151,12 +151,14 @@
     },
     {
         "BriefDescription": "DRAM Clockticks",
+        "EventCode": "0x0",
         "EventName": "UNC_M_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "iMC"
     },
     {
         "BriefDescription": "DRAM Clockticks",
+        "EventCode": "0x0",
         "EventName": "UNC_M_DCLOCKTICKS",
         "PerPkg": "1",
         "Unit": "iMC"
diff --git a/tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json b/tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json
index daebf10..9276058 100644
--- a/tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json
+++ b/tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "pclk Cycles",
+        "EventCode": "0x0",
         "EventName": "UNC_P_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "The PCU runs off a fixed 800 MHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
diff --git a/tools/perf/pmu-events/arch/x86/icelake/pipeline.json b/tools/perf/pmu-events/arch/x86/icelake/pipeline.json
index 154fee4..0789412 100644
--- a/tools/perf/pmu-events/arch/x86/icelake/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/icelake/pipeline.json
@@ -193,6 +193,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -208,6 +209,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -359,6 +361,7 @@
     },
     {
         "BriefDescription": "Precise instruction retired event with a reduced effect of PEBS shadow in IP distribution",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.PREC_DIST",
         "PEBS": "1",
         "PublicDescription": "A version of INST_RETIRED that allows for a more unbiased distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR) feature to mitigate some bias in how retired instructions get sampled. Use on Fixed Counter 0.",
@@ -562,6 +565,7 @@
     },
     {
         "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
+        "EventCode": "0x0",
         "EventName": "TOPDOWN.SLOTS",
         "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
         "SampleAfterValue": "10000003",
diff --git a/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json b/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json
index 442a4c7..9cfb341 100644
--- a/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json
@@ -193,6 +193,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -208,6 +209,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -359,6 +361,7 @@
     },
     {
         "BriefDescription": "Precise instruction retired event with a reduced effect of PEBS shadow in IP distribution",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.PREC_DIST",
         "PEBS": "1",
         "PublicDescription": "A version of INST_RETIRED that allows for a more unbiased distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR) feature to mitigate some bias in how retired instructions get sampled. Use on Fixed Counter 0.",
@@ -544,6 +547,7 @@
     },
     {
         "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
+        "EventCode": "0x0",
         "EventName": "TOPDOWN.SLOTS",
         "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
         "SampleAfterValue": "10000003",
diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json b/tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json
index b6ce14e..ae57663 100644
--- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json
+++ b/tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json
@@ -892,6 +892,7 @@
     },
     {
         "BriefDescription": "Clockticks of the uncore caching and home agent (CHA)",
+        "EventCode": "0x0",
         "EventName": "UNC_CHA_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "CHA"
diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json
index 8ac5907..1b821b6 100644
--- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json
+++ b/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json
@@ -1419,6 +1419,7 @@
     },
     {
         "BriefDescription": "Clockticks of the mesh to memory (M2M)",
+        "EventCode": "0x0",
         "EventName": "UNC_M2M_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "M2M"
diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json b/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json
index 814d959..b0b2f27 100644
--- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json
+++ b/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json
@@ -100,6 +100,7 @@
     },
     {
         "BriefDescription": "DRAM Clockticks",
+        "EventCode": "0x0",
         "EventName": "UNC_M_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "iMC"
diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json b/tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json
index ee4dac6..9c4cd59 100644
--- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json
+++ b/tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "Clockticks of the power control unit (PCU)",
+        "EventCode": "0x0",
         "EventName": "UNC_P_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Clockticks of the power control unit (PCU) : The PCU runs off a fixed 1 GHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
diff --git a/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json b/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json
index 30a3da9..2df2d21 100644
--- a/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json
@@ -326,6 +326,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "SampleAfterValue": "2000003",
         "UMask": "0x3"
@@ -348,6 +349,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -355,6 +357,7 @@
     {
         "AnyThread": "1",
         "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
         "PublicDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/ivytown/pipeline.json b/tools/perf/pmu-events/arch/x86/ivytown/pipeline.json
index 30a3da9..6f6f281 100644
--- a/tools/perf/pmu-events/arch/x86/ivytown/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/ivytown/pipeline.json
@@ -326,6 +326,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "SampleAfterValue": "2000003",
         "UMask": "0x3"
@@ -348,6 +349,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -355,6 +357,7 @@
     {
         "AnyThread": "1",
         "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
         "PublicDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
         "SampleAfterValue": "2000003",
@@ -510,6 +513,7 @@
     },
     {
         "BriefDescription": "Instructions retired from execution.",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "SampleAfterValue": "2000003",
         "UMask": "0x1"
diff --git a/tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json b/tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json
index 8bf2706..31e58fb 100644
--- a/tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json
+++ b/tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "Uncore Clocks",
+        "EventCode": "0x0",
         "EventName": "UNC_C_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "CBOX"
@@ -1533,6 +1534,7 @@
     },
     {
         "BriefDescription": "uclks",
+        "EventCode": "0x0",
         "EventName": "UNC_H_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of uclks in the HA.  This will be slightly different than the count in the Ubox because of enable/freeze delays.  The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the QPI Agent.",
diff --git a/tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.json
index ccf45153..f2492ec7 100644
--- a/tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.json
+++ b/tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.json
@@ -109,6 +109,7 @@
     },
     {
         "BriefDescription": "Clocks in the IRP",
+        "EventCode": "0x0",
         "EventName": "UNC_I_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Number of clocks in the IRP.",
@@ -1522,6 +1523,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 0; Data Tx Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G0.DATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI.  Each flit contains 64b of data.  This includes both DRS and NCB data flits (coherent and non-coherent).  This can be used to calculate the data bandwidth of the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This does not include the header flits that go in data packets.",
@@ -1530,6 +1532,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 0; Non-Data protocol Tx Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G0.NON_DATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI.  This basically tracks the protocol overhead on the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This includes the header flits for data packets.",
@@ -1538,6 +1541,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; DRS Flits (both Header and Data)",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.DRS",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.",
@@ -1546,6 +1550,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; DRS Data Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.DRS_DATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of data flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.  This does not count data flits transmitted over the NCB channel which transmits non-coherent data.  This includes only the data flits (not the header).",
@@ -1554,6 +1559,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; DRS Header Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.DRS_NONDATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of protocol flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.  This does not count data flits transmitted over the NCB channel which transmits non-coherent data.  This includes only the header flits (not the data).  This includes extended headers.",
@@ -1562,6 +1568,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; HOM Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.HOM",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of flits transmitted over QPI on the home channel.",
@@ -1570,6 +1577,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; HOM Non-Request Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.HOM_NONREQ",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of non-request flits transmitted over QPI on the home channel.  These are most commonly snoop responses, and this event can be used as a proxy for that.",
@@ -1578,6 +1586,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; HOM Request Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.HOM_REQ",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of data request transmitted over QPI on the home channel.  This basically counts the number of remote memory requests transmitted over QPI.  In conjunction with the local read count in the Home Agent, one can calculate the number of LLC Misses.",
@@ -1586,6 +1595,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; SNP Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.SNP",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of snoop request flits transmitted over QPI.  These requests are contained in the snoop channel.  This does not include snoop responses, which are transmitted on the home channel.",
@@ -3104,6 +3114,7 @@
     },
     {
+        "EventCode": "0x0",
         "EventName": "UNC_U_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "UBOX"
     },
diff --git a/tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json b/tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json
index 6550934..869a320 100644
--- a/tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json
+++ b/tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json
@@ -131,6 +131,7 @@
     },
     {
         "BriefDescription": "DRAM Clockticks",
+        "EventCode": "0x0",
         "EventName": "UNC_M_DCLOCKTICKS",
         "PerPkg": "1",
         "Unit": "iMC"
diff --git a/tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json b/tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json
index 5df1ebf..0a5d0c3 100644
--- a/tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json
+++ b/tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "pclk Cycles",
+        "EventCode": "0x0",
         "EventName": "UNC_P_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "The PCU runs off a fixed 800 MHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
diff --git a/tools/perf/pmu-events/arch/x86/jaketown/pipeline.json b/tools/perf/pmu-events/arch/x86/jaketown/pipeline.json
index d0edfde..76b515d 100644
--- a/tools/perf/pmu-events/arch/x86/jaketown/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/jaketown/pipeline.json
@@ -329,6 +329,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -351,6 +352,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -359,6 +361,7 @@
     {
         "AnyThread": "1",
         "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -432,6 +435,7 @@
     },
     {
         "BriefDescription": "Instructions retired from execution.",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json b/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json
index 63395e7e..160f1c4 100644
--- a/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json
+++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "Uncore Clocks",
+        "EventCode": "0x0",
         "EventName": "UNC_C_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "CBOX"
@@ -863,6 +864,7 @@
     },
     {
         "BriefDescription": "uclks",
+        "EventCode": "0x0",
         "EventName": "UNC_H_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of uclks in the HA.  This will be slightly different than the count in the Ubox because of enable/freeze delays.  The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the QPI Agent.",
diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/jaketown/uncore-interconnect.json
index 874f15e..45f2966 100644
--- a/tools/perf/pmu-events/arch/x86/jaketown/uncore-interconnect.json
+++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-interconnect.json
@@ -109,6 +109,7 @@
     },
     {
         "BriefDescription": "Clocks in the IRP",
+        "EventCode": "0x0",
         "EventName": "UNC_I_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Number of clocks in the IRP.",
@@ -847,6 +848,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 0; Data Tx Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G0.DATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.",
@@ -855,6 +857,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 0; Idle and Null Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G0.IDLE",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.",
@@ -863,6 +866,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 0; Non-Data protocol Tx Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G0.NON_DATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.",
@@ -871,6 +875,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; DRS Flits (both Header and Data)",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.DRS",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
@@ -879,6 +884,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; DRS Data Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.DRS_DATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
@@ -887,6 +893,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; DRS Header Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.DRS_NONDATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 6
 4 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
@@ -895,6 +902,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; HOM Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.HOM",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 6
 4 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
@@ -903,6 +911,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; HOM Non-Request Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.HOM_NONREQ",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 6
 4 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
@@ -911,6 +920,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; HOM Request Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.HOM_REQ",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 6
 4 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
@@ -919,6 +929,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; SNP Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.SNP",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 6
 4 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
@@ -1576,6 +1587,7 @@
     },
     {
         "EventName": "UNC_U_CLOCKTICKS",
+        "EventCode": "0x0",
         "PerPkg": "1",
         "Unit": "UBOX"
     },
diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json b/tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json
index 6dcc9415..2385b0a 100644
--- a/tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json
+++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json
@@ -65,6 +65,7 @@
     },
     {
         "BriefDescription": "uclks",
+        "EventCode": "0x0",
         "EventName": "UNC_M_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Uncore Fixed Counter - uclks",
diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json b/tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json
index b3ee5d7..f453afd 100644
--- a/tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json
+++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "pclk Cycles",
+        "EventCode": "0x0",
         "EventName": "UNC_P_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "The PCU runs off a fixed 800 MHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
@@ -216,6 +217,7 @@
     },
     {
         "BriefDescription": "Cycles spent changing Frequency",
+        "EventCode": "0x0",
         "EventName": "UNC_P_FREQ_TRANS_CYCLES",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of cycles when the system is changing frequency.  This can not be filtered by thread ID.  One can also use it with the occupancy counter that monitors number of threads in C0 to estimate the performance impact that frequency transitions had on the system.",
diff --git a/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json b/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json
index 3dc5321..a74d45a 100644
--- a/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json
@@ -150,12 +150,14 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "SampleAfterValue": "2000003",
         "UMask": "0x3"
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter",
         "SampleAfterValue": "2000003",
@@ -177,6 +179,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "This event counts the number of instructions that retire.  For instructions that consist of multiple micro-ops, this event counts exactly once, as the last micro-op of the instruction retires.  The event continues counting while instructions retire, including during interrupt service routines caused by hardware interrupts, faults or traps.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-cache.json b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-cache.json
index 1b8dcfa..c062253 100644
--- a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-cache.json
+++ b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-cache.json
@@ -3246,6 +3246,7 @@
     },
     {
         "BriefDescription": "Uncore Clocks",
+        "EventCode": "0x0",
         "EventName": "UNC_H_U_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "CHA"
diff --git a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json
index fb75297..3575baa 100644
--- a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json
+++ b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json
@@ -41,6 +41,7 @@
     },
     {
         "BriefDescription": "ECLK count",
+        "EventCode": "0x0",
         "EventName": "UNC_E_E_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "EDC_ECLK"
@@ -55,6 +56,7 @@
     },
     {
         "BriefDescription": "UCLK count",
+        "EventCode": "0x0",
         "EventName": "UNC_E_U_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "EDC_UCLK"
@@ -93,12 +95,14 @@
     },
     {
         "BriefDescription": "DCLK count",
+        "EventCode": "0x0",
         "EventName": "UNC_M_D_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "iMC_DCLK"
     },
     {
         "BriefDescription": "UCLK count",
+        "EventCode": "0x0",
         "EventName": "UNC_M_U_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "iMC_UCLK"
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json b/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
index 6397894..0de3572 100644
--- a/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
@@ -37,6 +37,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.CORE",
         "SampleAfterValue": "2000003",
         "UMask": "0x2",
@@ -51,6 +52,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "SampleAfterValue": "2000003",
         "UMask": "0x3",
@@ -58,6 +60,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately 
 so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -75,6 +78,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "SampleAfterValue": "2000003",
         "UMask": "0x2",
@@ -82,6 +86,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -105,6 +110,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PEBS": "1",
         "SampleAfterValue": "2000003",
@@ -113,6 +119,7 @@
     },
     {
         "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PEBS": "1",
         "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
@@ -157,6 +164,7 @@
     },
     {
         "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
+        "EventCode": "0x0",
         "EventName": "TOPDOWN.SLOTS",
         "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
         "SampleAfterValue": "10000003",
diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json b/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json
index ecaf94c..973a5f4 100644
--- a/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json
@@ -337,6 +337,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -359,6 +360,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -367,6 +369,7 @@
     {
         "AnyThread": "1",
         "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -440,6 +443,7 @@
     },
     {
         "BriefDescription": "Instructions retired from execution.",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json b/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json
index 72e9bdfa..ada2c34 100644
--- a/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json
@@ -284,6 +284,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately 
 so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -299,6 +300,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -426,6 +428,7 @@
     },
     {
         "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PEBS": "1",
         "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
@@ -457,6 +460,7 @@
     },
     {
         "BriefDescription": "Precise instruction retired with PEBS precise-distribution",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.PREC_DIST",
         "PEBS": "1",
         "PublicDescription": "A version of INST_RETIRED that allows for a precise distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR++) feature to fix bias in how retired instructions get sampled. Use on Fixed Counter 0.",
@@ -719,6 +723,7 @@
     },
     {
         "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
+        "EventCode": "0x0",
         "EventName": "TOPDOWN.SLOTS",
         "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
         "SampleAfterValue": "10000003",
diff --git a/tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json b/tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json
index 4121295..67be689 100644
--- a/tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json
@@ -17,6 +17,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.CORE",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -29,6 +30,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "SampleAfterValue": "2000003",
         "UMask": "0x3"
@@ -43,6 +45,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -55,6 +58,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PEBS": "1",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json b/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
index 2d4214b..6423c01 100644
--- a/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
@@ -143,6 +143,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.CORE",
         "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios.  The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. In systems with a constant core frequency, this event can give you a measurement of the elapsed time while the core was not in halt state by dividing the event count by the core frequency. This event is architecturally defined and is a designated fixed counter.  CPU_CLK_UNHALTED.CORE and CPU_CLK_UNHALTED.CORE_P use the core frequency which may change from time to time.  CPU_CLK_UNHALTE.REF_TSC and CPU_CLK_UNHALTED.REF are not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time.  The fixed events are CPU_CLK_UNHALTED.CORE and CPU_CLK_UNHALTED.REF_TSC and the programmable events are C
 PU_CLK_UNHALTED.CORE_P and CPU_CLK_UNHALTED.REF.",
         "SampleAfterValue": "2000003",
@@ -165,6 +166,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios.  The core frequency may change from time. This event is not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time.  Divide this event count by core frequency to determine the elapsed time while the core was not in halt state.  Divide this event count by core frequency to determine the elapsed time while the core was not in halt state.  This event is architecturally defined and is a designated fixed counter.  CPU_CLK_UNHALTED.CORE and CPU_CLK_UNHALTED.CORE_P use the core frequency which may change from time to time.  CPU_CLK_UNHALTE.REF_TSC and CPU_CLK_UNHALTED.REF are not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time.  The fixed events are
  CPU_CLK_UNHALTED.CORE and CPU_CLK_UNHALTED.REF_TSC and the programmable events are CPU_CLK_UNHALTED.CORE_P and CPU_CLK_UNHALTED.REF.",
         "SampleAfterValue": "2000003",
@@ -180,6 +182,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "This event counts the number of instructions that retire.  For instructions that consist of multiple micro-ops, this event counts exactly once, as the last micro-op of the instruction retires.  The event continues counting while instructions retire, including during interrupt service routines caused by hardware interrupts, faults or traps.  Background: Modern microprocessors employ extensive pipelining and speculative techniques.  Since sometimes an instruction is started but never completed, the notion of \"retirement\" is introduced.  A retired instruction is one that commits its states. Or stated differently, an instruction might be abandoned at some point. No instruction is truly finished until it retires.  This counter measures the number of completed instructions.  The fixed event is INST_RETIRED.ANY and the programmable event is INST_RETIRED.ANY_P.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/skylake/pipeline.json b/tools/perf/pmu-events/arch/x86/skylake/pipeline.json
index 2dfc3af..53f1381 100644
--- a/tools/perf/pmu-events/arch/x86/skylake/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/skylake/pipeline.json
@@ -182,6 +182,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overfl
 ow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -213,6 +214,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -221,6 +223,7 @@
     {
         "AnyThread": "1",
         "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -360,6 +363,7 @@
     },
     {
         "BriefDescription": "Instructions retired from execution.",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "Counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, Counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json b/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json
index 0f06e31..99346e1 100644
--- a/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json
@@ -191,6 +191,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overfl
 ow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -222,6 +223,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -230,6 +232,7 @@
     {
         "AnyThread": "1",
         "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -369,6 +372,7 @@
     },
     {
         "BriefDescription": "Instructions retired from execution.",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "Counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, Counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json
index 543dfc1..4df1294 100644
--- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json
+++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json
@@ -460,6 +460,7 @@
     },
     {
         "BriefDescription": "Clockticks of the uncore caching & home agent (CHA)",
+        "EventCode": "0x0",
         "EventName": "UNC_CHA_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Counts clockticks of the clock controlling the uncore caching and home agent (CHA).",
@@ -5678,6 +5679,7 @@
     {
         "BriefDescription": "This event is deprecated. Refer to new event UNC_CHA_CLOCKTICKS",
         "Deprecated": "1",
+        "EventCode": "0x0",
         "EventName": "UNC_C_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "CHA"
diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json
index 26a5a20..40f609c 100644
--- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json
+++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json
@@ -1090,6 +1090,7 @@
     },
     {
         "BriefDescription": "Cycles - at UCLK",
+        "EventCode": "0x0",
         "EventName": "UNC_M2M_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "M2M"
diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json
index 2a3a709..21a6a0f 100644
--- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json
+++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json
@@ -1271,6 +1271,7 @@
     },
     {
         "BriefDescription": "Counting disabled",
+        "EventCode": "0x0",
         "EventName": "UNC_IIO_NOTHING",
         "PerPkg": "1",
         "Unit": "IIO"
diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json
index 6f8ff22..a7ce916 100644
--- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json
+++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json
@@ -167,6 +167,7 @@
     },
     {
         "BriefDescription": "Memory controller clock ticks",
+        "EventCode": "0x0",
         "EventName": "UNC_M_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Counts clockticks of the fixed frequency clock of the memory controller using one of the programmable counters.",
diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json
index c6254af..a01b279 100644
--- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json
+++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "pclk Cycles",
+        "EventCode": "0x0",
         "EventName": "UNC_P_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "The PCU runs off a fixed 1 GHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json b/tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json
index 9dd8c90..3388cd5 100644
--- a/tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json
@@ -150,6 +150,7 @@
     },
     {
         "BriefDescription": "Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is not affected by core frequency changes and increments at a fixed frequency that is also used for the Time Stamp Counter (TSC). This event uses fixed counter 2.",
         "SampleAfterValue": "2000003",
@@ -179,6 +180,7 @@
     },
     {
         "BriefDescription": "Counts the total number of instructions retired. (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PEBS": "1",
         "PublicDescription": "Counts the total number of instructions that retired. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. This event continues counting during hardware interrupts, traps, and inside interrupt handlers. This event uses fixed counter 0.",
diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json
index a68a5bb..279381b 100644
--- a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json
+++ b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json
@@ -872,6 +872,7 @@
     },
     {
         "BriefDescription": "Uncore cache clock ticks",
+        "EventCode": "0x0",
         "EventName": "UNC_CHA_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "CHA"
diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json
index de38400..399536f 100644
--- a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json
+++ b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json
@@ -1419,6 +1419,7 @@
     },
     {
         "BriefDescription": "Clockticks of the mesh to memory (M2M)",
+        "EventCode": "0x0",
         "EventName": "UNC_M2M_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "M2M"
diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json
index 530e9b71..b24ba35 100644
--- a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json
+++ b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json
@@ -120,6 +120,7 @@
     },
     {
         "BriefDescription": "Memory controller clock ticks",
+        "EventCode": "0x0",
         "EventName": "UNC_M_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Clockticks of the integrated memory controller (IMC)",
diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json
index 27fc155..5c04d6e 100644
--- a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json
+++ b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "Clockticks of the power control unit (PCU)",
+        "EventCode": "0x0",
         "EventName": "UNC_P_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "PCU"
diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json b/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json
index a0aeeb8..54a81f9 100644
--- a/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json
@@ -193,6 +193,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to
  less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -208,6 +209,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -352,6 +354,7 @@
     },
     {
         "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PEBS": "1",
         "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
@@ -377,6 +380,7 @@
     },
     {
         "BriefDescription": "Precise instruction retired event with a reduced effect of PEBS shadow in IP distribution",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.PREC_DIST",
         "PEBS": "1",
         "PublicDescription": "A version of INST_RETIRED that allows for a more unbiased distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR) feature to mitigate some bias in how retired instructions get sampled. Use on Fixed Counter 0.",
@@ -569,6 +573,7 @@
     },
     {
         "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
+        "EventCode": "0x0",
         "EventName": "TOPDOWN.SLOTS",
         "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
         "SampleAfterValue": "10000003",
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v7 4/8] perf jevents: Support more event fields
  2023-08-21  8:36 ` Jing Zhang
@ 2023-08-21  8:36   ` Jing Zhang
  -1 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-08-21  8:36 UTC (permalink / raw)
  To: John Garry, Ian Rogers
  Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Jing Zhang, Shuai Xue

The previous code assumed an event string always begins with either an
"event=" or a "config=" field. For CMN, neither may be present, as an
event is typically specified as "type=xx,eventid=xxx".

If neither EventCode nor ConfigCode is present in the alias JSON file,
the generated event description gets "event=0" by default. So even when
the alias provides "eventid=xxx" and "type=xxx", the final parsing
result for a CMN event would be "event=0,eventid=xxx,type=xxx".

Therefore, when EventCode and ConfigCode are both missing from the JSON,
"event=0" is no longer added by default. Also add the NodeType and
EventIdCode fields, which map to "type=" and "eventid=" in the event
string.

I compared the generated pmu-events.c before and after this change when
building with JEVENTS_ARCH=all; the output is identical.
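
The rewritten logic above can be sketched as a standalone function; this
is a minimal, hypothetical condensation of the jevents.py change (the
helper name build_event_string and the reduced field list are
illustrative, not the actual code):

```python
def build_event_string(jd):
    # Start with no event term instead of a default "event=0".
    eventcode = None
    if 'EventCode' in jd:
        eventcode = int(jd['EventCode'].split(',', 1)[0], 0)
    if 'ExtSel' in jd:
        ext = int(jd['ExtSel']) << 8
        eventcode = ext if eventcode is None else eventcode | ext
    configcode = int(jd['ConfigCode'], 0) if 'ConfigCode' in jd else None

    event = None
    if eventcode is not None:
        event = f'event={eventcode:#x}'
    elif configcode is not None:
        event = f'config={configcode:#x}'

    # Extra fields, including the new CMN mappings
    # (NodeType -> "type=", EventIdCode -> "eventid=").
    for key, prefix in [('UMask', 'umask='), ('NodeType', 'type='),
                        ('EventIdCode', 'eventid=')]:
        if key in jd and jd[key] != '0':
            term = prefix + jd[key]
            event = term if event is None else event + ',' + term
    return event
```

With this shape, a CMN-style alias such as
{"NodeType": "0x05", "EventIdCode": "0x01"} parses to
"type=0x05,eventid=0x01" with no spurious "event=0" term, while x86
aliases with an EventCode keep their leading "event=" field.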

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 tools/perf/pmu-events/jevents.py | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index f57a8f2..369c8bf 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -275,11 +275,14 @@ class JsonEvent:
       }
       return table[unit] if unit in table else f'uncore_{unit.lower()}'
 
-    eventcode = 0
+    eventcode = None
     if 'EventCode' in jd:
       eventcode = int(jd['EventCode'].split(',', 1)[0], 0)
     if 'ExtSel' in jd:
-      eventcode |= int(jd['ExtSel']) << 8
+      if eventcode is None:
+        eventcode = int(jd['ExtSel']) << 8
+      else:
+        eventcode |= int(jd['ExtSel']) << 8
     configcode = int(jd['ConfigCode'], 0) if 'ConfigCode' in jd else None
     self.name = jd['EventName'].lower() if 'EventName' in jd else None
     self.topic = ''
@@ -317,7 +320,11 @@ class JsonEvent:
     if precise and self.desc and '(Precise Event)' not in self.desc:
       extra_desc += ' (Must be precise)' if precise == '2' else (' (Precise '
                                                                  'event)')
-    event = f'config={llx(configcode)}' if configcode is not None else f'event={llx(eventcode)}'
+    event = None
+    if eventcode is not None:
+      event = f'event={llx(eventcode)}'
+    elif configcode is not None:
+      event = f'config={llx(configcode)}'
     event_fields = [
         ('AnyThread', 'any='),
         ('PortMask', 'ch_mask='),
@@ -327,10 +334,15 @@ class JsonEvent:
         ('Invert', 'inv='),
         ('SampleAfterValue', 'period='),
         ('UMask', 'umask='),
+        ('NodeType', 'type='),
+        ('EventIdCode', 'eventid='),
     ]
     for key, value in event_fields:
       if key in jd and jd[key] != '0':
-        event += ',' + value + jd[key]
+        if event:
+          event += ',' + value + jd[key]
+        else:
+          event = value + jd[key]
     if filter:
       event += f',{filter}'
     if msr:
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread


* [PATCH v7 5/8] perf test: Make matching_pmu effective
  2023-08-21  8:36 ` Jing Zhang
@ 2023-08-21  8:36   ` Jing Zhang
  -1 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-08-21  8:36 UTC (permalink / raw)
  To: John Garry, Ian Rogers
  Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Jing Zhang, Shuai Xue

The perf_pmu_test_event.matching_pmu field had no effect: whatever value
it was given, the test result was unchanged. Make matching_pmu actually
be matched against perf_pmu_test_pmu.pmu.name.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
---
 tools/perf/tests/pmu-events.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index 1dff863b..3204252 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -238,7 +238,7 @@ struct perf_pmu_test_pmu {
 	},
 	.alias_str = "event=0x2b",
 	.alias_long_desc = "ddr write-cycles event. Unit: uncore_sys_ddr_pmu ",
-	.matching_pmu = "uncore_sys_ddr_pmu",
+	.matching_pmu = "uncore_sys_ddr_pmu0",
 };
 
 static const struct perf_pmu_test_event sys_ccn_pmu_read_cycles = {
@@ -252,7 +252,7 @@ struct perf_pmu_test_pmu {
 	},
 	.alias_str = "config=0x2c",
 	.alias_long_desc = "ccn read-cycles event. Unit: uncore_sys_ccn_pmu ",
-	.matching_pmu = "uncore_sys_ccn_pmu",
+	.matching_pmu = "uncore_sys_ccn_pmu4",
 };
 
 static const struct perf_pmu_test_event *sys_events[] = {
@@ -599,6 +599,11 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
 			struct pmu_event const *event = &test_event->event;
 
 			if (!strcmp(event->name, alias->name)) {
+				if (strcmp(pmu_name, test_event->matching_pmu)) {
+					pr_debug("testing aliases uncore PMU %s: mismatched matching_pmu, %s vs %s\n",
+							pmu_name, test_event->matching_pmu, pmu_name);
+					continue;
+				}
 				if (compare_alias_to_test_event(alias,
 							test_event,
 							pmu_name)) {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread


* [PATCH v7 6/8] perf test: Add pmu-event test for "Compat" and new event_field.
  2023-08-21  8:36 ` Jing Zhang
@ 2023-08-21  8:36   ` Jing Zhang
  -1 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-08-21  8:36 UTC (permalink / raw)
  To: John Garry, Ian Rogers
  Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Jing Zhang, Shuai Xue

Add a new test event for an uncore system PMU to verify the "Compat"
field matching multiple identifiers, and the new event fields
"EventIdCode" and "NodeType".
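
The "Compat" value under test, "434*;436*;43c*;43a01", is a
semicolon-separated list of glob patterns checked against the PMU's
identifier. A minimal sketch of that matching rule (the function name
compat_matches is hypothetical; the perf implementation is in C):

```python
from fnmatch import fnmatch

def compat_matches(compat, pmu_id):
    """Return True when pmu_id matches any ';'-separated glob in compat."""
    return any(fnmatch(pmu_id, pattern) for pattern in compat.split(';'))
```

So a CMN PMU advertising id "43401" or "43602" matches, while an id
covered by none of the four patterns does not.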

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 .../pmu-events/arch/test/test_soc/sys/uncore.json  |  8 ++++
 tools/perf/pmu-events/empty-pmu-events.c           |  8 ++++
 tools/perf/tests/pmu-events.c                      | 55 ++++++++++++++++++++++
 3 files changed, 71 insertions(+)

diff --git a/tools/perf/pmu-events/arch/test/test_soc/sys/uncore.json b/tools/perf/pmu-events/arch/test/test_soc/sys/uncore.json
index c7e7528..06b886d 100644
--- a/tools/perf/pmu-events/arch/test/test_soc/sys/uncore.json
+++ b/tools/perf/pmu-events/arch/test/test_soc/sys/uncore.json
@@ -12,5 +12,13 @@
            "EventName": "sys_ccn_pmu.read_cycles",
            "Unit": "sys_ccn_pmu",
            "Compat": "0x01"
+   },
+   {
+           "BriefDescription": "Counts total cache misses in first lookup result (high priority).",
+           "NodeType": "0x05",
+           "EventIdCode": "0x01",
+           "EventName": "sys_cmn_pmu.hnf_cache_miss",
+           "Unit": "sys_cmn_pmu",
+           "Compat": "434*;436*;43c*;43a01"
    }
 ]
diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
index e74defb..25be18a 100644
--- a/tools/perf/pmu-events/empty-pmu-events.c
+++ b/tools/perf/pmu-events/empty-pmu-events.c
@@ -245,6 +245,14 @@ struct pmu_events_map {
 		.pmu = "uncore_sys_ccn_pmu",
 	},
 	{
+		.name = "sys_cmn_pmu.hnf_cache_miss",
+		.event = "type=0x05,eventid=0x01",
+		.desc = "Counts total cache misses in first lookup result (high priority). Unit: uncore_sys_cmn_pmu ",
+		.compat = "434*;436*;43c*;43a01",
+		.topic = "uncore",
+		.pmu = "uncore_sys_cmn_pmu",
+	},
+	{
 		.name = 0,
 		.event = 0,
 		.desc = 0,
diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index 3204252..79fb3e2 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -255,9 +255,24 @@ struct perf_pmu_test_pmu {
 	.matching_pmu = "uncore_sys_ccn_pmu4",
 };
 
+static const struct perf_pmu_test_event sys_cmn_pmu_hnf_cache_miss = {
+	.event = {
+		.name = "sys_cmn_pmu.hnf_cache_miss",
+		.event = "type=0x05,eventid=0x01",
+		.desc = "Counts total cache misses in first lookup result (high priority). Unit: uncore_sys_cmn_pmu ",
+		.topic = "uncore",
+		.pmu = "uncore_sys_cmn_pmu",
+		.compat = "434*;436*;43c*;43a01",
+	},
+	.alias_str = "type=0x5,eventid=0x1",
+	.alias_long_desc = "Counts total cache misses in first lookup result (high priority). Unit: uncore_sys_cmn_pmu ",
+	.matching_pmu = "uncore_sys_cmn_pmu0",
+};
+
 static const struct perf_pmu_test_event *sys_events[] = {
 	&sys_ddr_pmu_write_cycles,
 	&sys_ccn_pmu_read_cycles,
+	&sys_cmn_pmu_hnf_cache_miss,
 	NULL
 };
 
@@ -704,6 +719,46 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
 			&sys_ccn_pmu_read_cycles,
 		},
 	},
+	{
+		.pmu = {
+			.name = (char *)"uncore_sys_cmn_pmu0",
+			.is_uncore = 1,
+			.id = (char *)"43401",
+		},
+		.aliases = {
+			&sys_cmn_pmu_hnf_cache_miss,
+		},
+	},
+	{
+		.pmu = {
+			.name = (char *)"uncore_sys_cmn_pmu0",
+			.is_uncore = 1,
+			.id = (char *)"43602",
+		},
+		.aliases = {
+			&sys_cmn_pmu_hnf_cache_miss,
+		},
+	},
+	{
+		.pmu = {
+			.name = (char *)"uncore_sys_cmn_pmu0",
+			.is_uncore = 1,
+			.id = (char *)"43c03",
+		},
+		.aliases = {
+			&sys_cmn_pmu_hnf_cache_miss,
+		},
+	},
+	{
+		.pmu = {
+			.name = (char *)"uncore_sys_cmn_pmu0",
+			.is_uncore = 1,
+			.id = (char *)"43a01",
+		},
+		.aliases = {
+			&sys_cmn_pmu_hnf_cache_miss,
+		},
+	}
 };
 
 /* Test that aliases generated are as expected */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v7 7/8] perf jevents: Add support for Arm CMN PMU aliasing
  2023-08-21  8:36 ` Jing Zhang
@ 2023-08-21  8:36   ` Jing Zhang
  -1 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-08-21  8:36 UTC (permalink / raw)
  To: John Garry, Ian Rogers
  Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Jing Zhang, Shuai Xue

For now, just add aliases for the subset of Arm CMN PMU events that
are generic and compatible with any SoC with CMN-ANY.

The "Compat" value "434*;436*;43c*;43a*" means the aliases are
compatible with all of CMN-600/CMN-650/CMN-700/Ci700; the identifiers
can be obtained from commit 7819e05a0dce ("perf/arm-cmn: Revamp model
detection").
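
For reference, the leading hex digits of those wildcard prefixes correspond
to the CMN part numbers used by model detection. A hypothetical lookup
(illustration only, not part of the patch) might read:

```python
# Part-number prefixes as used by perf/arm-cmn model detection.
CMN_MODELS = {
    "434": "CMN-600",
    "436": "CMN-650",
    "43c": "CMN-700",
    "43a": "Ci700",
}

def cmn_model(pmu_id):
    """Map a PMU identifier such as '43401' to its CMN product name."""
    return CMN_MODELS.get(pmu_id[:3], "unknown")
```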

The arm-cmn PMU events are taken from:
[0] https://developer.arm.com/documentation/100180/0302/?lang=en
[1] https://developer.arm.com/documentation/101408/0100/?lang=en
[2] https://developer.arm.com/documentation/102308/0302/?lang=en
[3] https://developer.arm.com/documentation/101569/0300/?lang=en

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
---
 .../pmu-events/arch/arm64/arm/cmn/sys/cmn.json     | 266 +++++++++++++++++++++
 tools/perf/pmu-events/jevents.py                   |   1 +
 2 files changed, 267 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json

diff --git a/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
new file mode 100644
index 0000000..30435a3
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
@@ -0,0 +1,266 @@
+[
+	{
+		"EventName": "hnf_cache_miss",
+		"EventIdCode": "0x1",
+		"NodeType": "0x5",
+		"BriefDescription": "Counts total cache misses in first lookup result (high priority).",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "hnf_slc_sf_cache_access",
+		"EventIdCode": "0x2",
+		"NodeType": "0x5",
+		"BriefDescription": "Counts number of cache accesses in first access (high priority).",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "hnf_cache_fill",
+		"EventIdCode": "0x3",
+		"NodeType": "0x5",
+		"BriefDescription": "Counts total allocations in HN SLC (all cache line allocations to SLC).",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "hnf_pocq_retry",
+		"EventIdCode": "0x4",
+		"NodeType": "0x5",
+		"BriefDescription": "Counts number of retried requests.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "hnf_pocq_reqs_recvd",
+		"EventIdCode": "0x5",
+		"NodeType": "0x5",
+		"BriefDescription": "Counts number of requests that HN receives.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "hnf_sf_hit",
+		"EventIdCode": "0x6",
+		"NodeType": "0x5",
+		"BriefDescription": "Counts number of SF hits.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "hnf_sf_evictions",
+		"EventIdCode": "0x7",
+		"NodeType": "0x5",
+		"BriefDescription": "Counts number of SF eviction cache invalidations initiated.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "hnf_dir_snoops_sent",
+		"EventIdCode": "0x8",
+		"NodeType": "0x5",
+		"BriefDescription": "Counts number of directed snoops sent (not including SF back invalidation).",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "hnf_brd_snoops_sent",
+		"EventIdCode": "0x9",
+		"NodeType": "0x5",
+		"BriefDescription": "Counts number of multicast snoops sent (not including SF back invalidation).",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "hnf_slc_eviction",
+		"EventIdCode": "0xa",
+		"NodeType": "0x5",
+		"BriefDescription": "Counts number of SLC evictions (dirty only).",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "hnf_slc_fill_invalid_way",
+		"EventIdCode": "0xb",
+		"NodeType": "0x5",
+		"BriefDescription": "Counts number of SLC fills to an invalid way.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "hnf_mc_retries",
+		"EventIdCode": "0xc",
+		"NodeType": "0x5",
+		"BriefDescription": "Counts number of retried transactions by the MC.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "hnf_mc_reqs",
+		"EventIdCode": "0xd",
+		"NodeType": "0x5",
+		"BriefDescription": "Counts number of requests that are sent to MC.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "hnf_qos_hh_retry",
+		"EventIdCode": "0xe",
+		"NodeType": "0x5",
+		"BriefDescription": "Counts number of times a HighHigh priority request is protocol retried at the HN-F.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "rnid_s0_rdata_beats",
+		"EventIdCode": "0x1",
+		"NodeType": "0xa",
+		"BriefDescription": "Number of RData beats (RVALID and RREADY) dispatched on port 0. This event measures the read bandwidth, including CMO responses.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "rnid_s1_rdata_beats",
+		"EventIdCode": "0x2",
+		"NodeType": "0xa",
+		"BriefDescription": "Number of RData beats (RVALID and RREADY) dispatched on port 1. This event measures the read bandwidth, including CMO responses.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "rnid_s2_rdata_beats",
+		"EventIdCode": "0x3",
+		"NodeType": "0xa",
+		"BriefDescription": "Number of RData beats (RVALID and RREADY) dispatched on port 2. This event measures the read bandwidth, including CMO responses.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "rnid_rxdat_flits",
+		"EventIdCode": "0x4",
+		"NodeType": "0xa",
+		"BriefDescription": "Number of RXDAT flits received. This event measures the true read data bandwidth, excluding CMOs.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "rnid_txdat_flits",
+		"EventIdCode": "0x5",
+		"NodeType": "0xa",
+		"BriefDescription": "Number of TXDAT flits dispatched. This event measures the write bandwidth.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "rnid_txreq_flits_total",
+		"EventIdCode": "0x6",
+		"NodeType": "0xa",
+		"BriefDescription": "Number of TXREQ flits dispatched. This event measures the total request bandwidth.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "rnid_txreq_flits_retried",
+		"EventIdCode": "0x7",
+		"NodeType": "0xa",
+		"BriefDescription": "Number of retried TXREQ flits dispatched. This event measures the retry rate.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "sbsx_txrsp_retryack",
+		"EventIdCode": "0x4",
+		"NodeType": "0x7",
+		"BriefDescription": "Number of RXREQ flits dispatched. This event is a measure of the retry rate.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "sbsx_txdat_flitv",
+		"EventIdCode": "0x5",
+		"NodeType": "0x7",
+		"BriefDescription": "Number of TXDAT flits dispatched from XP to SBSX. This event is a measure of the write bandwidth.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "sbsx_arvalid_no_arready",
+		"EventIdCode": "0x21",
+		"NodeType": "0x7",
+		"BriefDescription": "Number of cycles the SBSX bridge is stalled because of backpressure on AR channel.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "sbsx_awvalid_no_awready",
+		"EventIdCode": "0x22",
+		"NodeType": "0x7",
+		"BriefDescription": "Number of cycles the SBSX bridge is stalled because of backpressure on AW channel.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "sbsx_wvalid_no_wready",
+		"EventIdCode": "0x23",
+		"NodeType": "0x7",
+		"BriefDescription": "Number of cycles the SBSX bridge is stalled because of backpressure on W channel.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "hni_txrsp_retryack",
+		"EventIdCode": "0x2a",
+		"NodeType": "0x4",
+		"BriefDescription": "Number of RXREQ flits dispatched. This event is a measure of the retry rate.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "hni_arvalid_no_arready",
+		"EventIdCode": "0x2b",
+		"NodeType": "0x4",
+		"BriefDescription": "Number of cycles the HN-I bridge is stalled because of backpressure on AR channel.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "hni_arready_no_arvalid",
+		"EventIdCode": "0x2c",
+		"NodeType": "0x4",
+		"BriefDescription": "Number of cycles the AR channel is waiting for new requests from HN-I bridge.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "hni_awvalid_no_awready",
+		"EventIdCode": "0x2d",
+		"NodeType": "0x4",
+		"BriefDescription": "Number of cycles the HN-I bridge is stalled because of backpressure on AW channel.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "hni_awready_no_awvalid",
+		"EventIdCode": "0x2e",
+		"NodeType": "0x4",
+		"BriefDescription": "Number of cycles the AW channel is waiting for new requests from HN-I bridge.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "hni_wvalid_no_wready",
+		"EventIdCode": "0x2f",
+		"NodeType": "0x4",
+		"BriefDescription": "Number of cycles the HN-I bridge is stalled because of backpressure on W channel.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"EventName": "hni_txdat_stall",
+		"EventIdCode": "0x30",
+		"NodeType": "0x4",
+		"BriefDescription": "TXDAT valid but no link credit available.",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	}
+]
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 369c8bf..935bd4b 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -272,6 +272,7 @@ class JsonEvent:
           'DFPMC': 'amd_df',
           'cpu_core': 'cpu_core',
           'cpu_atom': 'cpu_atom',
+          'arm_cmn': 'arm_cmn',
       }
       return table[unit] if unit in table else f'uncore_{unit.lower()}'
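
The table lookup above maps a JSON "Unit" to a perf PMU name, falling back
to an "uncore_" prefix for unknown units; with this patch, "arm_cmn" maps to
itself. A standalone sketch of that behavior (re-implementation for
illustration, not the actual jevents.py):

```python
def unit_to_pmu(unit):
    """Mirror the jevents.py table lookup: known units map directly,
    anything else is assumed to be an uncore PMU name."""
    table = {
        'DFPMC': 'amd_df',
        'cpu_core': 'cpu_core',
        'cpu_atom': 'cpu_atom',
        'arm_cmn': 'arm_cmn',
    }
    return table[unit] if unit in table else f'uncore_{unit.lower()}'

# 'arm_cmn' now maps to itself instead of 'uncore_arm_cmn'.
```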
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v7 8/8] perf vendor events: Add JSON metrics for Arm CMN
  2023-08-21  8:36 ` Jing Zhang
@ 2023-08-21  8:36   ` Jing Zhang
  -1 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-08-21  8:36 UTC (permalink / raw)
  To: John Garry, Ian Rogers
  Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Jing Zhang, Shuai Xue

Add JSON metrics for Arm CMN. For now, just add the subset of CMN PMU
metrics that are generic and compatible with any SoC with CMN-ANY.
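
Several of the metrics below convert data-flit counts into bandwidth: each
CMN data flit carries 32 bytes, which is where the "* 32 / 1e6 /
duration_time" metric expressions come from. An illustrative sketch of that
arithmetic (assumed helper, not part of the patch):

```python
def flits_to_mb_per_s(flit_count, duration_s):
    """Scale a data-flit count to MB/s, as in the
    "... * 32 / 1e6 / duration_time" metric expressions
    (32 bytes per CMN data flit)."""
    return flit_count * 32 / 1e6 / duration_s

# e.g. 1,000,000 RXDAT flits observed over 1 second -> 32.0 MB/s
```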

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
---
 .../pmu-events/arch/arm64/arm/cmn/sys/metric.json  | 74 ++++++++++++++++++++++
 1 file changed, 74 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json

diff --git a/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json
new file mode 100644
index 0000000..64db534
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json
@@ -0,0 +1,74 @@
+[
+	{
+		"MetricName": "slc_miss_rate",
+		"BriefDescription": "The system level cache miss rate.",
+		"MetricGroup": "cmn",
+		"MetricExpr": "hnf_cache_miss / hnf_slc_sf_cache_access",
+		"ScaleUnit": "100%",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"MetricName": "hnf_message_retry_rate",
+		"BriefDescription": "HN-F message retry rate indicates whether a lack of credits is causing bottlenecks.",
+		"MetricGroup": "cmn",
+		"MetricExpr": "hnf_pocq_retry / hnf_pocq_reqs_recvd",
+		"ScaleUnit": "100%",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"MetricName": "sf_hit_rate",
+		"BriefDescription": "Snoop filter hit rate can be used to measure the snoop filter efficiency.",
+		"MetricGroup": "cmn",
+		"MetricExpr": "hnf_sf_hit / hnf_slc_sf_cache_access",
+		"ScaleUnit": "100%",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"MetricName": "mc_message_retry_rate",
+		"BriefDescription": "The memory controller request retries rate indicates whether the memory controller is the bottleneck.",
+		"MetricGroup": "cmn",
+		"MetricExpr": "hnf_mc_retries / hnf_mc_reqs",
+		"ScaleUnit": "100%",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"MetricName": "rni_actual_read_bandwidth.all",
+		"BriefDescription": "This metric measures the actual bandwidth that the RN-I bridge sends to the interconnect.",
+		"MetricGroup": "cmn",
+		"MetricExpr": "rnid_rxdat_flits * 32 / 1e6 / duration_time",
+		"ScaleUnit": "1MB/s",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"MetricName": "rni_actual_write_bandwidth.all",
+		"BriefDescription": "This metric measures the actual write bandwidth at the RN-I bridges.",
+		"MetricGroup": "cmn",
+		"MetricExpr": "rnid_txdat_flits * 32 / 1e6 / duration_time",
+		"ScaleUnit": "1MB/s",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"MetricName": "rni_retry_rate",
+		"BriefDescription": "RN-I bridge retry rate indicates whether the memory controller is the bottleneck.",
+		"MetricGroup": "cmn",
+		"MetricExpr": "rnid_txreq_flits_retried / rnid_txreq_flits_total",
+		"ScaleUnit": "100%",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"MetricName": "sbsx_actual_write_bandwidth.all",
+		"BriefDescription": "SBSX actual write bandwidth.",
+		"MetricGroup": "cmn",
+		"MetricExpr": "sbsx_txdat_flitv * 32 / 1e6 / duration_time",
+		"ScaleUnit": "1MB/s",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	}
+]
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v7 8/8] perf vendor events: Add JSON metrics for Arm CMN
@ 2023-08-21  8:36   ` Jing Zhang
  0 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-08-21  8:36 UTC (permalink / raw)
  To: John Garry, Ian Rogers
  Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Jing Zhang, Shuai Xue

Add JSON metrics for Arm CMN. For now, add only the subset of CMN PMU
metrics that are general and compatible with any SoC with CMN-ANY.
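
The "Compat" strings in these metrics rely on the multi-identifier matching
added earlier in this series: a semicolon-separated list of glob patterns, any
one of which may match the PMU identifier. A minimal sketch of that matching
(the helper name and sample identifiers are illustrative, not the actual perf
implementation):

```python
import fnmatch

def compat_matches(compat: str, pmu_id: str) -> bool:
    """Return True if any semicolon-separated glob pattern in 'compat'
    matches the PMU identifier string."""
    return any(fnmatch.fnmatch(pmu_id, pat) for pat in compat.split(";"))

# A hypothetical CMN-700-style identifier beginning with 43c matches;
# an unrelated identifier does not.
assert compat_matches("434*;436*;43c*;43a*", "43c01")
assert not compat_matches("434*;436*;43c*;43a*", "500")
```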

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
---
 .../pmu-events/arch/arm64/arm/cmn/sys/metric.json  | 74 ++++++++++++++++++++++
 1 file changed, 74 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json

diff --git a/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json
new file mode 100644
index 0000000..64db534
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json
@@ -0,0 +1,74 @@
+[
+	{
+		"MetricName": "slc_miss_rate",
+		"BriefDescription": "The system level cache miss rate.",
+		"MetricGroup": "cmn",
+		"MetricExpr": "hnf_cache_miss / hnf_slc_sf_cache_access",
+		"ScaleUnit": "100%",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"MetricName": "hnf_message_retry_rate",
+		"BriefDescription": "HN-F message retry rate indicates whether a lack of credits is causing bottlenecks.",
+		"MetricGroup": "cmn",
+		"MetricExpr": "hnf_pocq_retry / hnf_pocq_reqs_recvd",
+		"ScaleUnit": "100%",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"MetricName": "sf_hit_rate",
+		"BriefDescription": "Snoop filter hit rate can be used to measure the snoop filter efficiency.",
+		"MetricGroup": "cmn",
+		"MetricExpr": "hnf_sf_hit / hnf_slc_sf_cache_access",
+		"ScaleUnit": "100%",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"MetricName": "mc_message_retry_rate",
+		"BriefDescription": "The memory controller request retry rate indicates whether the memory controller is the bottleneck.",
+		"MetricGroup": "cmn",
+		"MetricExpr": "hnf_mc_retries / hnf_mc_reqs",
+		"ScaleUnit": "100%",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"MetricName": "rni_actual_read_bandwidth.all",
+		"BriefDescription": "This event measures the actual bandwidth that the RN-I bridge sends to the interconnect.",
+		"MetricGroup": "cmn",
+		"MetricExpr": "rnid_rxdat_flits * 32 / 1e6 / duration_time",
+		"ScaleUnit": "1MB/s",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"MetricName": "rni_actual_write_bandwidth.all",
+		"BriefDescription": "This event measures the actual write bandwidth at RN-I bridges.",
+		"MetricGroup": "cmn",
+		"MetricExpr": "rnid_txdat_flits * 32 / 1e6 / duration_time",
+		"ScaleUnit": "1MB/s",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"MetricName": "rni_retry_rate",
+		"BriefDescription": "RN-I bridge retry rate indicates whether the memory controller is the bottleneck.",
+		"MetricGroup": "cmn",
+		"MetricExpr": "rnid_txreq_flits_retried / rnid_txreq_flits_total",
+		"ScaleUnit": "100%",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	},
+	{
+		"MetricName": "sbsx_actual_write_bandwidth.all",
+		"BriefDescription": "SBSX actual write bandwidth.",
+		"MetricGroup": "cmn",
+		"MetricExpr": "sbsx_txdat_flitv * 32 / 1e6 / duration_time",
+		"ScaleUnit": "1MB/s",
+		"Unit": "arm_cmn",
+		"Compat": "434*;436*;43c*;43a*"
+	}
+]
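
To make the bandwidth expressions concrete: each flit carries 32 bytes of
data, so the metric multiplies the flit count by 32, scales to MB, and divides
by the elapsed time. A sketch with made-up counter values (not taken from any
real run):

```python
def rni_read_bandwidth_mb_s(rxdat_flits: int, duration_s: float) -> float:
    """Evaluate rnid_rxdat_flits * 32 / 1e6 / duration_time, in MB/s
    (32 bytes of data per RXDAT flit)."""
    return rxdat_flits * 32 / 1e6 / duration_s

# e.g. 1e9 flits observed over 2 seconds -> 16000.0 MB/s
assert rni_read_bandwidth_mb_s(1_000_000_000, 2.0) == 16000.0
```
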
-- 
1.8.3.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v7 0/8] perf vendor events: Add JSON metrics for Arm CMN
  2023-08-21  8:36 ` Jing Zhang
@ 2023-08-23  8:12   ` John Garry
  -1 siblings, 0 replies; 57+ messages in thread
From: John Garry @ 2023-08-23  8:12 UTC (permalink / raw)
  To: Jing Zhang, Ian Rogers
  Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Shuai Xue

On 21/08/2023 09:36, Jing Zhang wrote:

I'm hoping that Ian can check the outstanding patches here, but I'll 
also have a look.

> Changes since v6:
> - Supplement the omitted EventCode;
> - Keep the original way of ConfigCode;
> - Supplement the test in empty-pmu-events.c, so that the pmu event test
>    can succeed when built with NO_JEVENTS=1.
> - Link: https://lore.kernel.org/all/1691394685-61240-1-git-send-email-renyu.zj@linux.alibaba.com/
> 
> Jing Zhang (8):
>    perf pmu: "Compat" supports matching multiple identifiers
>    perf metric: "Compat" supports matching multiple identifiers
>    perf vendor events: Supplement the omitted EventCode
>    perf jevents: Support more event fields
>    perf test: Make matching_pmu effective
>    perf test: Add pmu-event test for "Compat" and new event_field.
>    perf jevents: Add support for Arm CMN PMU aliasing
>    perf vendor events: Add JSON metrics for Arm CMN
> 
>   .../pmu-events/arch/arm64/arm/cmn/sys/cmn.json     | 266 +++++++++++++++++++++
>   .../pmu-events/arch/arm64/arm/cmn/sys/metric.json  |  74 ++++++
>   .../pmu-events/arch/test/test_soc/sys/uncore.json  |   8 +
>   .../pmu-events/arch/x86/alderlake/pipeline.json    |   9 +
>   .../pmu-events/arch/x86/alderlaken/pipeline.json   |   3 +
>   .../pmu-events/arch/x86/broadwell/pipeline.json    |   4 +
>   .../pmu-events/arch/x86/broadwellde/pipeline.json  |   4 +
>   .../arch/x86/broadwellde/uncore-cache.json         |   2 +
>   .../arch/x86/broadwellde/uncore-interconnect.json  |   1 +
>   .../arch/x86/broadwellde/uncore-memory.json        |   1 +
>   .../arch/x86/broadwellde/uncore-power.json         |   1 +
>   .../pmu-events/arch/x86/broadwellx/pipeline.json   |   4 +
>   .../arch/x86/broadwellx/uncore-cache.json          |   2 +
>   .../arch/x86/broadwellx/uncore-interconnect.json   |  13 +
>   .../arch/x86/broadwellx/uncore-memory.json         |   2 +
>   .../arch/x86/broadwellx/uncore-power.json          |   1 +
>   .../pmu-events/arch/x86/cascadelakex/pipeline.json |   4 +
>   .../arch/x86/cascadelakex/uncore-cache.json        |   2 +
>   .../arch/x86/cascadelakex/uncore-interconnect.json |   1 +
>   .../arch/x86/cascadelakex/uncore-io.json           |   1 +
>   .../arch/x86/cascadelakex/uncore-memory.json       |   1 +
>   .../arch/x86/cascadelakex/uncore-power.json        |   1 +
>   .../pmu-events/arch/x86/elkhartlake/pipeline.json  |   2 +
>   .../pmu-events/arch/x86/goldmont/pipeline.json     |   3 +
>   .../pmu-events/arch/x86/goldmontplus/pipeline.json |   3 +
>   .../pmu-events/arch/x86/grandridge/pipeline.json   |   3 +
>   .../arch/x86/graniterapids/pipeline.json           |   4 +
>   .../perf/pmu-events/arch/x86/haswell/pipeline.json |   4 +
>   .../pmu-events/arch/x86/haswellx/pipeline.json     |   4 +
>   .../pmu-events/arch/x86/haswellx/uncore-cache.json |   2 +
>   .../arch/x86/haswellx/uncore-interconnect.json     |  14 ++
>   .../arch/x86/haswellx/uncore-memory.json           |   2 +
>   .../pmu-events/arch/x86/haswellx/uncore-power.json |   1 +
>   .../perf/pmu-events/arch/x86/icelake/pipeline.json |   4 +
>   .../pmu-events/arch/x86/icelakex/pipeline.json     |   4 +
>   .../pmu-events/arch/x86/icelakex/uncore-cache.json |   1 +
>   .../arch/x86/icelakex/uncore-interconnect.json     |   1 +
>   .../arch/x86/icelakex/uncore-memory.json           |   1 +
>   .../pmu-events/arch/x86/icelakex/uncore-power.json |   1 +
>   .../pmu-events/arch/x86/ivybridge/pipeline.json    |   3 +
>   .../perf/pmu-events/arch/x86/ivytown/pipeline.json |   4 +
>   .../pmu-events/arch/x86/ivytown/uncore-cache.json  |   2 +
>   .../arch/x86/ivytown/uncore-interconnect.json      |  11 +
>   .../pmu-events/arch/x86/ivytown/uncore-memory.json |   1 +
>   .../pmu-events/arch/x86/ivytown/uncore-power.json  |   1 +
>   .../pmu-events/arch/x86/jaketown/pipeline.json     |   4 +
>   .../pmu-events/arch/x86/jaketown/uncore-cache.json |   2 +
>   .../arch/x86/jaketown/uncore-interconnect.json     |  12 +
>   .../arch/x86/jaketown/uncore-memory.json           |   1 +
>   .../pmu-events/arch/x86/jaketown/uncore-power.json |   2 +
>   .../arch/x86/knightslanding/pipeline.json          |   3 +
>   .../arch/x86/knightslanding/uncore-cache.json      |   1 +
>   .../arch/x86/knightslanding/uncore-memory.json     |   4 +
>   .../pmu-events/arch/x86/meteorlake/pipeline.json   |   8 +
>   .../pmu-events/arch/x86/sandybridge/pipeline.json  |   4 +
>   .../arch/x86/sapphirerapids/pipeline.json          |   5 +
>   .../pmu-events/arch/x86/sierraforest/pipeline.json |   4 +
>   .../pmu-events/arch/x86/silvermont/pipeline.json   |   3 +
>   .../perf/pmu-events/arch/x86/skylake/pipeline.json |   4 +
>   .../pmu-events/arch/x86/skylakex/pipeline.json     |   4 +
>   .../pmu-events/arch/x86/skylakex/uncore-cache.json |   2 +
>   .../arch/x86/skylakex/uncore-interconnect.json     |   1 +
>   .../pmu-events/arch/x86/skylakex/uncore-io.json    |   1 +
>   .../arch/x86/skylakex/uncore-memory.json           |   1 +
>   .../pmu-events/arch/x86/skylakex/uncore-power.json |   1 +
>   .../pmu-events/arch/x86/snowridgex/pipeline.json   |   2 +
>   .../arch/x86/snowridgex/uncore-cache.json          |   1 +
>   .../arch/x86/snowridgex/uncore-interconnect.json   |   1 +
>   .../arch/x86/snowridgex/uncore-memory.json         |   1 +
>   .../arch/x86/snowridgex/uncore-power.json          |   1 +
>   .../pmu-events/arch/x86/tigerlake/pipeline.json    |   5 +
>   tools/perf/pmu-events/empty-pmu-events.c           |   8 +
>   tools/perf/pmu-events/jevents.py                   |  21 +-
>   tools/perf/tests/pmu-events.c                      |  64 ++++-
>   tools/perf/util/metricgroup.c                      |   2 +-
>   tools/perf/util/pmu.c                              |  33 ++-
>   tools/perf/util/pmu.h                              |   1 +
>   77 files changed, 679 insertions(+), 9 deletions(-)
>   create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
>   create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json
> 



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v7 4/8] perf jevents: Support more event fields
  2023-08-21  8:36   ` Jing Zhang
@ 2023-08-23  9:12     ` Robin Murphy
  -1 siblings, 0 replies; 57+ messages in thread
From: Robin Murphy @ 2023-08-23  9:12 UTC (permalink / raw)
  To: Jing Zhang, John Garry, Ian Rogers
  Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Shuai Xue

On 2023-08-21 09:36, Jing Zhang wrote:
> The previous code assumes an event has either an "event=" or "config"
> field at the beginning. For CMN neither of these may be present, as an
> event is typically "type=xx,eventid=xxx".
> 
> If EventCode and ConfigCode are not added in the alias JSON file, the
> event description will add "event=0" by default. So, even if the event
> field is added as "eventid=xxx" and "type=xxx", the CMN events' final
> parsing result will be "event=0, eventid=xxx, type=xxx".
> 
> Therefore, when EventCode and ConfigCode are missing in JSON, "event=0"
> is no longer added by default. And add EventIdCode and Type to the event
> field.
> 
> I compared pmu-events.c before and after compiling with JEVENTS_ARCH=all,
> and they are consistent.
> 
> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
> ---
>   tools/perf/pmu-events/jevents.py | 20 ++++++++++++++++----
>   1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
> index f57a8f2..369c8bf 100755
> --- a/tools/perf/pmu-events/jevents.py
> +++ b/tools/perf/pmu-events/jevents.py
> @@ -275,11 +275,14 @@ class JsonEvent:
>         }
>         return table[unit] if unit in table else f'uncore_{unit.lower()}'
>   
> -    eventcode = 0
> +    eventcode = None
>       if 'EventCode' in jd:
>         eventcode = int(jd['EventCode'].split(',', 1)[0], 0)
>       if 'ExtSel' in jd:
> -      eventcode |= int(jd['ExtSel']) << 8
> +      if eventcode is None:
> +        eventcode = int(jd['ExtSel']) << 8
> +      else:
> +        eventcode |= int(jd['ExtSel']) << 8
>       configcode = int(jd['ConfigCode'], 0) if 'ConfigCode' in jd else None
>       self.name = jd['EventName'].lower() if 'EventName' in jd else None
>       self.topic = ''
> @@ -317,7 +320,11 @@ class JsonEvent:
>       if precise and self.desc and '(Precise Event)' not in self.desc:
>         extra_desc += ' (Must be precise)' if precise == '2' else (' (Precise '
>                                                                    'event)')
> -    event = f'config={llx(configcode)}' if configcode is not None else f'event={llx(eventcode)}'
> +    event = None
> +    if eventcode is not None:
> +      event = f'event={llx(eventcode)}'
> +    elif configcode is not None:
> +      event = f'config={llx(configcode)}'
>       event_fields = [
>           ('AnyThread', 'any='),
>           ('PortMask', 'ch_mask='),
> @@ -327,10 +334,15 @@ class JsonEvent:
>           ('Invert', 'inv='),
>           ('SampleAfterValue', 'period='),
>           ('UMask', 'umask='),
> +        ('NodeType', 'type='),
> +        ('EventIdCode', 'eventid='),

FWIW, this smells like another brewing scalability problem, given that 
these are entirely driver-specific. Not sure off-hand how feasible it 
might be, but my instinct says that a neat solution would be to encode 
them right in the JSON, e.g.:

	"FormatAttr": { "type": 0x5 }

such that jevents should then only really need to consider whether an 
event is defined in terms of a raw "ConfigCode", one or more 
"FormatAttr"s which it can then parse dynamically, or reasonable special 
cases like "EventCode" (given how "event" is one of the most commonly 
used formats).

Thanks,
Robin.
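
For illustration, a rough sketch of how jevents might expand such a
hypothetical "FormatAttr" object into a perf event string. The field name
"FormatAttr" and this helper are assumptions from the suggestion above, not
existing jevents.py code:

```python
def format_attr_to_event(fmt: dict) -> str:
    """Join driver-specific format fields into 'key=value' pairs, e.g.
    {"type": 0x5, "eventid": 0x1} -> "type=0x5,eventid=0x1"."""
    return ",".join(f"{key}={val:#x}" for key, val in fmt.items())

assert format_attr_to_event({"type": 0x5, "eventid": 0x1}) == "type=0x5,eventid=0x1"
```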

>       ]
>       for key, value in event_fields:
>         if key in jd and jd[key] != '0':
> -        event += ',' + value + jd[key]
> +        if event:
> +          event += ',' + value + jd[key]
> +        else:
> +          event = value + jd[key]
>       if filter:
>         event += f',{filter}'
>       if msr:
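
The patched logic quoted above can be condensed into a standalone sketch (not
the actual jevents.py code) showing the resulting strings for an x86-style and
a CMN-style event; note that no default "event=0" is emitted when neither
EventCode nor ConfigCode is present:

```python
def build_event(jd: dict) -> str:
    """Mimic the patched jevents behaviour: start with no prefix when
    EventCode/ConfigCode are absent, then append driver-specific fields."""
    parts = []
    if 'EventCode' in jd:
        parts.append(f"event={int(jd['EventCode'], 0):#x}")
    elif 'ConfigCode' in jd:
        parts.append(f"config={int(jd['ConfigCode'], 0):#x}")
    for key, prefix in (('NodeType', 'type='), ('EventIdCode', 'eventid=')):
        if key in jd and jd[key] != '0':
            parts.append(prefix + jd[key])
    return ','.join(parts)

assert build_event({'EventCode': '0x3c'}) == 'event=0x3c'
# CMN event: no spurious 'event=0' prefix
assert build_event({'NodeType': '0x5', 'EventIdCode': '0x1'}) == 'type=0x5,eventid=0x1'
```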


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v7 7/8] perf jevents: Add support for Arm CMN PMU aliasing
  2023-08-21  8:36   ` Jing Zhang
@ 2023-08-23  9:33     ` Robin Murphy
  -1 siblings, 0 replies; 57+ messages in thread
From: Robin Murphy @ 2023-08-23  9:33 UTC (permalink / raw)
  To: Jing Zhang, John Garry, Ian Rogers
  Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Shuai Xue

On 2023-08-21 09:36, Jing Zhang wrote:
> Currently just add aliases for a subset of Arm CMN PMU events which
> are general and compatible with any SoC with CMN-ANY.
> 
> "Compat" value "434*;436*;43c*;43a*" means it is compatible with
> all CMN600/CMN650/CMN700/Ci700, which can be obtained from
> commit 7819e05a0dce ("perf/arm-cmn: Revamp model detection").
> 
> The arm-cmn PMU events got from:
> [0] https://developer.arm.com/documentation/100180/0302/?lang=en
> [1] https://developer.arm.com/documentation/101408/0100/?lang=en
> [2] https://developer.arm.com/documentation/102308/0302/?lang=en
> [3] https://developer.arm.com/documentation/101569/0300/?lang=en
> 
> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
> Reviewed-by: John Garry <john.g.garry@oracle.com>
> ---
>   .../pmu-events/arch/arm64/arm/cmn/sys/cmn.json     | 266 +++++++++++++++++++++
>   tools/perf/pmu-events/jevents.py                   |   1 +
>   2 files changed, 267 insertions(+)
>   create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
> 
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
> new file mode 100644
> index 0000000..30435a3
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
> @@ -0,0 +1,266 @@
> +[
> +	{
> +		"EventName": "hnf_cache_miss",
> +		"EventIdCode": "0x1",
> +		"NodeType": "0x5",

Given my other comment, I also think there would be no harm in just 
having these as:
	
		"ConfigCode" : "0x10005"

if you'd rather make life easier to begin with, then be able to come 
back and improve things later. IMO it doesn't affect the readability of 
the important values *all* that much, since it's not like they're tightly 
packed together in oddly-aligned bitfields.

Thanks,
Robin.

> +		"BriefDescription": "Counts total cache misses in first lookup result (high priority).",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_slc_sf_cache_access",
> +		"EventIdCode": "0x2",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of cache accesses in first access (high priority).",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_cache_fill",
> +		"EventIdCode": "0x3",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts total allocations in HN SLC (all cache line allocations to SLC).",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_pocq_retry",
> +		"EventIdCode": "0x4",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of retried requests.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_pocq_reqs_recvd",
> +		"EventIdCode": "0x5",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of requests that HN receives.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_sf_hit",
> +		"EventIdCode": "0x6",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of SF hits.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_sf_evictions",
> +		"EventIdCode": "0x7",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of SF eviction cache invalidations initiated.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_dir_snoops_sent",
> +		"EventIdCode": "0x8",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of directed snoops sent (not including SF back invalidation).",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_brd_snoops_sent",
> +		"EventIdCode": "0x9",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of multicast snoops sent (not including SF back invalidation).",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_slc_eviction",
> +		"EventIdCode": "0xa",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of SLC evictions (dirty only).",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_slc_fill_invalid_way",
> +		"EventIdCode": "0xb",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of SLC fills to an invalid way.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_mc_retries",
> +		"EventIdCode": "0xc",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of retried transactions by the MC.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_mc_reqs",
> +		"EventIdCode": "0xd",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of requests that are sent to MC.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_qos_hh_retry",
> +		"EventIdCode": "0xe",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of times a HighHigh priority request is protocol-retried at the HN-F.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "rnid_s0_rdata_beats",
> +		"EventIdCode": "0x1",
> +		"NodeType": "0xa",
> +		"BriefDescription": "Number of RData beats (RVALID and RREADY) dispatched on port 0. This event measures the read bandwidth, including CMO responses.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "rnid_s1_rdata_beats",
> +		"EventIdCode": "0x2",
> +		"NodeType": "0xa",
> +		"BriefDescription": "Number of RData beats (RVALID and RREADY) dispatched on port 1. This event measures the read bandwidth, including CMO responses.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "rnid_s2_rdata_beats",
> +		"EventIdCode": "0x3",
> +		"NodeType": "0xa",
> +		"BriefDescription": "Number of RData beats (RVALID and RREADY) dispatched on port 2. This event measures the read bandwidth, including CMO responses.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "rnid_rxdat_flits",
> +		"EventIdCode": "0x4",
> +		"NodeType": "0xa",
> +		"BriefDescription": "Number of RXDAT flits received. This event measures the true read data bandwidth, excluding CMOs.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "rnid_txdat_flits",
> +		"EventIdCode": "0x5",
> +		"NodeType": "0xa",
> +		"BriefDescription": "Number of TXDAT flits dispatched. This event measures the write bandwidth.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "rnid_txreq_flits_total",
> +		"EventIdCode": "0x6",
> +		"NodeType": "0xa",
> +		"BriefDescription": "Number of TXREQ flits dispatched. This event measures the total request bandwidth.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "rnid_txreq_flits_retried",
> +		"EventIdCode": "0x7",
> +		"NodeType": "0xa",
> +		"BriefDescription": "Number of retried TXREQ flits dispatched. This event measures the retry rate.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "sbsx_txrsp_retryack",
> +		"EventIdCode": "0x4",
> +		"NodeType": "0x7",
> +		"BriefDescription": "Number of RXREQ flits dispatched. This event is a measure of the retry rate.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "sbsx_txdat_flitv",
> +		"EventIdCode": "0x5",
> +		"NodeType": "0x7",
> +		"BriefDescription": "Number of TXDAT flits dispatched from XP to SBSX. This event is a measure of the write bandwidth.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "sbsx_arvalid_no_arready",
> +		"EventIdCode": "0x21",
> +		"NodeType": "0x7",
> +		"BriefDescription": "Number of cycles the SBSX bridge is stalled because of backpressure on AR channel.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "sbsx_awvalid_no_awready",
> +		"EventIdCode": "0x22",
> +		"NodeType": "0x7",
> +		"BriefDescription": "Number of cycles the SBSX bridge is stalled because of backpressure on AW channel.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "sbsx_wvalid_no_wready",
> +		"EventIdCode": "0x23",
> +		"NodeType": "0x7",
> +		"BriefDescription": "Number of cycles the SBSX bridge is stalled because of backpressure on W channel.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hni_txrsp_retryack",
> +		"EventIdCode": "0x2a",
> +		"NodeType": "0x4",
> +		"BriefDescription": "Number of RXREQ flits dispatched. This event is a measure of the retry rate.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hni_arvalid_no_arready",
> +		"EventIdCode": "0x2b",
> +		"NodeType": "0x4",
> +		"BriefDescription": "Number of cycles the HN-I bridge is stalled because of backpressure on AR channel.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hni_arready_no_arvalid",
> +		"EventIdCode": "0x2c",
> +		"NodeType": "0x4",
> +		"BriefDescription": "Number of cycles the AR channel is waiting for new requests from HN-I bridge.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hni_awvalid_no_awready",
> +		"EventIdCode": "0x2d",
> +		"NodeType": "0x4",
> +		"BriefDescription": "Number of cycles the HN-I bridge is stalled because of backpressure on AW channel.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hni_awready_no_awvalid",
> +		"EventIdCode": "0x2e",
> +		"NodeType": "0x4",
> +		"BriefDescription": "Number of cycles the AW channel is waiting for new requests from HN-I bridge.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hni_wvalid_no_wready",
> +		"EventIdCode": "0x2f",
> +		"NodeType": "0x4",
> +		"BriefDescription": "Number of cycles the HN-I bridge is stalled because of backpressure on W channel.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hni_txdat_stall",
> +		"EventIdCode": "0x30",
> +		"NodeType": "0x4",
> +		"BriefDescription": "TXDAT valid but no link credit available.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	}
> +]
> diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
> index 369c8bf..935bd4b 100755
> --- a/tools/perf/pmu-events/jevents.py
> +++ b/tools/perf/pmu-events/jevents.py
> @@ -272,6 +272,7 @@ class JsonEvent:
>             'DFPMC': 'amd_df',
>             'cpu_core': 'cpu_core',
>             'cpu_atom': 'cpu_atom',
> +          'arm_cmn': 'arm_cmn',
>         }
>         return table[unit] if unit in table else f'uncore_{unit.lower()}'
>   

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v7 7/8] perf jevents: Add support for Arm CMN PMU aliasing
@ 2023-08-23  9:33     ` Robin Murphy
  0 siblings, 0 replies; 57+ messages in thread
From: Robin Murphy @ 2023-08-23  9:33 UTC (permalink / raw)
  To: Jing Zhang, John Garry, Ian Rogers
  Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Shuai Xue

On 2023-08-21 09:36, Jing Zhang wrote:
> Currently just add aliases for part of Arm CMN PMU events which
> are general and compatible for any SoC and CMN-ANY.
> 
> "Compat" value "434*;436*;43c*;43a*" means it is compatible with
> all CMN600/CMN650/CMN700/Ci700, which can be obtained from
> commit 7819e05a0dce ("perf/arm-cmn: Revamp model detection").
> 
> The arm-cmn PMU events got from:
> [0] https://developer.arm.com/documentation/100180/0302/?lang=en
> [1] https://developer.arm.com/documentation/101408/0100/?lang=en
> [2] https://developer.arm.com/documentation/102308/0302/?lang=en
> [3] https://developer.arm.com/documentation/101569/0300/?lang=en
> 
> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
> Reviewed-by: John Garry <john.g.garry@oracle.com>
> ---
>   .../pmu-events/arch/arm64/arm/cmn/sys/cmn.json     | 266 +++++++++++++++++++++
>   tools/perf/pmu-events/jevents.py                   |   1 +
>   2 files changed, 267 insertions(+)
>   create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
> 
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
> new file mode 100644
> index 0000000..30435a3
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
> @@ -0,0 +1,266 @@
> +[
> +	{
> +		"EventName": "hnf_cache_miss",
> +		"EventIdCode": "0x1",
> +		"NodeType": "0x5",

Given my other comment, I also think there would be no harm in just 
having these as:
	
		"ConfigCode" : "0x10005"

if you'd rather make life easier to begin with, then be able to come 
back and improve things later. IMO it doesn't affect the readability of 
the important values *all* that much, since it's not like they're tightly
packed together in oddly-aligned bitfields.
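
For illustration, a quick sketch of how the two fields would fold together, assuming the eventid-above-type layout that the 0x10005 example implies (the exact bit positions are the driver's business, not guaranteed here):

```python
def config_code(event_id: int, node_type: int) -> int:
    # Fold the split JSON fields into one ConfigCode value.
    # Assumed layout (from the 0x10005 example above):
    # eventid in bits [16+], node type in the low 16 bits.
    return (event_id << 16) | node_type

# hnf_cache_miss: EventIdCode 0x1, NodeType 0x5 -> ConfigCode 0x10005
print(hex(config_code(0x1, 0x5)))
```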

Thanks,
Robin.

> +		"BriefDescription": "Counts total cache misses in first lookup result (high priority).",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_slc_sf_cache_access",
> +		"EventIdCode": "0x2",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of cache accesses in first access (high priority).",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_cache_fill",
> +		"EventIdCode": "0x3",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts total allocations in HN SLC (all cache line allocations to SLC).",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_pocq_retry",
> +		"EventIdCode": "0x4",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of retried requests.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_pocq_reqs_recvd",
> +		"EventIdCode": "0x5",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of requests that HN receives.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_sf_hit",
> +		"EventIdCode": "0x6",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of SF hits.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_sf_evictions",
> +		"EventIdCode": "0x7",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of SF eviction cache invalidations initiated.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_dir_snoops_sent",
> +		"EventIdCode": "0x8",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of directed snoops sent (not including SF back invalidation).",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_brd_snoops_sent",
> +		"EventIdCode": "0x9",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of multicast snoops sent (not including SF back invalidation).",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_slc_eviction",
> +		"EventIdCode": "0xa",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of SLC evictions (dirty only).",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_slc_fill_invalid_way",
> +		"EventIdCode": "0xb",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of SLC fills to an invalid way.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_mc_retries",
> +		"EventIdCode": "0xc",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of retried transactions by the MC.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_mc_reqs",
> +		"EventIdCode": "0xd",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of requests that are sent to MC.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hnf_qos_hh_retry",
> +		"EventIdCode": "0xe",
> +		"NodeType": "0x5",
> +		"BriefDescription": "Counts number of times a HighHigh priority request is protocol-retried at the HN-F.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "rnid_s0_rdata_beats",
> +		"EventIdCode": "0x1",
> +		"NodeType": "0xa",
> +		"BriefDescription": "Number of RData beats (RVALID and RREADY) dispatched on port 0. This event measures the read bandwidth, including CMO responses.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "rnid_s1_rdata_beats",
> +		"EventIdCode": "0x2",
> +		"NodeType": "0xa",
> +		"BriefDescription": "Number of RData beats (RVALID and RREADY) dispatched on port 1. This event measures the read bandwidth, including CMO responses.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "rnid_s2_rdata_beats",
> +		"EventIdCode": "0x3",
> +		"NodeType": "0xa",
> +		"BriefDescription": "Number of RData beats (RVALID and RREADY) dispatched on port 2. This event measures the read bandwidth, including CMO responses.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "rnid_rxdat_flits",
> +		"EventIdCode": "0x4",
> +		"NodeType": "0xa",
> +		"BriefDescription": "Number of RXDAT flits received. This event measures the true read data bandwidth, excluding CMOs.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "rnid_txdat_flits",
> +		"EventIdCode": "0x5",
> +		"NodeType": "0xa",
> +		"BriefDescription": "Number of TXDAT flits dispatched. This event measures the write bandwidth.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "rnid_txreq_flits_total",
> +		"EventIdCode": "0x6",
> +		"NodeType": "0xa",
> +		"BriefDescription": "Number of TXREQ flits dispatched. This event measures the total request bandwidth.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "rnid_txreq_flits_retried",
> +		"EventIdCode": "0x7",
> +		"NodeType": "0xa",
> +		"BriefDescription": "Number of retried TXREQ flits dispatched. This event measures the retry rate.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "sbsx_txrsp_retryack",
> +		"EventIdCode": "0x4",
> +		"NodeType": "0x7",
> +		"BriefDescription": "Number of RXREQ flits dispatched. This event is a measure of the retry rate.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "sbsx_txdat_flitv",
> +		"EventIdCode": "0x5",
> +		"NodeType": "0x7",
> +		"BriefDescription": "Number of TXDAT flits dispatched from XP to SBSX. This event is a measure of the write bandwidth.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "sbsx_arvalid_no_arready",
> +		"EventIdCode": "0x21",
> +		"NodeType": "0x7",
> +		"BriefDescription": "Number of cycles the SBSX bridge is stalled because of backpressure on AR channel.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "sbsx_awvalid_no_awready",
> +		"EventIdCode": "0x22",
> +		"NodeType": "0x7",
> +		"BriefDescription": "Number of cycles the SBSX bridge is stalled because of backpressure on AW channel.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "sbsx_wvalid_no_wready",
> +		"EventIdCode": "0x23",
> +		"NodeType": "0x7",
> +		"BriefDescription": "Number of cycles the SBSX bridge is stalled because of backpressure on W channel.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hni_txrsp_retryack",
> +		"EventIdCode": "0x2a",
> +		"NodeType": "0x4",
> +		"BriefDescription": "Number of RXREQ flits dispatched. This event is a measure of the retry rate.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hni_arvalid_no_arready",
> +		"EventIdCode": "0x2b",
> +		"NodeType": "0x4",
> +		"BriefDescription": "Number of cycles the HN-I bridge is stalled because of backpressure on AR channel.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hni_arready_no_arvalid",
> +		"EventIdCode": "0x2c",
> +		"NodeType": "0x4",
> +		"BriefDescription": "Number of cycles the AR channel is waiting for new requests from HN-I bridge.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hni_awvalid_no_awready",
> +		"EventIdCode": "0x2d",
> +		"NodeType": "0x4",
> +		"BriefDescription": "Number of cycles the HN-I bridge is stalled because of backpressure on AW channel.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hni_awready_no_awvalid",
> +		"EventIdCode": "0x2e",
> +		"NodeType": "0x4",
> +		"BriefDescription": "Number of cycles the AW channel is waiting for new requests from HN-I bridge.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hni_wvalid_no_wready",
> +		"EventIdCode": "0x2f",
> +		"NodeType": "0x4",
> +		"BriefDescription": "Number of cycles the HN-I bridge is stalled because of backpressure on W channel.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	},
> +	{
> +		"EventName": "hni_txdat_stall",
> +		"EventIdCode": "0x30",
> +		"NodeType": "0x4",
> +		"BriefDescription": "TXDAT valid but no link credit available.",
> +		"Unit": "arm_cmn",
> +		"Compat": "434*;436*;43c*;43a*"
> +	}
> +]
> diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
> index 369c8bf..935bd4b 100755
> --- a/tools/perf/pmu-events/jevents.py
> +++ b/tools/perf/pmu-events/jevents.py
> @@ -272,6 +272,7 @@ class JsonEvent:
>             'DFPMC': 'amd_df',
>             'cpu_core': 'cpu_core',
>             'cpu_atom': 'cpu_atom',
> +          'arm_cmn': 'arm_cmn',
>         }
>         return table[unit] if unit in table else f'uncore_{unit.lower()}'
>   


* Re: [PATCH v7 7/8] perf jevents: Add support for Arm CMN PMU aliasing
  2023-08-23  9:33     ` Robin Murphy
@ 2023-08-24  2:12       ` Jing Zhang
  -1 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-08-24  2:12 UTC (permalink / raw)
  To: Robin Murphy, John Garry, Ian Rogers
  Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Shuai Xue



On 2023/8/23 5:33 PM, Robin Murphy wrote:
> On 2023-08-21 09:36, Jing Zhang wrote:
>> Currently just add aliases for part of Arm CMN PMU events which
>> are general and compatible for any SoC and CMN-ANY.
>>
>> "Compat" value "434*;436*;43c*;43a*" means it is compatible with
>> all CMN600/CMN650/CMN700/Ci700, which can be obtained from
>> commit 7819e05a0dce ("perf/arm-cmn: Revamp model detection").
>>
>> The arm-cmn PMU events got from:
>> [0] https://developer.arm.com/documentation/100180/0302/?lang=en
>> [1] https://developer.arm.com/documentation/101408/0100/?lang=en
>> [2] https://developer.arm.com/documentation/102308/0302/?lang=en
>> [3] https://developer.arm.com/documentation/101569/0300/?lang=en
>>
>> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
>> Reviewed-by: John Garry <john.g.garry@oracle.com>
>> ---
>>   .../pmu-events/arch/arm64/arm/cmn/sys/cmn.json     | 266 +++++++++++++++++++++
>>   tools/perf/pmu-events/jevents.py                   |   1 +
>>   2 files changed, 267 insertions(+)
>>   create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
>>
>> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
>> new file mode 100644
>> index 0000000..30435a3
>> --- /dev/null
>> +++ b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
>> @@ -0,0 +1,266 @@
>> +[
>> +    {
>> +        "EventName": "hnf_cache_miss",
>> +        "EventIdCode": "0x1",
>> +        "NodeType": "0x5",
> 
> Given my other comment, I also think there would be no harm in just having these as:
>     
>         "ConfigCode" : "0x10005"
> 
> if you'd rather make life easier to begin with, then be able to come back and improve things later. IMO it doesn't affect the readability of the important values *all* that much, since it's not like they're tightly packed together in oddly-aligned bitfields.
> 

Thanks a lot! That's a great suggestion. I hope to merge it into v6.6 first in the current form,
and then improve it in the CMN driver and perf tools later.

Thanks,
Jing


> Thanks,
> Robin.
> 
>> +        "BriefDescription": "Counts total cache misses in first lookup result (high priority).",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_slc_sf_cache_access",
>> +        "EventIdCode": "0x2",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of cache accesses in first access (high priority).",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_cache_fill",
>> +        "EventIdCode": "0x3",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts total allocations in HN SLC (all cache line allocations to SLC).",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_pocq_retry",
>> +        "EventIdCode": "0x4",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of retried requests.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_pocq_reqs_recvd",
>> +        "EventIdCode": "0x5",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of requests that HN receives.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_sf_hit",
>> +        "EventIdCode": "0x6",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of SF hits.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_sf_evictions",
>> +        "EventIdCode": "0x7",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of SF eviction cache invalidations initiated.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_dir_snoops_sent",
>> +        "EventIdCode": "0x8",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of directed snoops sent (not including SF back invalidation).",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_brd_snoops_sent",
>> +        "EventIdCode": "0x9",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of multicast snoops sent (not including SF back invalidation).",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_slc_eviction",
>> +        "EventIdCode": "0xa",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of SLC evictions (dirty only).",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_slc_fill_invalid_way",
>> +        "EventIdCode": "0xb",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of SLC fills to an invalid way.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_mc_retries",
>> +        "EventIdCode": "0xc",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of retried transactions by the MC.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_mc_reqs",
>> +        "EventIdCode": "0xd",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of requests that are sent to MC.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_qos_hh_retry",
>> +        "EventIdCode": "0xe",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of times a HighHigh priority request is protocol-retried at the HN-F.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "rnid_s0_rdata_beats",
>> +        "EventIdCode": "0x1",
>> +        "NodeType": "0xa",
>> +        "BriefDescription": "Number of RData beats (RVALID and RREADY) dispatched on port 0. This event measures the read bandwidth, including CMO responses.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "rnid_s1_rdata_beats",
>> +        "EventIdCode": "0x2",
>> +        "NodeType": "0xa",
>> +        "BriefDescription": "Number of RData beats (RVALID and RREADY) dispatched on port 1. This event measures the read bandwidth, including CMO responses.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "rnid_s2_rdata_beats",
>> +        "EventIdCode": "0x3",
>> +        "NodeType": "0xa",
>> +        "BriefDescription": "Number of RData beats (RVALID and RREADY) dispatched on port 2. This event measures the read bandwidth, including CMO responses.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "rnid_rxdat_flits",
>> +        "EventIdCode": "0x4",
>> +        "NodeType": "0xa",
>> +        "BriefDescription": "Number of RXDAT flits received. This event measures the true read data bandwidth, excluding CMOs.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "rnid_txdat_flits",
>> +        "EventIdCode": "0x5",
>> +        "NodeType": "0xa",
>> +        "BriefDescription": "Number of TXDAT flits dispatched. This event measures the write bandwidth.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "rnid_txreq_flits_total",
>> +        "EventIdCode": "0x6",
>> +        "NodeType": "0xa",
>> +        "BriefDescription": "Number of TXREQ flits dispatched. This event measures the total request bandwidth.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "rnid_txreq_flits_retried",
>> +        "EventIdCode": "0x7",
>> +        "NodeType": "0xa",
>> +        "BriefDescription": "Number of retried TXREQ flits dispatched. This event measures the retry rate.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "sbsx_txrsp_retryack",
>> +        "EventIdCode": "0x4",
>> +        "NodeType": "0x7",
>> +        "BriefDescription": "Number of RXREQ flits dispatched. This event is a measure of the retry rate.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "sbsx_txdat_flitv",
>> +        "EventIdCode": "0x5",
>> +        "NodeType": "0x7",
>> +        "BriefDescription": "Number of TXDAT flits dispatched from XP to SBSX. This event is a measure of the write bandwidth.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "sbsx_arvalid_no_arready",
>> +        "EventIdCode": "0x21",
>> +        "NodeType": "0x7",
>> +        "BriefDescription": "Number of cycles the SBSX bridge is stalled because of backpressure on AR channel.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "sbsx_awvalid_no_awready",
>> +        "EventIdCode": "0x22",
>> +        "NodeType": "0x7",
>> +        "BriefDescription": "Number of cycles the SBSX bridge is stalled because of backpressure on AW channel.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "sbsx_wvalid_no_wready",
>> +        "EventIdCode": "0x23",
>> +        "NodeType": "0x7",
>> +        "BriefDescription": "Number of cycles the SBSX bridge is stalled because of backpressure on W channel.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hni_txrsp_retryack",
>> +        "EventIdCode": "0x2a",
>> +        "NodeType": "0x4",
>> +        "BriefDescription": "Number of RXREQ flits dispatched. This event is a measure of the retry rate.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hni_arvalid_no_arready",
>> +        "EventIdCode": "0x2b",
>> +        "NodeType": "0x4",
>> +        "BriefDescription": "Number of cycles the HN-I bridge is stalled because of backpressure on AR channel.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hni_arready_no_arvalid",
>> +        "EventIdCode": "0x2c",
>> +        "NodeType": "0x4",
>> +        "BriefDescription": "Number of cycles the AR channel is waiting for new requests from HN-I bridge.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hni_awvalid_no_awready",
>> +        "EventIdCode": "0x2d",
>> +        "NodeType": "0x4",
>> +        "BriefDescription": "Number of cycles the HN-I bridge is stalled because of backpressure on AW channel.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hni_awready_no_awvalid",
>> +        "EventIdCode": "0x2e",
>> +        "NodeType": "0x4",
>> +        "BriefDescription": "Number of cycles the AW channel is waiting for new requests from HN-I bridge.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hni_wvalid_no_wready",
>> +        "EventIdCode": "0x2f",
>> +        "NodeType": "0x4",
>> +        "BriefDescription": "Number of cycles the HN-I bridge is stalled because of backpressure on W channel.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hni_txdat_stall",
>> +        "EventIdCode": "0x30",
>> +        "NodeType": "0x4",
>> +        "BriefDescription": "TXDAT valid but no link credit available.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    }
>> +]
>> diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
>> index 369c8bf..935bd4b 100755
>> --- a/tools/perf/pmu-events/jevents.py
>> +++ b/tools/perf/pmu-events/jevents.py
>> @@ -272,6 +272,7 @@ class JsonEvent:
>>             'DFPMC': 'amd_df',
>>             'cpu_core': 'cpu_core',
>>             'cpu_atom': 'cpu_atom',
>> +          'arm_cmn': 'arm_cmn',
>>         }
>>         return table[unit] if unit in table else f'uncore_{unit.lower()}'
>>   



* Re: [PATCH v7 7/8] perf jevents: Add support for Arm CMN PMU aliasing
@ 2023-08-24  2:12       ` Jing Zhang
  0 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-08-24  2:12 UTC (permalink / raw)
  To: Robin Murphy, John Garry, Ian Rogers
  Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Shuai Xue



On 2023/8/23 5:33 PM, Robin Murphy wrote:
> On 2023-08-21 09:36, Jing Zhang wrote:
>> Currently just add aliases for part of Arm CMN PMU events which
>> are general and compatible for any SoC and CMN-ANY.
>>
>> "Compat" value "434*;436*;43c*;43a*" means it is compatible with
>> all CMN600/CMN650/CMN700/Ci700, which can be obtained from
>> commit 7819e05a0dce ("perf/arm-cmn: Revamp model detection").
>>
>> The arm-cmn PMU events got from:
>> [0] https://developer.arm.com/documentation/100180/0302/?lang=en
>> [1] https://developer.arm.com/documentation/101408/0100/?lang=en
>> [2] https://developer.arm.com/documentation/102308/0302/?lang=en
>> [3] https://developer.arm.com/documentation/101569/0300/?lang=en
>>
>> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
>> Reviewed-by: John Garry <john.g.garry@oracle.com>
>> ---
>>   .../pmu-events/arch/arm64/arm/cmn/sys/cmn.json     | 266 +++++++++++++++++++++
>>   tools/perf/pmu-events/jevents.py                   |   1 +
>>   2 files changed, 267 insertions(+)
>>   create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
>>
>> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
>> new file mode 100644
>> index 0000000..30435a3
>> --- /dev/null
>> +++ b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
>> @@ -0,0 +1,266 @@
>> +[
>> +    {
>> +        "EventName": "hnf_cache_miss",
>> +        "EventIdCode": "0x1",
>> +        "NodeType": "0x5",
> 
> Given my other comment, I also think there would be no harm in just having these as:
>     
>         "ConfigCode" : "0x10005"
> 
> if you'd rather make life easier to begin with, then be able to come back and improve things later. IMO it doesn't affect the readability of the important values *all* that much, since it's not like they're tightly packed together in oddly-aligned bitfields.
> 

Thanks a lot! That's a great suggestion. I hope to merge it in v6.6 first in the current way,
and then improve it in the CMN driver and perf tools later.
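
Robin's "ConfigCode": "0x10005" example corresponds to folding the two per-event fields of hnf_cache_miss into one config value, assuming the arm-cmn layout with the node type in the low 16 bits and the event ID above it. A hypothetical helper sketching that packing:

```python
def to_config_code(event_id_code: str, node_type: str) -> str:
    """Fold EventIdCode/NodeType into a single ConfigCode string
    (assumed layout: node type in bits [15:0], event ID in [31:16])."""
    return hex((int(event_id_code, 16) << 16) | int(node_type, 16))

print(to_config_code("0x1", "0x5"))  # 0x10005 (hnf_cache_miss)
```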

Thanks,
Jing


> Thanks,
> Robin.
> 
>> +        "BriefDescription": "Counts total cache misses in first lookup result (high priority).",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_slc_sf_cache_access",
>> +        "EventIdCode": "0x2",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of cache accesses in first access (high priority).",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_cache_fill",
>> +        "EventIdCode": "0x3",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts total allocations in HN SLC (all cache line allocations to SLC).",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_pocq_retry",
>> +        "EventIdCode": "0x4",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of retried requests.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_pocq_reqs_recvd",
>> +        "EventIdCode": "0x5",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of requests that HN receives.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_sf_hit",
>> +        "EventIdCode": "0x6",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of SF hits.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_sf_evictions",
>> +        "EventIdCode": "0x7",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of SF eviction cache invalidations initiated.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_dir_snoops_sent",
>> +        "EventIdCode": "0x8",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of directed snoops sent (not including SF back invalidation).",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_brd_snoops_sent",
>> +        "EventIdCode": "0x9",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of multicast snoops sent (not including SF back invalidation).",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_slc_eviction",
>> +        "EventIdCode": "0xa",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of SLC evictions (dirty only).",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_slc_fill_invalid_way",
>> +        "EventIdCode": "0xb",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of SLC fills to an invalid way.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_mc_retries",
>> +        "EventIdCode": "0xc",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of retried transactions by the MC.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_mc_reqs",
>> +        "EventIdCode": "0xd",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of requests that are sent to MC.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hnf_qos_hh_retry",
>> +        "EventIdCode": "0xe",
>> +        "NodeType": "0x5",
>> +        "BriefDescription": "Counts number of times a HighHigh priority request is protocol retried at the HN-F.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "rnid_s0_rdata_beats",
>> +        "EventIdCode": "0x1",
>> +        "NodeType": "0xa",
>> +        "BriefDescription": "Number of RData beats (RVALID and RREADY) dispatched on port 0. This event measures the read bandwidth, including CMO responses.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "rnid_s1_rdata_beats",
>> +        "EventIdCode": "0x2",
>> +        "NodeType": "0xa",
>> +        "BriefDescription": "Number of RData beats (RVALID and RREADY) dispatched on port 1. This event measures the read bandwidth, including CMO responses.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "rnid_s2_rdata_beats",
>> +        "EventIdCode": "0x3",
>> +        "NodeType": "0xa",
>> +        "BriefDescription": "Number of RData beats (RVALID and RREADY) dispatched on port 2. This event measures the read bandwidth, including CMO responses.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "rnid_rxdat_flits",
>> +        "EventIdCode": "0x4",
>> +        "NodeType": "0xa",
>> +        "BriefDescription": "Number of RXDAT flits received. This event measures the true read data bandwidth, excluding CMOs.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "rnid_txdat_flits",
>> +        "EventIdCode": "0x5",
>> +        "NodeType": "0xa",
>> +        "BriefDescription": "Number of TXDAT flits dispatched. This event measures the write bandwidth.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "rnid_txreq_flits_total",
>> +        "EventIdCode": "0x6",
>> +        "NodeType": "0xa",
>> +        "BriefDescription": "Number of TXREQ flits dispatched. This event measures the total request bandwidth.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "rnid_txreq_flits_retried",
>> +        "EventIdCode": "0x7",
>> +        "NodeType": "0xa",
>> +        "BriefDescription": "Number of retried TXREQ flits dispatched. This event measures the retry rate.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "sbsx_txrsp_retryack",
>> +        "EventIdCode": "0x4",
>> +        "NodeType": "0x7",
>> +        "BriefDescription": "Number of RXREQ flits dispatched. This event is a measure of the retry rate.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "sbsx_txdat_flitv",
>> +        "EventIdCode": "0x5",
>> +        "NodeType": "0x7",
>> +        "BriefDescription": "Number of TXDAT flits dispatched from XP to SBSX. This event is a measure of the write bandwidth.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "sbsx_arvalid_no_arready",
>> +        "EventIdCode": "0x21",
>> +        "NodeType": "0x7",
>> +        "BriefDescription": "Number of cycles the SBSX bridge is stalled because of backpressure on AR channel.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "sbsx_awvalid_no_awready",
>> +        "EventIdCode": "0x22",
>> +        "NodeType": "0x7",
>> +        "BriefDescription": "Number of cycles the SBSX bridge is stalled because of backpressure on AW channel.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "sbsx_wvalid_no_wready",
>> +        "EventIdCode": "0x23",
>> +        "NodeType": "0x7",
>> +        "BriefDescription": "Number of cycles the SBSX bridge is stalled because of backpressure on W channel.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hni_txrsp_retryack",
>> +        "EventIdCode": "0x2a",
>> +        "NodeType": "0x4",
>> +        "BriefDescription": "Number of RXREQ flits dispatched. This event is a measure of the retry rate.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hni_arvalid_no_arready",
>> +        "EventIdCode": "0x2b",
>> +        "NodeType": "0x4",
>> +        "BriefDescription": "Number of cycles the HN-I bridge is stalled because of backpressure on AR channel.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hni_arready_no_arvalid",
>> +        "EventIdCode": "0x2c",
>> +        "NodeType": "0x4",
>> +        "BriefDescription": "Number of cycles the AR channel is waiting for new requests from HN-I bridge.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hni_awvalid_no_awready",
>> +        "EventIdCode": "0x2d",
>> +        "NodeType": "0x4",
>> +        "BriefDescription": "Number of cycles the HN-I bridge is stalled because of backpressure on AW channel.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hni_awready_no_awvalid",
>> +        "EventIdCode": "0x2e",
>> +        "NodeType": "0x4",
>> +        "BriefDescription": "Number of cycles the AW channel is waiting for new requests from HN-I bridge.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hni_wvalid_no_wready",
>> +        "EventIdCode": "0x2f",
>> +        "NodeType": "0x4",
>> +        "BriefDescription": "Number of cycles the HN-I bridge is stalled because of backpressure on W channel.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    },
>> +    {
>> +        "EventName": "hni_txdat_stall",
>> +        "EventIdCode": "0x30",
>> +        "NodeType": "0x4",
>> +        "BriefDescription": "TXDAT valid but no link credit available.",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "434*;436*;43c*;43a*"
>> +    }
>> +]
>> diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
>> index 369c8bf..935bd4b 100755
>> --- a/tools/perf/pmu-events/jevents.py
>> +++ b/tools/perf/pmu-events/jevents.py
>> @@ -272,6 +272,7 @@ class JsonEvent:
>>             'DFPMC': 'amd_df',
>>             'cpu_core': 'cpu_core',
>>             'cpu_atom': 'cpu_atom',
>> +          'arm_cmn': 'arm_cmn',
>>         }
>>         return table[unit] if unit in table else f'uncore_{unit.lower()}'
>>   

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v7 0/8] perf vendor events: Add JSON metrics for Arm CMN
  2023-08-23  8:12   ` John Garry
@ 2023-08-24  2:33     ` Jing Zhang
  -1 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-08-24  2:33 UTC (permalink / raw)
  To: John Garry, Ian Rogers
  Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Shuai Xue



On 2023/8/23 4:12 PM, John Garry wrote:
> On 21/08/2023 09:36, Jing Zhang wrote:
> 
> I'm hoping that Ian can check the outstanding patches here, but I'll also have a look.
> 

Thanks! I haven't added your tag to patch 6 for now, because I made a modification and added
code to empty-pmu-events.c. Looking forward to your review.

Thanks,
Jing

>> Changes since v6:
>> - Supplement the omitted EventCode;
>> - Keep the original way of ConfigCode;
>> - Supplement the test in empty-pmu-events.c, so that the pmu event test
>>    can succeed when build with NO_JEVENT=1.
>> - Link: https://lore.kernel.org/all/1691394685-61240-1-git-send-email-renyu.zj@linux.alibaba.com/
>>
>> Jing Zhang (8):
>>    perf pmu: "Compat" supports matching multiple identifiers
>>    perf metric: "Compat" supports matching multiple identifiers
>>    perf vendor events: Supplement the omitted EventCode
>>    perf jevents: Support more event fields
>>    perf test: Make matching_pmu effective
>>    perf test: Add pmu-event test for "Compat" and new event_field.
>>    perf jevents: Add support for Arm CMN PMU aliasing
>>    perf vendor events: Add JSON metrics for Arm CMN
>>
>>   .../pmu-events/arch/arm64/arm/cmn/sys/cmn.json     | 266 +++++++++++++++++++++
>>   .../pmu-events/arch/arm64/arm/cmn/sys/metric.json  |  74 ++++++
>>   .../pmu-events/arch/test/test_soc/sys/uncore.json  |   8 +
>>   .../pmu-events/arch/x86/alderlake/pipeline.json    |   9 +
>>   .../pmu-events/arch/x86/alderlaken/pipeline.json   |   3 +
>>   .../pmu-events/arch/x86/broadwell/pipeline.json    |   4 +
>>   .../pmu-events/arch/x86/broadwellde/pipeline.json  |   4 +
>>   .../arch/x86/broadwellde/uncore-cache.json         |   2 +
>>   .../arch/x86/broadwellde/uncore-interconnect.json  |   1 +
>>   .../arch/x86/broadwellde/uncore-memory.json        |   1 +
>>   .../arch/x86/broadwellde/uncore-power.json         |   1 +
>>   .../pmu-events/arch/x86/broadwellx/pipeline.json   |   4 +
>>   .../arch/x86/broadwellx/uncore-cache.json          |   2 +
>>   .../arch/x86/broadwellx/uncore-interconnect.json   |  13 +
>>   .../arch/x86/broadwellx/uncore-memory.json         |   2 +
>>   .../arch/x86/broadwellx/uncore-power.json          |   1 +
>>   .../pmu-events/arch/x86/cascadelakex/pipeline.json |   4 +
>>   .../arch/x86/cascadelakex/uncore-cache.json        |   2 +
>>   .../arch/x86/cascadelakex/uncore-interconnect.json |   1 +
>>   .../arch/x86/cascadelakex/uncore-io.json           |   1 +
>>   .../arch/x86/cascadelakex/uncore-memory.json       |   1 +
>>   .../arch/x86/cascadelakex/uncore-power.json        |   1 +
>>   .../pmu-events/arch/x86/elkhartlake/pipeline.json  |   2 +
>>   .../pmu-events/arch/x86/goldmont/pipeline.json     |   3 +
>>   .../pmu-events/arch/x86/goldmontplus/pipeline.json |   3 +
>>   .../pmu-events/arch/x86/grandridge/pipeline.json   |   3 +
>>   .../arch/x86/graniterapids/pipeline.json           |   4 +
>>   .../perf/pmu-events/arch/x86/haswell/pipeline.json |   4 +
>>   .../pmu-events/arch/x86/haswellx/pipeline.json     |   4 +
>>   .../pmu-events/arch/x86/haswellx/uncore-cache.json |   2 +
>>   .../arch/x86/haswellx/uncore-interconnect.json     |  14 ++
>>   .../arch/x86/haswellx/uncore-memory.json           |   2 +
>>   .../pmu-events/arch/x86/haswellx/uncore-power.json |   1 +
>>   .../perf/pmu-events/arch/x86/icelake/pipeline.json |   4 +
>>   .../pmu-events/arch/x86/icelakex/pipeline.json     |   4 +
>>   .../pmu-events/arch/x86/icelakex/uncore-cache.json |   1 +
>>   .../arch/x86/icelakex/uncore-interconnect.json     |   1 +
>>   .../arch/x86/icelakex/uncore-memory.json           |   1 +
>>   .../pmu-events/arch/x86/icelakex/uncore-power.json |   1 +
>>   .../pmu-events/arch/x86/ivybridge/pipeline.json    |   3 +
>>   .../perf/pmu-events/arch/x86/ivytown/pipeline.json |   4 +
>>   .../pmu-events/arch/x86/ivytown/uncore-cache.json  |   2 +
>>   .../arch/x86/ivytown/uncore-interconnect.json      |  11 +
>>   .../pmu-events/arch/x86/ivytown/uncore-memory.json |   1 +
>>   .../pmu-events/arch/x86/ivytown/uncore-power.json  |   1 +
>>   .../pmu-events/arch/x86/jaketown/pipeline.json     |   4 +
>>   .../pmu-events/arch/x86/jaketown/uncore-cache.json |   2 +
>>   .../arch/x86/jaketown/uncore-interconnect.json     |  12 +
>>   .../arch/x86/jaketown/uncore-memory.json           |   1 +
>>   .../pmu-events/arch/x86/jaketown/uncore-power.json |   2 +
>>   .../arch/x86/knightslanding/pipeline.json          |   3 +
>>   .../arch/x86/knightslanding/uncore-cache.json      |   1 +
>>   .../arch/x86/knightslanding/uncore-memory.json     |   4 +
>>   .../pmu-events/arch/x86/meteorlake/pipeline.json   |   8 +
>>   .../pmu-events/arch/x86/sandybridge/pipeline.json  |   4 +
>>   .../arch/x86/sapphirerapids/pipeline.json          |   5 +
>>   .../pmu-events/arch/x86/sierraforest/pipeline.json |   4 +
>>   .../pmu-events/arch/x86/silvermont/pipeline.json   |   3 +
>>   .../perf/pmu-events/arch/x86/skylake/pipeline.json |   4 +
>>   .../pmu-events/arch/x86/skylakex/pipeline.json     |   4 +
>>   .../pmu-events/arch/x86/skylakex/uncore-cache.json |   2 +
>>   .../arch/x86/skylakex/uncore-interconnect.json     |   1 +
>>   .../pmu-events/arch/x86/skylakex/uncore-io.json    |   1 +
>>   .../arch/x86/skylakex/uncore-memory.json           |   1 +
>>   .../pmu-events/arch/x86/skylakex/uncore-power.json |   1 +
>>   .../pmu-events/arch/x86/snowridgex/pipeline.json   |   2 +
>>   .../arch/x86/snowridgex/uncore-cache.json          |   1 +
>>   .../arch/x86/snowridgex/uncore-interconnect.json   |   1 +
>>   .../arch/x86/snowridgex/uncore-memory.json         |   1 +
>>   .../arch/x86/snowridgex/uncore-power.json          |   1 +
>>   .../pmu-events/arch/x86/tigerlake/pipeline.json    |   5 +
>>   tools/perf/pmu-events/empty-pmu-events.c           |   8 +
>>   tools/perf/pmu-events/jevents.py                   |  21 +-
>>   tools/perf/tests/pmu-events.c                      |  64 ++++-
>>   tools/perf/util/metricgroup.c                      |   2 +-
>>   tools/perf/util/pmu.c                              |  33 ++-
>>   tools/perf/util/pmu.h                              |   1 +
>>   77 files changed, 679 insertions(+), 9 deletions(-)
>>   create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
>>   create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json
>>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v7 1/8] perf pmu: "Compat" supports matching multiple identifiers
  2023-08-21  8:36   ` Jing Zhang
@ 2023-08-24 15:05     ` Robin Murphy
  -1 siblings, 0 replies; 57+ messages in thread
From: Robin Murphy @ 2023-08-24 15:05 UTC (permalink / raw)
  To: Jing Zhang, John Garry, Ian Rogers
  Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Shuai Xue

On 21/08/2023 9:36 am, Jing Zhang wrote:
> The jevent "Compat" is used for uncore PMU alias or metric definitions.
> 
> The same PMU driver has different PMU identifiers due to different
> hardware versions and types, but they may have some common PMU event.
> Since a Compat value can only match one identifier, when adding the
> same event alias to PMUs with different identifiers, each identifier
> needs to be defined once, which is not streamlined enough.
> 
> So let "Compat" support matching multiple identifiers for uncore PMU
> aliases. For example, the Compat value {43401;436*} can match the PMU
> identifier "43401", that is, CMN600_r0p0, and any PMU identifier with
> the prefix "436", that is, all CMN650, where "*" is a wildcard.
> Tokens in the Compat field are delimited by ';' with no spaces.

I wonder whether there is any possibility of supporting multiple values as a 
JSON array, rather than a single delimited string? Otherwise, if we're 
putting restrictions on what characters a driver can expose as an 
identifier, then I think that really wants explicitly documenting. 
AFAICT there's currently not even any documentation of the de-facto ABI 
that it's expected to be a free-form string rather than completely 
arbitrary binary data.

Thanks,
Robin.

> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
> Reviewed-by: John Garry <john.g.garry@oracle.com>
> ---
>   tools/perf/util/pmu.c | 33 +++++++++++++++++++++++++++++++--
>   tools/perf/util/pmu.h |  1 +
>   2 files changed, 32 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index ad209c8..6402423 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -776,6 +776,35 @@ static bool pmu_uncore_alias_match(const char *pmu_name, const char *name)
>   	return res;
>   }
>   
> +bool pmu_uncore_identifier_match(const char *id, const char *compat)
> +{
> +	char *tmp = NULL, *tok, *str;
> +	bool res;
> +	int n;
> +
> +	/*
> +	 * The strdup() call is necessary here because "compat" is a const str*
> +	 * type and cannot be used as an argument to strtok_r().
> +	 */
> +	str = strdup(compat);
> +	if (!str)
> +		return false;
> +
> +	tok = strtok_r(str, ";", &tmp);
> +	for (; tok; tok = strtok_r(NULL, ";", &tmp)) {
> +		n = strlen(tok);
> +		if ((tok[n - 1] == '*' && !strncmp(id, tok, n - 1)) ||
> +		    !strcmp(id, tok)) {
> +			res = true;
> +			goto out;
> +		}
> +	}
> +	res = false;
> +out:
> +	free(str);
> +	return res;
> +}
> +
>   struct pmu_add_cpu_aliases_map_data {
>   	struct list_head *head;
>   	const char *name;
> @@ -847,8 +876,8 @@ static int pmu_add_sys_aliases_iter_fn(const struct pmu_event *pe,
>   	if (!pe->compat || !pe->pmu)
>   		return 0;
>   
> -	if (!strcmp(pmu->id, pe->compat) &&
> -	    pmu_uncore_alias_match(pe->pmu, pmu->name)) {
> +	if (pmu_uncore_alias_match(pe->pmu, pmu->name) &&
> +	    pmu_uncore_identifier_match(pmu->id, pe->compat)) {
>   		__perf_pmu__new_alias(idata->head, -1,
>   				      (char *)pe->name,
>   				      (char *)pe->desc,
> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
> index b9a02de..9d4385d 100644
> --- a/tools/perf/util/pmu.h
> +++ b/tools/perf/util/pmu.h
> @@ -241,6 +241,7 @@ void pmu_add_cpu_aliases_table(struct list_head *head, struct perf_pmu *pmu,
>   char *perf_pmu__getcpuid(struct perf_pmu *pmu);
>   const struct pmu_events_table *pmu_events_table__find(void);
>   const struct pmu_metrics_table *pmu_metrics_table__find(void);
> +bool pmu_uncore_identifier_match(const char *id, const char *compat);
>   void perf_pmu_free_alias(struct perf_pmu_alias *alias);
>   
>   int perf_pmu__convert_scale(const char *scale, char **end, double *sval);

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v7 1/8] perf pmu: "Compat" supports matching multiple identifiers
@ 2023-08-24 15:05     ` Robin Murphy
  0 siblings, 0 replies; 57+ messages in thread
From: Robin Murphy @ 2023-08-24 15:05 UTC (permalink / raw)
  To: Jing Zhang, John Garry, Ian Rogers
  Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Shuai Xue

On 21/08/2023 9:36 am, Jing Zhang wrote:
> The jevent "Compat" is used for uncore PMU alias or metric definitions.
> 
> The same PMU driver has different PMU identifiers due to different
> hardware versions and types, but they may have some common PMU event.
> Since a Compat value can only match one identifier, when adding the
> same event alias to PMUs with different identifiers, the alias needs
> to be defined once per identifier, which is not streamlined enough.
> 
> So let "Compat" support matching multiple identifiers for an uncore
> PMU alias. For example, the Compat value {43401;436*} can match the
> PMU identifier "43401", that is, CMN600_r0p0, as well as any PMU
> identifier with the prefix "436", that is, all CMN650, where "*" is
> a wildcard. Tokens in the Compat field are delimited by ';' with no
> spaces.

I wonder whether there is any possibility of supporting multiple values
as a JSON array, rather than a single delimited string? Otherwise, if
we're putting restrictions on what characters a driver can expose in an
identifier, then I think that really wants explicit documentation.
AFAICT there's currently not even any documentation of the de-facto ABI
expectation that it's a free-form string rather than completely
arbitrary binary data.

Thanks,
Robin.

> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
> Reviewed-by: John Garry <john.g.garry@oracle.com>
> ---
>   tools/perf/util/pmu.c | 33 +++++++++++++++++++++++++++++++--
>   tools/perf/util/pmu.h |  1 +
>   2 files changed, 32 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index ad209c8..6402423 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -776,6 +776,35 @@ static bool pmu_uncore_alias_match(const char *pmu_name, const char *name)
>   	return res;
>   }
>   
> +bool pmu_uncore_identifier_match(const char *id, const char *compat)
> +{
> +	char *tmp = NULL, *tok, *str;
> +	bool res;
> +	int n;
> +
> +	/*
> +	 * The strdup() call is necessary here because "compat" is of
> +	 * "const char *" type and cannot be used as an argument to strtok_r().
> +	 */
> +	str = strdup(compat);
> +	if (!str)
> +		return false;
> +
> +	tok = strtok_r(str, ";", &tmp);
> +	for (; tok; tok = strtok_r(NULL, ";", &tmp)) {
> +		n = strlen(tok);
> +		if ((tok[n - 1] == '*' && !strncmp(id, tok, n - 1)) ||
> +		    !strcmp(id, tok)) {
> +			res = true;
> +			goto out;
> +		}
> +	}
> +	res = false;
> +out:
> +	free(str);
> +	return res;
> +}
> +
>   struct pmu_add_cpu_aliases_map_data {
>   	struct list_head *head;
>   	const char *name;
> @@ -847,8 +876,8 @@ static int pmu_add_sys_aliases_iter_fn(const struct pmu_event *pe,
>   	if (!pe->compat || !pe->pmu)
>   		return 0;
>   
> -	if (!strcmp(pmu->id, pe->compat) &&
> -	    pmu_uncore_alias_match(pe->pmu, pmu->name)) {
> +	if (pmu_uncore_alias_match(pe->pmu, pmu->name) &&
> +	    pmu_uncore_identifier_match(pmu->id, pe->compat)) {
>   		__perf_pmu__new_alias(idata->head, -1,
>   				      (char *)pe->name,
>   				      (char *)pe->desc,
> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
> index b9a02de..9d4385d 100644
> --- a/tools/perf/util/pmu.h
> +++ b/tools/perf/util/pmu.h
> @@ -241,6 +241,7 @@ void pmu_add_cpu_aliases_table(struct list_head *head, struct perf_pmu *pmu,
>   char *perf_pmu__getcpuid(struct perf_pmu *pmu);
>   const struct pmu_events_table *pmu_events_table__find(void);
>   const struct pmu_metrics_table *pmu_metrics_table__find(void);
> +bool pmu_uncore_identifier_match(const char *id, const char *compat);
>   void perf_pmu_free_alias(struct perf_pmu_alias *alias);
>   
>   int perf_pmu__convert_scale(const char *scale, char **end, double *sval);

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v7 1/8] perf pmu: "Compat" supports matching multiple identifiers
  2023-08-21  8:36   ` Jing Zhang
@ 2023-08-25  4:11     ` Ian Rogers
  -1 siblings, 0 replies; 57+ messages in thread
From: Ian Rogers @ 2023-08-25  4:11 UTC (permalink / raw)
  To: Jing Zhang
  Cc: John Garry, Will Deacon, James Clark, Arnaldo Carvalho de Melo,
	Mark Rutland, Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	linux-kernel, linux-arm-kernel, linux-perf-users, linux-doc,
	Zhuo Song, Shuai Xue

On Mon, Aug 21, 2023 at 1:36 AM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>
> The jevent "Compat" is used for uncore PMU alias or metric definitions.
>
> The same PMU driver has different PMU identifiers due to different
> hardware versions and types, but they may have some common PMU event.
> Since a Compat value can only match one identifier, when adding the
> same event alias to PMUs with different identifiers, the alias needs
> to be defined once per identifier, which is not streamlined enough.
>
> So let "Compat" support matching multiple identifiers for an uncore
> PMU alias. For example, the Compat value {43401;436*} can match the
> PMU identifier "43401", that is, CMN600_r0p0, as well as any PMU
> identifier with the prefix "436", that is, all CMN650, where "*" is
> a wildcard. Tokens in the Compat field are delimited by ';' with no
> spaces.
>
> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
> Reviewed-by: John Garry <john.g.garry@oracle.com>
> ---
>  tools/perf/util/pmu.c | 33 +++++++++++++++++++++++++++++++--
>  tools/perf/util/pmu.h |  1 +
>  2 files changed, 32 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index ad209c8..6402423 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -776,6 +776,35 @@ static bool pmu_uncore_alias_match(const char *pmu_name, const char *name)
>         return res;
>  }
>
> +bool pmu_uncore_identifier_match(const char *id, const char *compat)

static?

> +{
> +       char *tmp = NULL, *tok, *str;
> +       bool res;

Initialize to false to avoid the goto.

> +       int n;

Move into the scope of the for loop, to reduce the scope.

> +
> +       /*
> +        * The strdup() call is necessary here because "compat" is of
> +        * "const char *" type and cannot be used as an argument to strtok_r().
> +        */
> +       str = strdup(compat);
> +       if (!str)
> +               return false;
> +
> +       tok = strtok_r(str, ";", &tmp);
> +       for (; tok; tok = strtok_r(NULL, ";", &tmp)) {
> +               n = strlen(tok);
> +               if ((tok[n - 1] == '*' && !strncmp(id, tok, n - 1)) ||
> +                   !strcmp(id, tok)) {

We use fnmatch for a similar check:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n1982

> +                       res = true;
> +                       goto out;

With "res=false;" above this can just be a regular break.

Thanks,
Ian

> +               }
> +       }
> +       res = false;
> +out:
> +       free(str);
> +       return res;
> +}
> +
>  struct pmu_add_cpu_aliases_map_data {
>         struct list_head *head;
>         const char *name;
> @@ -847,8 +876,8 @@ static int pmu_add_sys_aliases_iter_fn(const struct pmu_event *pe,
>         if (!pe->compat || !pe->pmu)
>                 return 0;
>
> -       if (!strcmp(pmu->id, pe->compat) &&
> -           pmu_uncore_alias_match(pe->pmu, pmu->name)) {
> +       if (pmu_uncore_alias_match(pe->pmu, pmu->name) &&
> +           pmu_uncore_identifier_match(pmu->id, pe->compat)) {
>                 __perf_pmu__new_alias(idata->head, -1,
>                                       (char *)pe->name,
>                                       (char *)pe->desc,
> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
> index b9a02de..9d4385d 100644
> --- a/tools/perf/util/pmu.h
> +++ b/tools/perf/util/pmu.h
> @@ -241,6 +241,7 @@ void pmu_add_cpu_aliases_table(struct list_head *head, struct perf_pmu *pmu,
>  char *perf_pmu__getcpuid(struct perf_pmu *pmu);
>  const struct pmu_events_table *pmu_events_table__find(void);
>  const struct pmu_metrics_table *pmu_metrics_table__find(void);
> +bool pmu_uncore_identifier_match(const char *id, const char *compat);
>  void perf_pmu_free_alias(struct perf_pmu_alias *alias);
>
>  int perf_pmu__convert_scale(const char *scale, char **end, double *sval);
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v7 8/8] perf vendor events: Add JSON metrics for Arm CMN
  2023-08-21  8:36   ` Jing Zhang
@ 2023-08-25  4:13     ` Ian Rogers
  -1 siblings, 0 replies; 57+ messages in thread
From: Ian Rogers @ 2023-08-25  4:13 UTC (permalink / raw)
  To: Jing Zhang
  Cc: John Garry, Will Deacon, James Clark, Arnaldo Carvalho de Melo,
	Mark Rutland, Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	linux-kernel, linux-arm-kernel, linux-perf-users, linux-doc,
	Zhuo Song, Shuai Xue

On Mon, Aug 21, 2023 at 1:36 AM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>
> Add JSON metrics for Arm CMN. For now, add only the subset of CMN PMU
> metrics that is generic and compatible with any SoC matching CMN-ANY.
>
> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
> Reviewed-by: John Garry <john.g.garry@oracle.com>
> ---
>  .../pmu-events/arch/arm64/arm/cmn/sys/metric.json  | 74 ++++++++++++++++++++++
>  1 file changed, 74 insertions(+)
>  create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json
>
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json
> new file mode 100644
> index 0000000..64db534
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json
> @@ -0,0 +1,74 @@
> +[
> +       {
> +               "MetricName": "slc_miss_rate",
> +               "BriefDescription": "The system level cache miss rate.",
> +               "MetricGroup": "cmn",
> +               "MetricExpr": "hnf_cache_miss / hnf_slc_sf_cache_access",
> +               "ScaleUnit": "100%",
> +               "Unit": "arm_cmn",
> +               "Compat": "434*;436*;43c*;43a*"

Here a ';' is used as a separator, but for "Unit" ',' is used as a
separator. Is there a reason for the inconsistency?

Thanks,
Ian

> +       },
> +       {
> +               "MetricName": "hnf_message_retry_rate",
> +               "BriefDescription": "HN-F message retry rate indicates whether a lack of credits is causing the bottlenecks.",
> +               "MetricGroup": "cmn",
> +               "MetricExpr": "hnf_pocq_retry / hnf_pocq_reqs_recvd",
> +               "ScaleUnit": "100%",
> +               "Unit": "arm_cmn",
> +               "Compat": "434*;436*;43c*;43a*"
> +       },
> +       {
> +               "MetricName": "sf_hit_rate",
> +               "BriefDescription": "Snoop filter hit rate can be used to measure the snoop filter efficiency.",
> +               "MetricGroup": "cmn",
> +               "MetricExpr": "hnf_sf_hit / hnf_slc_sf_cache_access",
> +               "ScaleUnit": "100%",
> +               "Unit": "arm_cmn",
> +               "Compat": "434*;436*;43c*;43a*"
> +       },
> +       {
> +               "MetricName": "mc_message_retry_rate",
> +               "BriefDescription": "The memory controller request retries rate indicates whether the memory controller is the bottleneck.",
> +               "MetricGroup": "cmn",
> +               "MetricExpr": "hnf_mc_retries / hnf_mc_reqs",
> +               "ScaleUnit": "100%",
> +               "Unit": "arm_cmn",
> +               "Compat": "434*;436*;43c*;43a*"
> +       },
> +       {
> +               "MetricName": "rni_actual_read_bandwidth.all",
> +               "BriefDescription": "This event measures the actual bandwidth that the RN-I bridge sends to the interconnect.",
> +               "MetricGroup": "cmn",
> +               "MetricExpr": "rnid_rxdat_flits * 32 / 1e6 / duration_time",
> +               "ScaleUnit": "1MB/s",
> +               "Unit": "arm_cmn",
> +               "Compat": "434*;436*;43c*;43a*"
> +       },
> +       {
> +               "MetricName": "rni_actual_write_bandwidth.all",
> +               "BriefDescription": "This event measures the actual write bandwidth at RN-I bridges.",
> +               "MetricGroup": "cmn",
> +               "MetricExpr": "rnid_txdat_flits * 32 / 1e6 / duration_time",
> +               "ScaleUnit": "1MB/s",
> +               "Unit": "arm_cmn",
> +               "Compat": "434*;436*;43c*;43a*"
> +       },
> +       {
> +               "MetricName": "rni_retry_rate",
> +               "BriefDescription": "RN-I bridge retry rate indicates whether the memory controller is the bottleneck.",
> +               "MetricGroup": "cmn",
> +               "MetricExpr": "rnid_txreq_flits_retried / rnid_txreq_flits_total",
> +               "ScaleUnit": "100%",
> +               "Unit": "arm_cmn",
> +               "Compat": "434*;436*;43c*;43a*"
> +       },
> +       {
> +               "MetricName": "sbsx_actual_write_bandwidth.all",
> +               "BriefDescription": "sbsx actual write bandwidth.",
> +               "MetricGroup": "cmn",
> +               "MetricExpr": "sbsx_txdat_flitv * 32 / 1e6 / duration_time",
> +               "ScaleUnit": "1MB/s",
> +               "Unit": "arm_cmn",
> +               "Compat": "434*;436*;43c*;43a*"
> +       }
> +]
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v7 2/8] perf metric: "Compat" supports matching multiple identifiers
  2023-08-21  8:36   ` Jing Zhang
@ 2023-08-25  4:16     ` Ian Rogers
  -1 siblings, 0 replies; 57+ messages in thread
From: Ian Rogers @ 2023-08-25  4:16 UTC (permalink / raw)
  To: Jing Zhang
  Cc: John Garry, Will Deacon, James Clark, Arnaldo Carvalho de Melo,
	Mark Rutland, Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	linux-kernel, linux-arm-kernel, linux-perf-users, linux-doc,
	Zhuo Song, Shuai Xue

On Mon, Aug 21, 2023 at 1:36 AM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>
> The jevent "Compat" is used for uncore PMU alias or metric definitions.
>
> The same PMU driver has different PMU identifiers due to different
> hardware versions and types, but they may have some common PMU metric.
> Since a Compat value can only match one identifier, when adding the
> same metric to PMUs with different identifiers, the metric needs to
> be defined once per identifier, which is not streamlined enough.
>
> So let "Compat" support matching multiple identifiers for uncore PMU
> metrics.
>
> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
> Reviewed-by: John Garry <john.g.garry@oracle.com>

Reviewed-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/metricgroup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> index 5e9c657..ff81bc5 100644
> --- a/tools/perf/util/metricgroup.c
> +++ b/tools/perf/util/metricgroup.c
> @@ -477,7 +477,7 @@ static int metricgroup__sys_event_iter(const struct pmu_metric *pm,
>
>         while ((pmu = perf_pmu__scan(pmu))) {
>
> -               if (!pmu->id || strcmp(pmu->id, pm->compat))
> +               if (!pmu->id || !pmu_uncore_identifier_match(pmu->id, pm->compat))
>                         continue;
>
>                 return d->fn(pm, table, d->data);
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v7 3/8] perf vendor events: Supplement the omitted EventCode
  2023-08-21  8:36 ` [PATCH v7 3/8] perf vendor events: Supplement the omitted EventCode Jing Zhang
@ 2023-08-25  4:24   ` Ian Rogers
  2023-08-25  6:28     ` Jing Zhang
  0 siblings, 1 reply; 57+ messages in thread
From: Ian Rogers @ 2023-08-25  4:24 UTC (permalink / raw)
  To: Jing Zhang
  Cc: John Garry, Will Deacon, James Clark, Arnaldo Carvalho de Melo,
	Mark Rutland, Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	linux-kernel, linux-arm-kernel, linux-perf-users, linux-doc,
	Zhuo Song, Shuai Xue

On Mon, Aug 21, 2023 at 1:36 AM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>
> If there is an "event=0" in the event description, the EventCode can
> be omitted in the JSON file, and jevents.py will automatically fill in
> "event=0" during parsing.
>
> However, for some events where EventCode and ConfigCode are missing,
> it is not necessary to automatically fill in "event=0", such as the
> CMN event description which is typically "type=xxx, eventid=xxx".
>
> Therefore, before modifying jevents.py to prevent it from automatically
> adding "event=0" by default, it is necessary to fill in all omitted
> EventCodes first.
>
> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>

As these files are generated, the generator script needs updating.
However, I don't think this change makes sense as the event=0 is
overwritten in the case of an arch_std event:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/jevents.py?h=perf-tools-next#n369
So yes event=0 was filled in, but it was then overwritten.

Thanks,
Ian

> ---
>  tools/perf/pmu-events/arch/x86/alderlake/pipeline.json     |  9 +++++++++
>  tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json    |  3 +++
>  tools/perf/pmu-events/arch/x86/broadwell/pipeline.json     |  4 ++++
>  tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json   |  4 ++++
>  .../perf/pmu-events/arch/x86/broadwellde/uncore-cache.json |  2 ++
>  .../arch/x86/broadwellde/uncore-interconnect.json          |  1 +
>  .../pmu-events/arch/x86/broadwellde/uncore-memory.json     |  1 +
>  .../perf/pmu-events/arch/x86/broadwellde/uncore-power.json |  1 +
>  tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json    |  4 ++++
>  .../perf/pmu-events/arch/x86/broadwellx/uncore-cache.json  |  2 ++
>  .../arch/x86/broadwellx/uncore-interconnect.json           | 13 +++++++++++++
>  .../perf/pmu-events/arch/x86/broadwellx/uncore-memory.json |  2 ++
>  .../perf/pmu-events/arch/x86/broadwellx/uncore-power.json  |  1 +
>  tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json  |  4 ++++
>  .../pmu-events/arch/x86/cascadelakex/uncore-cache.json     |  2 ++
>  .../arch/x86/cascadelakex/uncore-interconnect.json         |  1 +
>  tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json |  1 +
>  .../pmu-events/arch/x86/cascadelakex/uncore-memory.json    |  1 +
>  .../pmu-events/arch/x86/cascadelakex/uncore-power.json     |  1 +
>  tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json   |  2 ++
>  tools/perf/pmu-events/arch/x86/goldmont/pipeline.json      |  3 +++
>  tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json  |  3 +++
>  tools/perf/pmu-events/arch/x86/grandridge/pipeline.json    |  3 +++
>  tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json |  4 ++++
>  tools/perf/pmu-events/arch/x86/haswell/pipeline.json       |  4 ++++
>  tools/perf/pmu-events/arch/x86/haswellx/pipeline.json      |  4 ++++
>  tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json  |  2 ++
>  .../pmu-events/arch/x86/haswellx/uncore-interconnect.json  | 14 ++++++++++++++
>  tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json |  2 ++
>  tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json  |  1 +
>  tools/perf/pmu-events/arch/x86/icelake/pipeline.json       |  4 ++++
>  tools/perf/pmu-events/arch/x86/icelakex/pipeline.json      |  4 ++++
>  tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json  |  1 +
>  .../pmu-events/arch/x86/icelakex/uncore-interconnect.json  |  1 +
>  tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json |  1 +
>  tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json  |  1 +
>  tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json     |  3 +++
>  tools/perf/pmu-events/arch/x86/ivytown/pipeline.json       |  4 ++++
>  tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json   |  2 ++
>  .../pmu-events/arch/x86/ivytown/uncore-interconnect.json   | 11 +++++++++++
>  tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json  |  1 +
>  tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json   |  1 +
>  tools/perf/pmu-events/arch/x86/jaketown/pipeline.json      |  4 ++++
>  tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json  |  2 ++
>  .../pmu-events/arch/x86/jaketown/uncore-interconnect.json  | 12 ++++++++++++
>  tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json |  1 +
>  tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json  |  2 ++
>  .../perf/pmu-events/arch/x86/knightslanding/pipeline.json  |  3 +++
>  .../pmu-events/arch/x86/knightslanding/uncore-cache.json   |  1 +
>  .../pmu-events/arch/x86/knightslanding/uncore-memory.json  |  4 ++++
>  tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json    |  8 ++++++++
>  tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json   |  4 ++++
>  .../perf/pmu-events/arch/x86/sapphirerapids/pipeline.json  |  5 +++++
>  tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json  |  4 ++++
>  tools/perf/pmu-events/arch/x86/silvermont/pipeline.json    |  3 +++
>  tools/perf/pmu-events/arch/x86/skylake/pipeline.json       |  4 ++++
>  tools/perf/pmu-events/arch/x86/skylakex/pipeline.json      |  4 ++++
>  tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json  |  2 ++
>  .../pmu-events/arch/x86/skylakex/uncore-interconnect.json  |  1 +
>  tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json     |  1 +
>  tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json |  1 +
>  tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json  |  1 +
>  tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json    |  2 ++
>  .../perf/pmu-events/arch/x86/snowridgex/uncore-cache.json  |  1 +
>  .../arch/x86/snowridgex/uncore-interconnect.json           |  1 +
>  .../perf/pmu-events/arch/x86/snowridgex/uncore-memory.json |  1 +
>  .../perf/pmu-events/arch/x86/snowridgex/uncore-power.json  |  1 +
>  tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json     |  5 +++++
>  68 files changed, 211 insertions(+)
>
> diff --git a/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json b/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json
> index cb5b861..7054426 100644
> --- a/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json
> @@ -489,6 +489,7 @@
>      },
>      {
>          "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.CORE",
>          "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. This event uses fixed counter 1.",
>          "SampleAfterValue": "2000003",
> @@ -550,6 +551,7 @@
>      },
>      {
>          "BriefDescription": "Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event)",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is not affected by core frequency changes and increments at a fixed frequency that is also used for the Time Stamp Counter (TSC). This event uses fixed counter 2.",
>          "SampleAfterValue": "2000003",
> @@ -558,6 +560,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>          "SampleAfterValue": "2000003",
> @@ -584,6 +587,7 @@
>      },
>      {
>          "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "PublicDescription": "Counts the number of core cycles while the core is not in a halt state.  The core enters the halt state when it is running the HLT instruction. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time.  This event uses fixed counter 1.",
>          "SampleAfterValue": "2000003",
> @@ -592,6 +596,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when the thread is not in halt state",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
>          "SampleAfterValue": "2000003",
> @@ -743,6 +748,7 @@
>      },
>      {
>          "BriefDescription": "Counts the total number of instructions retired. (Fixed event)",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PEBS": "1",
>          "PublicDescription": "Counts the total number of instructions that retired. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. This event continues counting during hardware interrupts, traps, and inside interrupt handlers. This event uses fixed counter 0.",
> @@ -752,6 +758,7 @@
>      },
>      {
>          "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PEBS": "1",
>          "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
> @@ -796,6 +803,7 @@
>      },
>      {
>          "BriefDescription": "Precise instruction retired with PEBS precise-distribution",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.PREC_DIST",
>          "PEBS": "1",
>          "PublicDescription": "A version of INST_RETIRED that allows for a precise distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR++) feature to fix bias in how retired instructions get sampled. Use on Fixed Counter 0.",
> @@ -1160,6 +1168,7 @@
>      },
>      {
>          "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
> +        "EventCode": "0x0",
>          "EventName": "TOPDOWN.SLOTS",
>          "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
>          "SampleAfterValue": "10000003",
> diff --git a/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json b/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json
> index fa53ff1..345d1c8 100644
> --- a/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json
> @@ -211,6 +211,7 @@
>      },
>      {
>          "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.CORE",
>          "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. This event uses fixed counter 1.",
>          "SampleAfterValue": "2000003",
> @@ -225,6 +226,7 @@
>      },
>      {
>          "BriefDescription": "Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event)",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is not affected by core frequency changes and increments at a fixed frequency that is also used for the Time Stamp Counter (TSC). This event uses fixed counter 2.",
>          "SampleAfterValue": "2000003",
> @@ -240,6 +242,7 @@
>      },
>      {
>          "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "PublicDescription": "Counts the number of core cycles while the core is not in a halt state.  The core enters the halt state when it is running the HLT instruction. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time.  This event uses fixed counter 1.",
>          "SampleAfterValue": "2000003",
> diff --git a/tools/perf/pmu-events/arch/x86/broadwell/pipeline.json b/tools/perf/pmu-events/arch/x86/broadwell/pipeline.json
> index 9a902d2..b114d0d 100644
> --- a/tools/perf/pmu-events/arch/x86/broadwell/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/broadwell/pipeline.json
> @@ -336,6 +336,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. \nNote: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  This event is clocked by base clock (100 Mhz) on Sandy Bridge. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>          "SampleAfterValue": "2000003",
> @@ -359,6 +360,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when the thread is not in halt state",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
>          "SampleAfterValue": "2000003",
> @@ -366,6 +368,7 @@
>      },
>      {
>          "AnyThread": "1",
>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>          "SampleAfterValue": "2000003",
> @@ -514,6 +517,7 @@
>      },
>      {
>          "BriefDescription": "Instructions retired from execution.",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. \nNotes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. \nCounting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
>          "SampleAfterValue": "2000003",
> diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json b/tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json
> index 9a902d2..ce90d058 100644
> --- a/tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json
> @@ -336,6 +336,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. \nNote: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  This event is clocked by base clock (100 Mhz) on Sandy Bridge. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>          "SampleAfterValue": "2000003",
> @@ -359,6 +360,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when the thread is not in halt state",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
>          "SampleAfterValue": "2000003",
> @@ -367,6 +369,7 @@
>      {
>          "AnyThread": "1",
>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x2"
> @@ -514,6 +517,7 @@
>      },
>      {
>          "BriefDescription": "Instructions retired from execution.",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. \nNotes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. \nCounting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
>          "SampleAfterValue": "2000003",
> diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.json b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.json
> index 56bba6d..117be19 100644
> --- a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.json
> +++ b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.json
> @@ -8,6 +8,7 @@
>      },
>      {
>          "BriefDescription": "Uncore Clocks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_C_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "CBOX"
> @@ -1501,6 +1502,7 @@
>      },
>      {
>          "BriefDescription": "uclks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_H_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of uclks in the HA.  This will be slightly different than the count in the Ubox because of enable/freeze delays.  The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the QPI Agent.",
> diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-interconnect.json
> index 8a327e0..ce54bd3 100644
> --- a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-interconnect.json
> +++ b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-interconnect.json
> @@ -19,6 +19,7 @@
>      },
>      {
>          "BriefDescription": "Clocks in the IRP",
> +        "EventCode": "0x0",
>          "EventName": "UNC_I_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "Number of clocks in the IRP.",
> diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.json b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.json
> index a764234..32c46bd 100644
> --- a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.json
> +++ b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.json
> @@ -131,6 +131,7 @@
>      },
>      {
>          "BriefDescription": "DRAM Clockticks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_M_DCLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "iMC"
> diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.json b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.json
> index 83d2013..f57eb8e 100644
> --- a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.json
> +++ b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.json
> @@ -1,6 +1,7 @@
>  [
>      {
>          "BriefDescription": "pclk Cycles",
> +        "EventCode": "0x0",
>          "EventName": "UNC_P_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "The PCU runs off a fixed 1 GHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
> diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json b/tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json
> index 9a902d2..ce90d058 100644
> --- a/tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json
> @@ -336,6 +336,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. \nNote: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  This event is clocked by base clock (100 Mhz) on Sandy Bridge. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>          "SampleAfterValue": "2000003",
> @@ -359,6 +360,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when the thread is not in halt state",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
>          "SampleAfterValue": "2000003",
> @@ -367,6 +369,7 @@
>      {
>          "AnyThread": "1",
>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x2"
> @@ -514,6 +517,7 @@
>      },
>      {
>          "BriefDescription": "Instructions retired from execution.",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. \nNotes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. \nCounting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
>          "SampleAfterValue": "2000003",
> diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.json b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.json
> index 400d784..346f5cf 100644
> --- a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.json
> +++ b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.json
> @@ -183,6 +183,7 @@
>      },
>      {
>          "BriefDescription": "Uncore Clocks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_C_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "CBOX"
> @@ -1689,6 +1690,7 @@
>      },
>      {
>          "BriefDescription": "uclks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_H_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of uclks in the HA.  This will be slightly different than the count in the Ubox because of enable/freeze delays.  The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the QPI Agent.",
> diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-interconnect.json
> index e61a23f..df96e41 100644
> --- a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-interconnect.json
> +++ b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-interconnect.json
> @@ -1,6 +1,7 @@
>  [
>      {
>          "BriefDescription": "Number of non data (control) flits transmitted . Derived from unc_q_txl_flits_g0.non_data",
> +        "EventCode": "0x0",
>          "EventName": "QPI_CTL_BANDWIDTH_TX",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI.  This basically tracks the protocol overhead on the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This includes the header flits for data packets.",
> @@ -10,6 +11,7 @@
>      },
>      {
>          "BriefDescription": "Number of data flits transmitted . Derived from unc_q_txl_flits_g0.data",
> +        "EventCode": "0x0",
>          "EventName": "QPI_DATA_BANDWIDTH_TX",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI.  Each flit contains 64b of data.  This includes both DRS and NCB data flits (coherent and non-coherent).  This can be used to calculate the data bandwidth of the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This does not include the header flits that go in data packets.",
> @@ -37,6 +39,7 @@
>      },
>      {
>          "BriefDescription": "Clocks in the IRP",
> +        "EventCode": "0x0",
>          "EventName": "UNC_I_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "Number of clocks in the IRP.",
> @@ -1400,6 +1403,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 0; Data Tx Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G0.DATA",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI.  Each flit contains 64b of data.  This includes both DRS and NCB data flits (coherent and non-coherent).  This can be used to calculate the data bandwidth of the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This does not include the header flits that go in data packets.",
> @@ -1408,6 +1412,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 0; Non-Data protocol Tx Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G0.NON_DATA",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI.  This basically tracks the protocol overhead on the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This includes the header flits for data packets.",
> @@ -1416,6 +1421,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; DRS Flits (both Header and Data)",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.",
> @@ -1424,6 +1430,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; DRS Data Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS_DATA",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of data flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.  This does not count data flits transmitted over the NCB channel which transmits non-coherent data.  This includes only the data flits (not the header).",
> @@ -1432,6 +1439,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; DRS Header Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS_NONDATA",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of protocol flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.  This does not count data flits transmitted over the NCB channel which transmits non-coherent data.  This includes only the header flits (not the data).  This includes extended headers.",
> @@ -1440,6 +1448,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; HOM Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of flits transmitted over QPI on the home channel.",
> @@ -1448,6 +1457,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; HOM Non-Request Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM_NONREQ",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of non-request flits transmitted over QPI on the home channel.  These are most commonly snoop responses, and this event can be used as a proxy for that.",
> @@ -1456,6 +1466,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; HOM Request Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM_REQ",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of data request transmitted over QPI on the home channel.  This basically counts the number of remote memory requests transmitted over QPI.  In conjunction with the local read count in the Home Agent, one can calculate the number of LLC Misses.",
> @@ -1464,6 +1475,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; SNP Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.SNP",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of snoop request flits transmitted over QPI.  These requests are contained in the snoop channel.  This does not include snoop responses, which are transmitted on the home channel.",
> @@ -3162,6 +3174,7 @@
>      },
>      {
>          "BriefDescription": "Uncore Clocks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_S_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "SBOX"
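
As a side note on the hunks above: the PublicDescription strings repeat the same bandwidth arithmetic (80 bits per flit for link bandwidth, 8B per data flit in L0 or 4B in L0p for data bandwidth). A minimal sketch of that calculation, using made-up counter values purely for illustration:

```python
# Sketch of the QPI bandwidth formulas quoted in the event descriptions
# above. The flit counts and interval below are hypothetical examples,
# not real counter readings.

def qpi_link_bandwidth_gbps(flits: int, seconds: float) -> float:
    """Link bandwidth: each flit carries 80 bits (flits * 80b / time)."""
    return flits * 80 / seconds / 1e9

def qpi_data_bandwidth_gbs(data_flits: int, seconds: float,
                           half_width: bool = False) -> float:
    """Data bandwidth: 8B per data flit in L0, 4B in half-width L0p."""
    bytes_per_flit = 4 if half_width else 8
    return data_flits * bytes_per_flit / seconds / 1e9

# e.g. 1e9 data flits observed over 1s at full width -> 8.0 GB/s of data
print(qpi_data_bandwidth_gbs(1_000_000_000, 1.0))
```

This is only a restatement of the formulas already embedded in the JSON descriptions, to make the L0 vs. L0p distinction explicit.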
> diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json
> index b5a33e7a..0c5888d 100644
> --- a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json
> +++ b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json
> @@ -158,12 +158,14 @@
>      },
>      {
>          "BriefDescription": "Clockticks in the Memory Controller using one of the programmable counters",
> +        "EventCode": "0x0",
>          "EventName": "UNC_M_CLOCKTICKS_P",
>          "PerPkg": "1",
>          "Unit": "iMC"
>      },
>      {
>          "BriefDescription": "This event is deprecated. Refer to new event UNC_M_CLOCKTICKS_P",
> +        "EventCode": "0x0",
>          "EventName": "UNC_M_DCLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "iMC"
> diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json
> index 83d2013..f57eb8e 100644
> --- a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json
> +++ b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json
> @@ -1,6 +1,7 @@
>  [
>      {
>          "BriefDescription": "pclk Cycles",
> +        "EventCode": "0x0",
>          "EventName": "UNC_P_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "The PCU runs off a fixed 1 GHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
> diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json b/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json
> index 0f06e31..99346e1 100644
> --- a/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json
> @@ -191,6 +191,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>          "SampleAfterValue": "2000003",
> @@ -222,6 +223,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when the thread is not in halt state",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
>          "SampleAfterValue": "2000003",
> @@ -230,6 +232,7 @@
>      {
>          "AnyThread": "1",
>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x2"
> @@ -369,6 +372,7 @@
>      },
>      {
>          "BriefDescription": "Instructions retired from execution.",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PublicDescription": "Counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, Counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
>          "SampleAfterValue": "2000003",
> diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json
> index 2c88053..ba7a6f6 100644
> --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json
> +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json
> @@ -512,6 +512,7 @@
>      },
>      {
>          "BriefDescription": "Uncore cache clock ticks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_CHA_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "Counts clockticks of the clock controlling the uncore caching and home agent (CHA).",
> @@ -5792,6 +5793,7 @@
>      },
>      {
>          "BriefDescription": "This event is deprecated. Refer to new event UNC_CHA_CLOCKTICKS",
>          "Deprecated": "1",
> +        "EventCode": "0x0",
>          "EventName": "UNC_C_CLOCKTICKS",
>          "PerPkg": "1",
> diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json
> index 725780f..43d7b24 100644
> --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json
> +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json
> @@ -1090,6 +1090,7 @@
>      },
>      {
>          "BriefDescription": "Cycles - at UCLK",
> +        "EventCode": "0x0",
>          "EventName": "UNC_M2M_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "M2M"
> diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json
> index 743c91f..377d54f 100644
> --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json
> +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json
> @@ -1271,6 +1271,7 @@
>      },
>      {
>          "BriefDescription": "Counting disabled",
> +        "EventCode": "0x0",
>          "EventName": "UNC_IIO_NOTHING",
>          "PerPkg": "1",
>          "Unit": "IIO"
> diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json
> index f761856..77bb0ea 100644
> --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json
> +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json
> @@ -167,6 +167,7 @@
>      },
>      {
>          "BriefDescription": "Memory controller clock ticks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_M_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "Counts clockticks of the fixed frequency clock of the memory controller using one of the programmable counters.",
> diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-power.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-power.json
> index c6254af..a01b279 100644
> --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-power.json
> +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-power.json
> @@ -1,6 +1,7 @@
>  [
>      {
>          "BriefDescription": "pclk Cycles",
> +        "EventCode": "0x0",
>          "EventName": "UNC_P_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "The PCU runs off a fixed 1 GHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
> diff --git a/tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json b/tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json
> index 9dd8c90..3388cd5 100644
> --- a/tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json
> @@ -150,6 +150,7 @@
>      },
>      {
>          "BriefDescription": "Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event)",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is not affected by core frequency changes and increments at a fixed frequency that is also used for the Time Stamp Counter (TSC). This event uses fixed counter 2.",
>          "SampleAfterValue": "2000003",
> @@ -179,6 +180,7 @@
>      },
>      {
>          "BriefDescription": "Counts the total number of instructions retired. (Fixed event)",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PEBS": "1",
>          "PublicDescription": "Counts the total number of instructions that retired. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. This event continues counting during hardware interrupts, traps, and inside interrupt handlers. This event uses fixed counter 0.",
> diff --git a/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json b/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json
> index acb8974..79806e7 100644
> --- a/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json
> @@ -143,6 +143,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when core is not halted  (Fixed event)",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.CORE",
>          "PublicDescription": "Counts the number of core cycles while the core is not in a halt state.  The core enters the halt state when it is running the HLT instruction. In mobile systems the core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time.  This event uses fixed counter 1.  You cannot collect a PEBs record for this event.",
>          "SampleAfterValue": "2000003",
> @@ -165,6 +166,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when core is not halted  (Fixed event)",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction.  In mobile systems the core frequency may change from time.  This event is not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time.  This event uses fixed counter 2.  You cannot collect a PEBs record for this event.",
>          "SampleAfterValue": "2000003",
> @@ -187,6 +189,7 @@
>      },
>      {
>          "BriefDescription": "Instructions retired (Fixed event)",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PublicDescription": "Counts the number of instructions that retire execution. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. The counter continues counting during hardware interrupts, traps, and inside interrupt handlers.  This event uses fixed counter 0.  You cannot collect a PEBs record for this event.",
>          "SampleAfterValue": "2000003",
> diff --git a/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json b/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json
> index 33ef331..1be1b50 100644
> --- a/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json
> @@ -143,6 +143,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when core is not halted  (Fixed event)",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.CORE",
>          "PublicDescription": "Counts the number of core cycles while the core is not in a halt state.  The core enters the halt state when it is running the HLT instruction. In mobile systems the core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time.  This event uses fixed counter 1.  You cannot collect a PEBs record for this event.",
>          "SampleAfterValue": "2000003",
> @@ -165,6 +166,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when core is not halted  (Fixed event)",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction.  In mobile systems the core frequency may change from time.  This event is not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time.  This event uses fixed counter 2.  You cannot collect a PEBs record for this event.",
>          "SampleAfterValue": "2000003",
> @@ -187,6 +189,7 @@
>      },
>      {
>          "BriefDescription": "Instructions retired (Fixed event)",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PEBS": "2",
>          "PublicDescription": "Counts the number of instructions that retire execution. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. The counter continues counting during hardware interrupts, traps, and inside interrupt handlers.  This event uses fixed counter 0.  You cannot collect a PEBs record for this event.",
> diff --git a/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json b/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json
> index 4121295..5335a7b 100644
> --- a/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json
> @@ -29,6 +29,7 @@
>      },
>      {
>          "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x3"
> @@ -43,6 +44,7 @@
>      },
>      {
>          "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x2"
> @@ -55,6 +57,7 @@
>      },
>      {
>          "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PEBS": "1",
>          "SampleAfterValue": "2000003",
> diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json b/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
> index 764c043..6ca34b9 100644
> --- a/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
> @@ -17,6 +17,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>          "SampleAfterValue": "2000003",
> @@ -32,6 +33,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when the thread is not in halt state",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
>          "SampleAfterValue": "2000003",
> @@ -46,6 +48,7 @@
>      },
>      {
>          "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PEBS": "1",
>          "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
> @@ -78,6 +81,7 @@
>      },
>      {
>          "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
> +        "EventCode": "0x0",
>          "EventName": "TOPDOWN.SLOTS",
>          "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
>          "SampleAfterValue": "10000003",
> diff --git a/tools/perf/pmu-events/arch/x86/haswell/pipeline.json b/tools/perf/pmu-events/arch/x86/haswell/pipeline.json
> index 540f437..0d5eafd 100644
> --- a/tools/perf/pmu-events/arch/x86/haswell/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/haswell/pipeline.json
> @@ -303,6 +303,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state.",
>          "SampleAfterValue": "2000003",
> @@ -327,6 +328,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when the thread is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "PublicDescription": "This event counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling.",
>          "SampleAfterValue": "2000003",
> @@ -335,6 +337,7 @@
>      {
>          "AnyThread": "1",
>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x2"
> @@ -436,6 +439,7 @@
>      },
>      {
>          "BriefDescription": "Instructions retired from execution.",
> +        "EventCode": "0x0",
>          "Errata": "HSD140, HSD143",
>          "EventName": "INST_RETIRED.ANY",
>          "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. INST_RETIRED.ANY is counted by a designated fixed counter, leaving the programmable counters available for other events. Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
> diff --git a/tools/perf/pmu-events/arch/x86/haswellx/pipeline.json b/tools/perf/pmu-events/arch/x86/haswellx/pipeline.json
> index 540f437..0d5eafd 100644
> --- a/tools/perf/pmu-events/arch/x86/haswellx/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/haswellx/pipeline.json
> @@ -303,6 +303,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state.",
>          "SampleAfterValue": "2000003",
> @@ -327,6 +328,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when the thread is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "PublicDescription": "This event counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling.",
>          "SampleAfterValue": "2000003",
> @@ -335,6 +337,7 @@
>      {
>          "AnyThread": "1",
>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x2"
> @@ -436,6 +439,7 @@
>      },
>      {
>          "BriefDescription": "Instructions retired from execution.",
> +        "EventCode": "0x0",
>          "Errata": "HSD140, HSD143",
>          "EventName": "INST_RETIRED.ANY",
>          "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. INST_RETIRED.ANY is counted by a designated fixed counter, leaving the programmable counters available for other events. Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
> diff --git a/tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json b/tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json
> index 9227cc2..64e2fb4 100644
> --- a/tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json
> +++ b/tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json
> @@ -183,6 +183,7 @@
>      },
>      {
>          "BriefDescription": "Uncore Clocks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_C_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "CBOX"
> @@ -1698,6 +1699,7 @@
>      },
>      {
>          "BriefDescription": "uclks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_H_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of uclks in the HA.  This will be slightly different than the count in the Ubox because of enable/freeze delays.  The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the QPI Agent.",
> diff --git a/tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json
> index 954e8198..7c4fc13 100644
> --- a/tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json
> +++ b/tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json
> @@ -1,6 +1,7 @@
>  [
>      {
>          "BriefDescription": "Number of non data (control) flits transmitted . Derived from unc_q_txl_flits_g0.non_data",
> +        "EventCode": "0x0",
>          "EventName": "QPI_CTL_BANDWIDTH_TX",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI.  This basically tracks the protocol overhead on the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This includes the header flits for data packets.",
> @@ -10,6 +11,7 @@
>      },
>      {
>          "BriefDescription": "Number of data flits transmitted . Derived from unc_q_txl_flits_g0.data",
> +        "EventCode": "0x0",
>          "EventName": "QPI_DATA_BANDWIDTH_TX",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI.  Each flit contains 64b of data.  This includes both DRS and NCB data flits (coherent and non-coherent).  This can be used to calculate the data bandwidth of the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This does not include the header flits that go in data packets.",
> @@ -37,6 +39,7 @@
>      },
>      {
>          "BriefDescription": "Clocks in the IRP",
> +        "EventCode": "0x0",
>          "EventName": "UNC_I_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "Number of clocks in the IRP.",
> @@ -1401,6 +1404,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 0; Data Tx Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G0.DATA",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI.  Each flit contains 64b of data.  This includes both DRS and NCB data flits (coherent and non-coherent).  This can be used to calculate the data bandwidth of the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This does not include the header flits that go in data packets.",
> @@ -1409,6 +1413,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 0; Non-Data protocol Tx Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G0.NON_DATA",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI.  This basically tracks the protocol overhead on the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This includes the header flits for data packets.",
> @@ -1417,6 +1422,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; DRS Flits (both Header and Data)",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.",
> @@ -1425,6 +1431,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; DRS Data Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS_DATA",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of data flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.  This does not count data flits transmitted over the NCB channel which transmits non-coherent data.  This includes only the data flits (not the header).",
> @@ -1433,6 +1440,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; DRS Header Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS_NONDATA",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of protocol flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.  This does not count data flits transmitted over the NCB channel which transmits non-coherent data.  This includes only the header flits (not the data).  This includes extended headers.",
> @@ -1441,6 +1449,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; HOM Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of flits transmitted over QPI on the home channel.",
> @@ -1449,6 +1458,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; HOM Non-Request Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM_NONREQ",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of non-request flits transmitted over QPI on the home channel.  These are most commonly snoop responses, and this event can be used as a proxy for that.",
> @@ -1457,6 +1467,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; HOM Request Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM_REQ",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of data request transmitted over QPI on the home channel.  This basically counts the number of remote memory requests transmitted over QPI.  In conjunction with the local read count in the Home Agent, one can calculate the number of LLC Misses.",
> @@ -1465,6 +1476,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; SNP Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.SNP",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of snoop request flits transmitted over QPI.  These requests are contained in the snoop channel.  This does not include snoop responses, which are transmitted on the home channel.",
> @@ -3136,6 +3148,7 @@
>      },
>      {
>          "BriefDescription": "Uncore Clocks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_S_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "SBOX"
> @@ -3823,6 +3836,7 @@
>      },
>      {
>          "BriefDescription": "UNC_U_CLOCKTICKS",
> +        "EventCode": "0x0",
>          "EventName": "UNC_U_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "UBOX"
> diff --git a/tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json b/tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json
> index c005f51..124c3ae 100644
> --- a/tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json
> +++ b/tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json
> @@ -151,12 +151,14 @@
>      },
>      {
>          "BriefDescription": "DRAM Clockticks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_M_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "iMC"
>      },
>      {
>          "BriefDescription": "DRAM Clockticks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_M_DCLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "iMC"
> diff --git a/tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json b/tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json
> index daebf10..9276058 100644
> --- a/tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json
> +++ b/tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json
> @@ -1,6 +1,7 @@
>  [
>      {
>          "BriefDescription": "pclk Cycles",
> +        "EventCode": "0x0",
>          "EventName": "UNC_P_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "The PCU runs off a fixed 800 MHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
> diff --git a/tools/perf/pmu-events/arch/x86/icelake/pipeline.json b/tools/perf/pmu-events/arch/x86/icelake/pipeline.json
> index 154fee4..0789412 100644
> --- a/tools/perf/pmu-events/arch/x86/icelake/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/icelake/pipeline.json
> @@ -193,6 +193,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>          "SampleAfterValue": "2000003",
> @@ -208,6 +209,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when the thread is not in halt state",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
>          "SampleAfterValue": "2000003",
> @@ -359,6 +361,7 @@
>      },
>      {
>          "BriefDescription": "Precise instruction retired event with a reduced effect of PEBS shadow in IP distribution",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.PREC_DIST",
>          "PEBS": "1",
>          "PublicDescription": "A version of INST_RETIRED that allows for a more unbiased distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR) feature to mitigate some bias in how retired instructions get sampled. Use on Fixed Counter 0.",
> @@ -562,6 +565,7 @@
>      },
>      {
>          "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
> +        "EventCode": "0x0",
>          "EventName": "TOPDOWN.SLOTS",
>          "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
>          "SampleAfterValue": "10000003",
> diff --git a/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json b/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json
> index 442a4c7..9cfb341 100644
> --- a/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json
> @@ -193,6 +193,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>          "SampleAfterValue": "2000003",
> @@ -208,6 +209,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when the thread is not in halt state",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
>          "SampleAfterValue": "2000003",
> @@ -359,6 +361,7 @@
>      },
>      {
>          "BriefDescription": "Precise instruction retired event with a reduced effect of PEBS shadow in IP distribution",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.PREC_DIST",
>          "PEBS": "1",
>          "PublicDescription": "A version of INST_RETIRED that allows for a more unbiased distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR) feature to mitigate some bias in how retired instructions get sampled. Use on Fixed Counter 0.",
> @@ -544,6 +547,7 @@
>      },
>      {
>          "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
> +        "EventCode": "0x0",
>          "EventName": "TOPDOWN.SLOTS",
>          "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
>          "SampleAfterValue": "10000003",
> diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json b/tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json
> index b6ce14e..ae57663 100644
> --- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json
> +++ b/tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json
> @@ -892,6 +892,7 @@
>      },
>      {
>          "BriefDescription": "Clockticks of the uncore caching and home agent (CHA)",
> +        "EventCode": "0x0",
>          "EventName": "UNC_CHA_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "CHA"
> diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json
> index 8ac5907..1b821b6 100644
> --- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json
> +++ b/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json
> @@ -1419,6 +1419,7 @@
>      },
>      {
>          "BriefDescription": "Clockticks of the mesh to memory (M2M)",
> +        "EventCode": "0x0",
>          "EventName": "UNC_M2M_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "M2M"
> diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json b/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json
> index 814d959..b0b2f27 100644
> --- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json
> +++ b/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json
> @@ -100,6 +100,7 @@
>      },
>      {
>          "BriefDescription": "DRAM Clockticks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_M_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "iMC"
> diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json b/tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json
> index ee4dac6..9c4cd59 100644
> --- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json
> +++ b/tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json
> @@ -1,6 +1,7 @@
>  [
>      {
>          "BriefDescription": "Clockticks of the power control unit (PCU)",
> +        "EventCode": "0x0",
>          "EventName": "UNC_P_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "Clockticks of the power control unit (PCU) : The PCU runs off a fixed 1 GHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
> diff --git a/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json b/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json
> index 30a3da9..2df2d21 100644
> --- a/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json
> @@ -326,6 +326,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x3"
> @@ -348,6 +349,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when the thread is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x2"
> @@ -355,6 +357,7 @@
>      {
>          "AnyThread": "1",
>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>          "PublicDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
>          "SampleAfterValue": "2000003",
> diff --git a/tools/perf/pmu-events/arch/x86/ivytown/pipeline.json b/tools/perf/pmu-events/arch/x86/ivytown/pipeline.json
> index 30a3da9..6f6f281 100644
> --- a/tools/perf/pmu-events/arch/x86/ivytown/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/ivytown/pipeline.json
> @@ -326,6 +326,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x3"
> @@ -348,6 +349,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when the thread is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x2"
> @@ -355,6 +357,7 @@
>      {
>          "AnyThread": "1",
>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>          "PublicDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
>          "SampleAfterValue": "2000003",
> @@ -510,6 +513,7 @@
>      },
>      {
>          "BriefDescription": "Instructions retired from execution.",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x1"
> diff --git a/tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json b/tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json
> index 8bf2706..31e58fb 100644
> --- a/tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json
> +++ b/tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json
> @@ -1,6 +1,7 @@
>  [
>      {
>          "BriefDescription": "Uncore Clocks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_C_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "CBOX"
> @@ -1533,6 +1534,7 @@
>      },
>      {
>          "BriefDescription": "uclks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_H_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of uclks in the HA.  This will be slightly different than the count in the Ubox because of enable/freeze delays.  The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the QPI Agent.",
> diff --git a/tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.json
> index ccf45153..f2492ec7 100644
> --- a/tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.json
> +++ b/tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.json
> @@ -109,6 +109,7 @@
>      },
>      {
>          "BriefDescription": "Clocks in the IRP",
> +        "EventCode": "0x0",
>          "EventName": "UNC_I_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "Number of clocks in the IRP.",
> @@ -1522,6 +1523,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 0; Data Tx Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G0.DATA",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI.  Each flit contains 64b of data.  This includes both DRS and NCB data flits (coherent and non-coherent).  This can be used to calculate the data bandwidth of the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This does not include the header flits that go in data packets.",
> @@ -1530,6 +1532,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 0; Non-Data protocol Tx Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G0.NON_DATA",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI.  This basically tracks the protocol overhead on the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This includes the header flits for data packets.",
> @@ -1538,6 +1541,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; DRS Flits (both Header and Data)",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.",
> @@ -1546,6 +1550,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; DRS Data Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS_DATA",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of data flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.  This does not count data flits transmitted over the NCB channel which transmits non-coherent data.  This includes only the data flits (not the header).",
> @@ -1554,6 +1559,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; DRS Header Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS_NONDATA",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of protocol flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.  This does not count data flits transmitted over the NCB channel which transmits non-coherent data.  This includes only the header flits (not the data).  This includes extended headers.",
> @@ -1562,6 +1568,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; HOM Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of flits transmitted over QPI on the home channel.",
> @@ -1570,6 +1577,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; HOM Non-Request Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM_NONREQ",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of non-request flits transmitted over QPI on the home channel.  These are most commonly snoop responses, and this event can be used as a proxy for that.",
> @@ -1578,6 +1586,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; HOM Request Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM_REQ",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of data request transmitted over QPI on the home channel.  This basically counts the number of remote memory requests transmitted over QPI.  In conjunction with the local read count in the Home Agent, one can calculate the number of LLC Misses.",
> @@ -1586,6 +1595,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; SNP Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.SNP",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of snoop request flits transmitted over QPI.  These requests are contained in the snoop channel.  This does not include snoop responses, which are transmitted on the home channel.",
> @@ -3104,6 +3114,7 @@
>      },
>      {
>          "EventName": "UNC_U_CLOCKTICKS",
> +        "EventCode": "0x0",
>          "PerPkg": "1",
>          "Unit": "UBOX"
>      },
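The UNC_Q_TxL_FLITS_G0/G1 descriptions above repeat the same bandwidth arithmetic: raw link bandwidth is flits * 80 bits / time, and payload bandwidth is data flits * 8 B / time in full-width (L0) mode, or 4 B per flit in half-width (L0p) mode. A minimal sketch of that arithmetic, using hypothetical counter values rather than output from any real run:

```python
# Illustrative sketch of the QPI bandwidth math quoted in the
# UNC_Q_TxL_FLITS_G0.* event descriptions. Inputs are hypothetical.

def qpi_link_bandwidth_bytes(flits: int, seconds: float) -> float:
    """Raw link bandwidth: each flit carries 80 bits of information."""
    return flits * 80 / 8 / seconds

def qpi_data_bandwidth_bytes(data_flits: int, seconds: float,
                             half_width: bool = False) -> float:
    """Payload bandwidth: 8 B per data flit in L0, 4 B per flit in L0p."""
    payload_bytes = 4 if half_width else 8
    return data_flits * payload_bytes / seconds

# Example: 1e9 data flits observed over 1 second in full-width (L0) mode.
print(qpi_data_bandwidth_bytes(1_000_000_000, 1.0))  # 8 GB/s of payload
```

Note this deliberately mirrors the descriptions' caveat that link bandwidth (80 b/flit, including header and ECC overhead) is not the same as data bandwidth (64 b of payload per data flit).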
> diff --git a/tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json b/tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json
> index 6550934..869a320 100644
> --- a/tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json
> +++ b/tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json
> @@ -131,6 +131,7 @@
>      },
>      {
>          "BriefDescription": "DRAM Clockticks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_M_DCLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "iMC"
> diff --git a/tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json b/tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json
> index 5df1ebf..0a5d0c3 100644
> --- a/tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json
> +++ b/tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json
> @@ -1,6 +1,7 @@
>  [
>      {
>          "BriefDescription": "pclk Cycles",
> +        "EventCode": "0x0",
>          "EventName": "UNC_P_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "The PCU runs off a fixed 800 MHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
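Since the description above states the ivytown PCU pclk runs at a fixed 800 MHz, a raw UNC_P_CLOCKTICKS count converts directly to elapsed wall time. A small sketch of that conversion (the tick count below is a made-up example, not measured data):

```python
# Fixed PCU clock rate per the UNC_P_CLOCKTICKS description above
# (800 MHz on ivytown; other parts, e.g. icelakex, use 1 GHz).
PCLK_HZ = 800_000_000

def pclk_to_seconds(clockticks: int) -> float:
    """Convert a raw UNC_P_CLOCKTICKS count to elapsed wall time."""
    return clockticks / PCLK_HZ

print(pclk_to_seconds(1_600_000_000))  # 2.0 (seconds)
```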
> diff --git a/tools/perf/pmu-events/arch/x86/jaketown/pipeline.json b/tools/perf/pmu-events/arch/x86/jaketown/pipeline.json
> index d0edfde..76b515d 100644
> --- a/tools/perf/pmu-events/arch/x86/jaketown/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/jaketown/pipeline.json
> @@ -329,6 +329,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
>          "SampleAfterValue": "2000003",
> @@ -351,6 +352,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when the thread is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
>          "SampleAfterValue": "2000003",
> @@ -359,6 +361,7 @@
>      {
>          "AnyThread": "1",
>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x2"
> @@ -432,6 +435,7 @@
>      },
>      {
>          "BriefDescription": "Instructions retired from execution.",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers.",
>          "SampleAfterValue": "2000003",
> diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json b/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json
> index 63395e7e..160f1c4 100644
> --- a/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json
> +++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json
> @@ -1,6 +1,7 @@
>  [
>      {
>          "BriefDescription": "Uncore Clocks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_C_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "CBOX"
> @@ -863,6 +864,7 @@
>      },
>      {
>          "BriefDescription": "uclks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_H_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of uclks in the HA.  This will be slightly different than the count in the Ubox because of enable/freeze delays.  The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the QPI Agent.",
> diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/jaketown/uncore-interconnect.json
> index 874f15e..45f2966 100644
> --- a/tools/perf/pmu-events/arch/x86/jaketown/uncore-interconnect.json
> +++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-interconnect.json
> @@ -109,6 +109,7 @@
>      },
>      {
>          "BriefDescription": "Clocks in the IRP",
> +        "EventCode": "0x0",
>          "EventName": "UNC_I_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "Number of clocks in the IRP.",
> @@ -847,6 +848,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 0; Data Tx Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G0.DATA",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.",
> @@ -855,6 +857,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 0; Idle and Null Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G0.IDLE",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.",
> @@ -863,6 +866,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 0; Non-Data protocol Tx Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G0.NON_DATA",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.",
> @@ -871,6 +875,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; DRS Flits (both Header and Data)",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
> @@ -879,6 +884,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; DRS Data Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS_DATA",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
> @@ -887,6 +893,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; DRS Header Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS_NONDATA",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
> @@ -895,6 +902,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; HOM Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
> @@ -903,6 +911,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; HOM Non-Request Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM_NONREQ",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
> @@ -911,6 +920,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; HOM Request Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM_REQ",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
> @@ -919,6 +929,7 @@
>      },
>      {
>          "BriefDescription": "Flits Transferred - Group 1; SNP Flits",
> +        "EventCode": "0x0",
>          "EventName": "UNC_Q_TxL_FLITS_G1.SNP",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
> @@ -1576,6 +1587,7 @@
>      },
>      {
> +        "EventCode": "0x0",
>          "EventName": "UNC_U_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "UBOX"
>      },
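As an aside for readers following this series: the pattern across all of these hunks is mechanical (fixed-counter events gain an explicit `"EventCode": "0x0"`). A minimal, hypothetical checker like the sketch below can confirm no entry in a given JSON file was missed; the script is illustrative only, paths are whatever files you point it at, and jevents itself does the authoritative validation.

```python
# Hypothetical sanity check for this series: list events in a pmu-events
# JSON file that define neither EventCode nor ConfigCode. After the series,
# fixed-counter entries like UNC_U_CLOCKTICKS should no longer show up.
import json
import sys
from pathlib import Path


def events_missing_eventcode(json_path):
    """Return EventNames of entries lacking both EventCode and ConfigCode."""
    entries = json.loads(Path(json_path).read_text())
    return [
        e["EventName"]
        for e in entries
        if "EventName" in e
        and "EventCode" not in e
        and "ConfigCode" not in e
    ]


if __name__ == "__main__":
    for path in sys.argv[1:]:
        for name in events_missing_eventcode(path):
            print(f"{path}: {name} has no EventCode/ConfigCode")
```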
> diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json b/tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json
> index 6dcc9415..2385b0a 100644
> --- a/tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json
> +++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json
> @@ -65,6 +65,7 @@
>      },
>      {
>          "BriefDescription": "uclks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_M_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "Uncore Fixed Counter - uclks",
> diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json b/tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json
> index b3ee5d7..f453afd 100644
> --- a/tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json
> +++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json
> @@ -1,6 +1,7 @@
>  [
>      {
>          "BriefDescription": "pclk Cycles",
> +        "EventCode": "0x0",
>          "EventName": "UNC_P_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "The PCU runs off a fixed 800 MHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
> @@ -216,6 +217,7 @@
>      },
>      {
>          "BriefDescription": "Cycles spent changing Frequency",
> +        "EventCode": "0x0",
>          "EventName": "UNC_P_FREQ_TRANS_CYCLES",
>          "PerPkg": "1",
>          "PublicDescription": "Counts the number of cycles when the system is changing frequency.  This can not be filtered by thread ID.  One can also use it with the occupancy counter that monitors number of threads in C0 to estimate the performance impact that frequency transitions had on the system.",
> diff --git a/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json b/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json
> index 3dc5321..a74d45a 100644
> --- a/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json
> @@ -150,12 +150,14 @@
>      },
>      {
>          "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x3"
>      },
>      {
>          "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter",
>          "SampleAfterValue": "2000003",
> @@ -177,6 +179,7 @@
>      },
>      {
>          "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PublicDescription": "This event counts the number of instructions that retire.  For instructions that consist of multiple micro-ops, this event counts exactly once, as the last micro-op of the instruction retires.  The event continues counting while instructions retire, including during interrupt service routines caused by hardware interrupts, faults or traps.",
>          "SampleAfterValue": "2000003",
> diff --git a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-cache.json b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-cache.json
> index 1b8dcfa..c062253 100644
> --- a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-cache.json
> +++ b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-cache.json
> @@ -3246,6 +3246,7 @@
>      },
>      {
>          "BriefDescription": "Uncore Clocks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_H_U_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "CHA"
> diff --git a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json
> index fb75297..3575baa 100644
> --- a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json
> +++ b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json
> @@ -41,6 +41,7 @@
>      },
>      {
>          "BriefDescription": "ECLK count",
> +        "EventCode": "0x0",
>          "EventName": "UNC_E_E_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "EDC_ECLK"
> @@ -55,6 +56,7 @@
>      },
>      {
>          "BriefDescription": "UCLK count",
> +        "EventCode": "0x0",
>          "EventName": "UNC_E_U_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "EDC_UCLK"
> @@ -93,12 +95,14 @@
>      },
>      {
>          "BriefDescription": "DCLK count",
> +        "EventCode": "0x0",
>          "EventName": "UNC_M_D_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "iMC_DCLK"
>      },
>      {
>          "BriefDescription": "UCLK count",
> +        "EventCode": "0x0",
>          "EventName": "UNC_M_U_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "iMC_UCLK"
> diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json b/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
> index 6397894..0de3572 100644
> --- a/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
> @@ -37,6 +37,7 @@
>      },
>      {
>          "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.CORE",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x2",
> @@ -51,6 +52,7 @@
>      },
>      {
>          "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x3",
> @@ -58,6 +60,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>          "SampleAfterValue": "2000003",
> @@ -75,6 +78,7 @@
>      },
>      {
>          "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x2",
> @@ -82,6 +86,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when the thread is not in halt state",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
>          "SampleAfterValue": "2000003",
> @@ -105,6 +110,7 @@
>      },
>      {
>          "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PEBS": "1",
>          "SampleAfterValue": "2000003",
> @@ -113,6 +119,7 @@
>      },
>      {
>          "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PEBS": "1",
>          "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
> @@ -157,6 +164,7 @@
>      },
>      {
>          "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
> +        "EventCode": "0x0",
>          "EventName": "TOPDOWN.SLOTS",
>          "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
>          "SampleAfterValue": "10000003",
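For context on why the series supplements the omitted EventCode at all: conceptually, jevents turns each JSON entry into a perf event term list, and a field that is absent contributes no term. The sketch below is a rough illustration of that idea, not the actual jevents implementation; field handling in the real tool is more involved.

```python
# Illustrative sketch (not jevents itself): build an event term string from
# a pmu-events JSON entry. With EventCode omitted, the "event=" term would
# simply be dropped, which is what this series avoids for fixed counters.
def event_string(entry):
    """Join the recognized fields of one JSON entry into a term list."""
    terms = []
    if "EventCode" in entry:
        terms.append(f"event={entry['EventCode']}")
    if "UMask" in entry:
        terms.append(f"umask={entry['UMask']}")
    return ",".join(terms)


# With the explicit EventCode added by this series:
print(event_string({"EventName": "INST_RETIRED.ANY",
                    "EventCode": "0x0",
                    "UMask": "0x1"}))  # event=0x0,umask=0x1
```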
> diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json b/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json
> index ecaf94c..973a5f4 100644
> --- a/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json
> @@ -337,6 +337,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
>          "SampleAfterValue": "2000003",
> @@ -359,6 +360,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when the thread is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
>          "SampleAfterValue": "2000003",
> @@ -367,6 +369,7 @@
>      {
>          "AnyThread": "1",
>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x2"
> @@ -440,6 +443,7 @@
>      },
>      {
>          "BriefDescription": "Instructions retired from execution.",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers.",
>          "SampleAfterValue": "2000003",
> diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json b/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json
> index 72e9bdfa..ada2c34 100644
> --- a/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json
> @@ -284,6 +284,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>          "SampleAfterValue": "2000003",
> @@ -299,6 +300,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when the thread is not in halt state",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
>          "SampleAfterValue": "2000003",
> @@ -426,6 +428,7 @@
>      },
>      {
>          "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PEBS": "1",
>          "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
> @@ -457,6 +460,7 @@
>      },
>      {
>          "BriefDescription": "Precise instruction retired with PEBS precise-distribution",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.PREC_DIST",
>          "PEBS": "1",
>          "PublicDescription": "A version of INST_RETIRED that allows for a precise distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR++) feature to fix bias in how retired instructions get sampled. Use on Fixed Counter 0.",
> @@ -719,6 +723,7 @@
>      },
>      {
>          "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
> +        "EventCode": "0x0",
>          "EventName": "TOPDOWN.SLOTS",
>          "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
>          "SampleAfterValue": "10000003",
> diff --git a/tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json b/tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json
> index 4121295..67be689 100644
> --- a/tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json
> @@ -17,6 +17,7 @@
>      },
>      {
>          "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.CORE",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x2"
> @@ -29,6 +30,7 @@
>      },
>      {
>          "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x3"
> @@ -43,6 +45,7 @@
>      },
>      {
>          "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x2"
> @@ -55,6 +58,7 @@
>      },
>      {
>          "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PEBS": "1",
>          "SampleAfterValue": "2000003",
> diff --git a/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json b/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
> index 2d4214b..6423c01 100644
> --- a/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
> @@ -143,6 +143,7 @@
>      },
>      {
>          "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.CORE",
>          "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios.  The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. In systems with a constant core frequency, this event can give you a measurement of the elapsed time while the core was not in halt state by dividing the event count by the core frequency. This event is architecturally defined and is a designated fixed counter.  CPU_CLK_UNHALTED.CORE and CPU_CLK_UNHALTED.CORE_P use the core frequency which may change from time to time.  CPU_CLK_UNHALTE.REF_TSC and CPU_CLK_UNHALTED.REF are not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time.  The fixed events are CPU_CLK_UNHALTED.CORE and CPU_CLK_UNHALTED.REF_TSC and the programmable events are CPU_CLK_UNHALTED.CORE_P and CPU_CLK_UNHALTED.REF.",
>          "SampleAfterValue": "2000003",
> @@ -165,6 +166,7 @@
>      },
>      {
>          "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "Counts the number of reference cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios.  The core frequency may change from time. This event is not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time.  Divide this event count by core frequency to determine the elapsed time while the core was not in halt state.  Divide this event count by core frequency to determine the elapsed time while the core was not in halt state.  This event is architecturally defined and is a designated fixed counter.  CPU_CLK_UNHALTED.CORE and CPU_CLK_UNHALTED.CORE_P use the core frequency which may change from time to time.  CPU_CLK_UNHALTE.REF_TSC and CPU_CLK_UNHALTED.REF are not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time.  The fixed events are CPU_CLK_UNHALTED.CORE and CPU_CLK_UNHALTED.REF_TSC and the programmable events are CPU_CLK_UNHALTED.CORE_P and CPU_CLK_UNHALTED.REF.",
>          "SampleAfterValue": "2000003",
> @@ -180,6 +182,7 @@
>      },
>      {
>          "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PublicDescription": "This event counts the number of instructions that retire.  For instructions that consist of multiple micro-ops, this event counts exactly once, as the last micro-op of the instruction retires.  The event continues counting while instructions retire, including during interrupt service routines caused by hardware interrupts, faults or traps.  Background: Modern microprocessors employ extensive pipelining and speculative techniques.  Since sometimes an instruction is started but never completed, the notion of \"retirement\" is introduced.  A retired instruction is one that commits its states. Or stated differently, an instruction might be abandoned at some point. No instruction is truly finished until it retires.  This counter measures the number of completed instructions.  The fixed event is INST_RETIRED.ANY and the programmable event is INST_RETIRED.ANY_P.",
>          "SampleAfterValue": "2000003",
> diff --git a/tools/perf/pmu-events/arch/x86/skylake/pipeline.json b/tools/perf/pmu-events/arch/x86/skylake/pipeline.json
> index 2dfc3af..53f1381 100644
> --- a/tools/perf/pmu-events/arch/x86/skylake/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/skylake/pipeline.json
> @@ -182,6 +182,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>          "SampleAfterValue": "2000003",
> @@ -213,6 +214,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when the thread is not in halt state",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
>          "SampleAfterValue": "2000003",
> @@ -221,6 +223,7 @@
>      {
>          "AnyThread": "1",
>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x2"
> @@ -360,6 +363,7 @@
>      },
>      {
>          "BriefDescription": "Instructions retired from execution.",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PublicDescription": "Counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, Counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
>          "SampleAfterValue": "2000003",
> diff --git a/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json b/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json
> index 0f06e31..99346e1 100644
> --- a/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json
> @@ -191,6 +191,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>          "SampleAfterValue": "2000003",
> @@ -222,6 +223,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when the thread is not in halt state",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
>          "SampleAfterValue": "2000003",
> @@ -230,6 +232,7 @@
>      {
>          "AnyThread": "1",
>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>          "SampleAfterValue": "2000003",
>          "UMask": "0x2"
> @@ -369,6 +372,7 @@
>      },
>      {
>          "BriefDescription": "Instructions retired from execution.",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PublicDescription": "Counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, Counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
>          "SampleAfterValue": "2000003",
> diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json
> index 543dfc1..4df1294 100644
> --- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json
> +++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json
> @@ -460,6 +460,7 @@
>      },
>      {
>          "BriefDescription": "Clockticks of the uncore caching & home agent (CHA)",
> +        "EventCode": "0x0",
>          "EventName": "UNC_CHA_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "Counts clockticks of the clock controlling the uncore caching and home agent (CHA).",
> @@ -5678,6 +5679,7 @@
>      {
>          "BriefDescription": "This event is deprecated. Refer to new event UNC_CHA_CLOCKTICKS",
>          "Deprecated": "1",
> +        "EventCode": "0x0",
>          "EventName": "UNC_C_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "CHA"
> diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json
> index 26a5a20..40f609c 100644
> --- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json
> +++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json
> @@ -1090,6 +1090,7 @@
>      },
>      {
>          "BriefDescription": "Cycles - at UCLK",
> +        "EventCode": "0x0",
>          "EventName": "UNC_M2M_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "M2M"
> diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json
> index 2a3a709..21a6a0f 100644
> --- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json
> +++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json
> @@ -1271,6 +1271,7 @@
>      },
>      {
>          "BriefDescription": "Counting disabled",
> +        "EventCode": "0x0",
>          "EventName": "UNC_IIO_NOTHING",
>          "PerPkg": "1",
>          "Unit": "IIO"
> diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json
> index 6f8ff22..a7ce916 100644
> --- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json
> +++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json
> @@ -167,6 +167,7 @@
>      },
>      {
>          "BriefDescription": "Memory controller clock ticks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_M_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "Counts clockticks of the fixed frequency clock of the memory controller using one of the programmable counters.",
> diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json
> index c6254af..a01b279 100644
> --- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json
> +++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json
> @@ -1,6 +1,7 @@
>  [
>      {
>          "BriefDescription": "pclk Cycles",
> +        "EventCode": "0x0",
>          "EventName": "UNC_P_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "The PCU runs off a fixed 1 GHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
> diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json b/tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json
> index 9dd8c90..3388cd5 100644
> --- a/tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json
> @@ -150,6 +150,7 @@
>      },
>      {
>          "BriefDescription": "Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event)",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is not affected by core frequency changes and increments at a fixed frequency that is also used for the Time Stamp Counter (TSC). This event uses fixed counter 2.",
>          "SampleAfterValue": "2000003",
> @@ -179,6 +180,7 @@
>      },
>      {
>          "BriefDescription": "Counts the total number of instructions retired. (Fixed event)",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PEBS": "1",
>          "PublicDescription": "Counts the total number of instructions that retired. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. This event continues counting during hardware interrupts, traps, and inside interrupt handlers. This event uses fixed counter 0.",
> diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json
> index a68a5bb..279381b 100644
> --- a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json
> +++ b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json
> @@ -872,6 +872,7 @@
>      },
>      {
>          "BriefDescription": "Uncore cache clock ticks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_CHA_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "CHA"
> diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json
> index de38400..399536f 100644
> --- a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json
> +++ b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json
> @@ -1419,6 +1419,7 @@
>      },
>      {
>          "BriefDescription": "Clockticks of the mesh to memory (M2M)",
> +        "EventCode": "0x0",
>          "EventName": "UNC_M2M_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "M2M"
> diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json
> index 530e9b71..b24ba35 100644
> --- a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json
> +++ b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json
> @@ -120,6 +120,7 @@
>      },
>      {
>          "BriefDescription": "Memory controller clock ticks",
> +        "EventCode": "0x0",
>          "EventName": "UNC_M_CLOCKTICKS",
>          "PerPkg": "1",
>          "PublicDescription": "Clockticks of the integrated memory controller (IMC)",
> diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json
> index 27fc155..5c04d6e 100644
> --- a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json
> +++ b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json
> @@ -1,6 +1,7 @@
>  [
>      {
>          "BriefDescription": "Clockticks of the power control unit (PCU)",
> +        "EventCode": "0x0",
>          "EventName": "UNC_P_CLOCKTICKS",
>          "PerPkg": "1",
>          "Unit": "PCU"
> diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json b/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json
> index a0aeeb8..54a81f9 100644
> --- a/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json
> +++ b/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json
> @@ -193,6 +193,7 @@
>      },
>      {
>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>          "SampleAfterValue": "2000003",
> @@ -208,6 +209,7 @@
>      },
>      {
>          "BriefDescription": "Core cycles when the thread is not in halt state",
> +        "EventCode": "0x0",
>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
>          "SampleAfterValue": "2000003",
> @@ -352,6 +354,7 @@
>      },
>      {
>          "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.ANY",
>          "PEBS": "1",
>          "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
> @@ -377,6 +380,7 @@
>      },
>      {
>          "BriefDescription": "Precise instruction retired event with a reduced effect of PEBS shadow in IP distribution",
> +        "EventCode": "0x0",
>          "EventName": "INST_RETIRED.PREC_DIST",
>          "PEBS": "1",
>          "PublicDescription": "A version of INST_RETIRED that allows for a more unbiased distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR) feature to mitigate some bias in how retired instructions get sampled. Use on Fixed Counter 0.",
> @@ -569,6 +573,7 @@
>      },
>      {
>          "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
> +        "EventCode": "0x0",
>          "EventName": "TOPDOWN.SLOTS",
>          "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
>          "SampleAfterValue": "10000003",
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v7 5/8] perf test: Make matching_pmu effective
  2023-08-21  8:36   ` Jing Zhang
@ 2023-08-25  4:27     ` Ian Rogers
  -1 siblings, 0 replies; 57+ messages in thread
From: Ian Rogers @ 2023-08-25  4:27 UTC (permalink / raw)
  To: Jing Zhang
  Cc: John Garry, Will Deacon, James Clark, Arnaldo Carvalho de Melo,
	Mark Rutland, Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	linux-kernel, linux-arm-kernel, linux-perf-users, linux-doc,
	Zhuo Song, Shuai Xue

On Mon, Aug 21, 2023 at 1:36 AM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>
> The perf_pmu_test_event.matching_pmu field didn't work: no matter what
> value it held, it had no effect on the test results. So use matching_pmu
> to match against perf_pmu_test_pmu.pmu.name.

Could you rebase this onto the latest perf-tools-next, I'd like to test this.

Thanks,
Ian

> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
> Reviewed-by: John Garry <john.g.garry@oracle.com>
> ---
>  tools/perf/tests/pmu-events.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
> index 1dff863b..3204252 100644
> --- a/tools/perf/tests/pmu-events.c
> +++ b/tools/perf/tests/pmu-events.c
> @@ -238,7 +238,7 @@ struct perf_pmu_test_pmu {
>         },
>         .alias_str = "event=0x2b",
>         .alias_long_desc = "ddr write-cycles event. Unit: uncore_sys_ddr_pmu ",
> -       .matching_pmu = "uncore_sys_ddr_pmu",
> +       .matching_pmu = "uncore_sys_ddr_pmu0",
>  };
>
>  static const struct perf_pmu_test_event sys_ccn_pmu_read_cycles = {
> @@ -252,7 +252,7 @@ struct perf_pmu_test_pmu {
>         },
>         .alias_str = "config=0x2c",
>         .alias_long_desc = "ccn read-cycles event. Unit: uncore_sys_ccn_pmu ",
> -       .matching_pmu = "uncore_sys_ccn_pmu",
> +       .matching_pmu = "uncore_sys_ccn_pmu4",
>  };
>
>  static const struct perf_pmu_test_event *sys_events[] = {
> @@ -599,6 +599,11 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
>                         struct pmu_event const *event = &test_event->event;
>
>                         if (!strcmp(event->name, alias->name)) {
> +                               if (strcmp(pmu_name, test_event->matching_pmu)) {
> +                                       pr_debug("testing aliases uncore PMU %s: mismatched matching_pmu, %s vs %s\n",
> +                                                       pmu_name, test_event->matching_pmu, pmu_name);
> +                                       continue;
> +                               }
>                                 if (compare_alias_to_test_event(alias,
>                                                         test_event,
>                                                         pmu_name)) {
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v7 5/8] perf test: Make matching_pmu effective
@ 2023-08-25  4:27     ` Ian Rogers
  0 siblings, 0 replies; 57+ messages in thread
From: Ian Rogers @ 2023-08-25  4:27 UTC (permalink / raw)
  To: Jing Zhang
  Cc: John Garry, Will Deacon, James Clark, Arnaldo Carvalho de Melo,
	Mark Rutland, Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	linux-kernel, linux-arm-kernel, linux-perf-users, linux-doc,
	Zhuo Song, Shuai Xue

On Mon, Aug 21, 2023 at 1:36 AM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>
> The perf_pmu_test_event.matching_pmu didn't work. No matter what its
> value is, it does not affect the test results. So let matching_pmu be
> used for matching perf_pmu_test_pmu.pmu.name.

Could you rebase this onto the latest perf-tools-next, I'd like to test this.

Thanks,
Ian

> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
> Reviewed-by: John Garry <john.g.garry@oracle.com>
> ---
>  tools/perf/tests/pmu-events.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
> index 1dff863b..3204252 100644
> --- a/tools/perf/tests/pmu-events.c
> +++ b/tools/perf/tests/pmu-events.c
> @@ -238,7 +238,7 @@ struct perf_pmu_test_pmu {
>         },
>         .alias_str = "event=0x2b",
>         .alias_long_desc = "ddr write-cycles event. Unit: uncore_sys_ddr_pmu ",
> -       .matching_pmu = "uncore_sys_ddr_pmu",
> +       .matching_pmu = "uncore_sys_ddr_pmu0",
>  };
>
>  static const struct perf_pmu_test_event sys_ccn_pmu_read_cycles = {
> @@ -252,7 +252,7 @@ struct perf_pmu_test_pmu {
>         },
>         .alias_str = "config=0x2c",
>         .alias_long_desc = "ccn read-cycles event. Unit: uncore_sys_ccn_pmu ",
> -       .matching_pmu = "uncore_sys_ccn_pmu",
> +       .matching_pmu = "uncore_sys_ccn_pmu4",
>  };
>
>  static const struct perf_pmu_test_event *sys_events[] = {
> @@ -599,6 +599,11 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
>                         struct pmu_event const *event = &test_event->event;
>
>                         if (!strcmp(event->name, alias->name)) {
> +                               if (strcmp(pmu_name, test_event->matching_pmu)) {
> +                                       pr_debug("testing aliases uncore PMU %s: mismatched matching_pmu, %s vs %s\n",
> +                                                       pmu_name, test_event->matching_pmu, pmu_name);
> +                                       continue;
> +                               }
>                                 if (compare_alias_to_test_event(alias,
>                                                         test_event,
>                                                         pmu_name)) {
> --
> 1.8.3.1
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 57+ messages in thread
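The effect of the new check in the hunk above can be sketched in Python (a minimal model of the test logic, not perf's actual C code; the dict-based test data is illustrative):

```python
# Minimal model of the matching_pmu check: a test event is only compared
# against an alias when the concrete PMU name (e.g. "uncore_sys_ddr_pmu0")
# equals the event's expected matching_pmu.
def events_to_compare(pmu_name, alias_names, test_events):
    matched = []
    for alias_name in alias_names:
        for ev in test_events:
            if ev["name"] != alias_name:
                continue
            if pmu_name != ev["matching_pmu"]:
                # mismatched matching_pmu: skip, mirroring the new 'continue'
                continue
            matched.append(ev)
    return matched

test_events = [{"name": "sys_ddr_pmu.write_cycles",
                "matching_pmu": "uncore_sys_ddr_pmu0"}]

assert events_to_compare("uncore_sys_ddr_pmu0",
                         ["sys_ddr_pmu.write_cycles"], test_events)
assert not events_to_compare("uncore_sys_ddr_pmu1",
                             ["sys_ddr_pmu.write_cycles"], test_events)
```

With the old code, the second case would also have been compared, which is why matching_pmu previously had no effect on the results.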

* Re: [PATCH v7 6/8] perf test: Add pmu-event test for "Compat" and new event_field.
  2023-08-21  8:36   ` Jing Zhang
@ 2023-08-25  4:30     ` Ian Rogers
  -1 siblings, 0 replies; 57+ messages in thread
From: Ian Rogers @ 2023-08-25  4:30 UTC (permalink / raw)
  To: Jing Zhang
  Cc: John Garry, Will Deacon, James Clark, Arnaldo Carvalho de Melo,
	Mark Rutland, Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	linux-kernel, linux-arm-kernel, linux-perf-users, linux-doc,
	Zhuo Song, Shuai Xue

On Mon, Aug 21, 2023 at 1:36 AM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>
> Add a new event test for an uncore system event, used to verify the
> functionality of "Compat" matching multiple identifiers and of the new
> event fields "EventIdCode" and "Type".
>
> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>

Thanks for the tests! I've no issue with them besides the already
mentioned ';'. This will need updating for:
https://lore.kernel.org/lkml/20230824183212.374787-1-irogers@google.com/
https://lore.kernel.org/lkml/20230825024002.801955-1-irogers@google.com/

Thanks,
Ian

> ---
>  .../pmu-events/arch/test/test_soc/sys/uncore.json  |  8 ++++
>  tools/perf/pmu-events/empty-pmu-events.c           |  8 ++++
>  tools/perf/tests/pmu-events.c                      | 55 ++++++++++++++++++++++
>  3 files changed, 71 insertions(+)
>
> diff --git a/tools/perf/pmu-events/arch/test/test_soc/sys/uncore.json b/tools/perf/pmu-events/arch/test/test_soc/sys/uncore.json
> index c7e7528..06b886d 100644
> --- a/tools/perf/pmu-events/arch/test/test_soc/sys/uncore.json
> +++ b/tools/perf/pmu-events/arch/test/test_soc/sys/uncore.json
> @@ -12,5 +12,13 @@
>             "EventName": "sys_ccn_pmu.read_cycles",
>             "Unit": "sys_ccn_pmu",
>             "Compat": "0x01"
> +   },
> +   {
> +           "BriefDescription": "Counts total cache misses in first lookup result (high priority).",
> +           "NodeType": "0x05",
> +           "EventIdCode": "0x01",
> +           "EventName": "sys_cmn_pmu.hnf_cache_miss",
> +           "Unit": "sys_cmn_pmu",
> +           "Compat": "434*;436*;43c*;43a01"
>     }
>  ]
> diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
> index e74defb..25be18a 100644
> --- a/tools/perf/pmu-events/empty-pmu-events.c
> +++ b/tools/perf/pmu-events/empty-pmu-events.c
> @@ -245,6 +245,14 @@ struct pmu_events_map {
>                 .pmu = "uncore_sys_ccn_pmu",
>         },
>         {
> +               .name = "sys_cmn_pmu.hnf_cache_miss",
> +               .event = "type=0x05,eventid=0x01",
> +               .desc = "Counts total cache misses in first lookup result (high priority). Unit: uncore_sys_cmn_pmu ",
> +               .compat = "434*;436*;43c*;43a01",
> +               .topic = "uncore",
> +               .pmu = "uncore_sys_cmn_pmu",
> +       },
> +       {
>                 .name = 0,
>                 .event = 0,
>                 .desc = 0,
> diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
> index 3204252..79fb3e2 100644
> --- a/tools/perf/tests/pmu-events.c
> +++ b/tools/perf/tests/pmu-events.c
> @@ -255,9 +255,24 @@ struct perf_pmu_test_pmu {
>         .matching_pmu = "uncore_sys_ccn_pmu4",
>  };
>
> +static const struct perf_pmu_test_event sys_cmn_pmu_hnf_cache_miss = {
> +       .event = {
> +               .name = "sys_cmn_pmu.hnf_cache_miss",
> +               .event = "type=0x05,eventid=0x01",
> +               .desc = "Counts total cache misses in first lookup result (high priority). Unit: uncore_sys_cmn_pmu ",
> +               .topic = "uncore",
> +               .pmu = "uncore_sys_cmn_pmu",
> +               .compat = "434*;436*;43c*;43a01",
> +       },
> +       .alias_str = "type=0x5,eventid=0x1",
> +       .alias_long_desc = "Counts total cache misses in first lookup result (high priority). Unit: uncore_sys_cmn_pmu ",
> +       .matching_pmu = "uncore_sys_cmn_pmu0",
> +};
> +
>  static const struct perf_pmu_test_event *sys_events[] = {
>         &sys_ddr_pmu_write_cycles,
>         &sys_ccn_pmu_read_cycles,
> +       &sys_cmn_pmu_hnf_cache_miss,
>         NULL
>  };
>
> @@ -704,6 +719,46 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
>                         &sys_ccn_pmu_read_cycles,
>                 },
>         },
> +       {
> +               .pmu = {
> +                       .name = (char *)"uncore_sys_cmn_pmu0",
> +                       .is_uncore = 1,
> +                       .id = (char *)"43401",
> +               },
> +               .aliases = {
> +                       &sys_cmn_pmu_hnf_cache_miss,
> +               },
> +       },
> +       {
> +               .pmu = {
> +                       .name = (char *)"uncore_sys_cmn_pmu0",
> +                       .is_uncore = 1,
> +                       .id = (char *)"43602",
> +               },
> +               .aliases = {
> +                       &sys_cmn_pmu_hnf_cache_miss,
> +               },
> +       },
> +       {
> +               .pmu = {
> +                       .name = (char *)"uncore_sys_cmn_pmu0",
> +                       .is_uncore = 1,
> +                       .id = (char *)"43c03",
> +               },
> +               .aliases = {
> +                       &sys_cmn_pmu_hnf_cache_miss,
> +               },
> +       },
> +       {
> +               .pmu = {
> +                       .name = (char *)"uncore_sys_cmn_pmu0",
> +                       .is_uncore = 1,
> +                       .id = (char *)"43a01",
> +               },
> +               .aliases = {
> +                       &sys_cmn_pmu_hnf_cache_miss,
> +               },
> +       }
>  };
>
>  /* Test that aliases generated are as expected */
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 57+ messages in thread
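The four uncore_sys_cmn_pmu0 test PMUs above exercise the different tokens of the Compat pattern; the intended matching can be sketched in Python (an illustrative model using fnmatch-style wildcards, not the perf implementation):

```python
import fnmatch

def compat_matches(pmu_id, compat):
    # ';'-delimited tokens; a trailing '*' acts as a prefix wildcard.
    return any(fnmatch.fnmatch(pmu_id, tok) for tok in compat.split(";"))

compat = "434*;436*;43c*;43a01"
# The ids given to the four test PMUs above: three wildcard matches and
# one exact-token match.
for pmu_id in ("43401", "43602", "43c03", "43a01"):
    assert compat_matches(pmu_id, compat)
assert not compat_matches("99999", compat)
```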

* Re: [PATCH v7 4/8] perf jevents: Support more event fields
  2023-08-23  9:12     ` Robin Murphy
@ 2023-08-25  4:42       ` Ian Rogers
  -1 siblings, 0 replies; 57+ messages in thread
From: Ian Rogers @ 2023-08-25  4:42 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Jing Zhang, John Garry, Will Deacon, James Clark,
	Arnaldo Carvalho de Melo, Mark Rutland, Mike Leach, Leo Yan,
	Namhyung Kim, Peter Zijlstra, Ingo Molnar, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, linux-kernel, linux-arm-kernel,
	linux-perf-users, linux-doc, Zhuo Song, Shuai Xue

On Wed, Aug 23, 2023 at 2:12 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2023-08-21 09:36, Jing Zhang wrote:
> > The previous code assumes an event has either an "event=" or "config="
> > field at the beginning. For CMN, neither of these may be present, as an
> > event is typically "type=xx,eventid=xxx".
> >
> > If EventCode and ConfigCode are not present in the alias JSON file, the
> > event description will add "event=0" by default. So even if "eventid=xxx"
> > and "type=xxx" are added to the event field, the CMN event's final
> > parsing result will be "event=0,eventid=xxx,type=xxx".
> >
> > Therefore, when EventCode and ConfigCode are missing from the JSON,
> > "event=0" is no longer added by default, and EventIdCode and Type are
> > added to the supported event fields.
> >
> > I compared the pmu-events.c generated before and after this change with
> > JEVENT_ARCH=all, and they are consistent.
> >
> > Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
> > ---
> >   tools/perf/pmu-events/jevents.py | 20 ++++++++++++++++----
> >   1 file changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
> > index f57a8f2..369c8bf 100755
> > --- a/tools/perf/pmu-events/jevents.py
> > +++ b/tools/perf/pmu-events/jevents.py
> > @@ -275,11 +275,14 @@ class JsonEvent:
> >         }
> >         return table[unit] if unit in table else f'uncore_{unit.lower()}'
> >
> > -    eventcode = 0
> > +    eventcode = None
> >       if 'EventCode' in jd:
> >         eventcode = int(jd['EventCode'].split(',', 1)[0], 0)
> >       if 'ExtSel' in jd:
> > -      eventcode |= int(jd['ExtSel']) << 8
> > +      if eventcode is None:
> > +        eventcode = int(jd['ExtSel']) << 8
> > +      else:
> > +        eventcode |= int(jd['ExtSel']) << 8
> >       configcode = int(jd['ConfigCode'], 0) if 'ConfigCode' in jd else None
> >       self.name = jd['EventName'].lower() if 'EventName' in jd else None
> >       self.topic = ''
> > @@ -317,7 +320,11 @@ class JsonEvent:
> >       if precise and self.desc and '(Precise Event)' not in self.desc:
> >         extra_desc += ' (Must be precise)' if precise == '2' else (' (Precise '
> >                                                                    'event)')
> > -    event = f'config={llx(configcode)}' if configcode is not None else f'event={llx(eventcode)}'
> > +    event = None
> > +    if eventcode is not None:
> > +      event = f'event={llx(eventcode)}'
> > +    elif configcode is not None:
> > +      event = f'config={llx(configcode)}'
> >       event_fields = [
> >           ('AnyThread', 'any='),
> >           ('PortMask', 'ch_mask='),
> > @@ -327,10 +334,15 @@ class JsonEvent:
> >           ('Invert', 'inv='),
> >           ('SampleAfterValue', 'period='),
> >           ('UMask', 'umask='),
> > +        ('NodeType', 'type='),
> > +        ('EventIdCode', 'eventid='),
>
> FWIW, this smells like another brewing scalability problem, given that
> these are entirely driver-specific. Not sure off-hand how feasible it
> might be, but my instinct says that a neat solution would be to encode
> them right in the JSON, e.g.:
>
>         "FormatAttr": { "type": 0x5 }
>
> such that jevents should then only really need to consider whether an
> event is defined in terms of a raw "ConfigCode", one or more
> "FormatAttr"s which it can then parse dynamically, or reasonable special
> cases like "EventCode" (given how "event" is one of the most commonly
> used formats).

Hi Robin,

I'm not sure about scalability, but I think it is a problem that we
encode names into the event string that should correspond to formats,
while we don't test that the PMU driver in question supports those
formats. If we tar up sysfs directories, I think we can check/test this,
and it makes sense for the perf tool to be able to parse a sysfs-style
directory structure.

I think the hard-coded names and the matching against dictionary entries
are suboptimal; we should really be able to know the formats, abstract
this, etc. The code was deliberately done this way so that we could
migrate away from a legacy C version of this code and generate an
identical pmu-events.c. As much as possible the Python code matches the
C code, but now that the transition has happened there is no reason to
maintain this behavior.

Thanks,
Ian

> Thanks,
> Robin.
>
> >       ]
> >       for key, value in event_fields:
> >         if key in jd and jd[key] != '0':
> > -        event += ',' + value + jd[key]
> > +        if event:
> > +          event += ',' + value + jd[key]
> > +        else:
> > +          event = value + jd[key]
> >       if filter:
> >         event += f',{filter}'
> >       if msr:

^ permalink raw reply	[flat|nested] 57+ messages in thread
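The jevents.py change discussed above can be condensed into a small sketch (simplified; it models only the fields shown in this hunk, not the full JsonEvent logic):

```python
def build_event_string(jd):
    # Omit the old default "event=0" when neither EventCode nor ConfigCode
    # is present, then append per-field terms such as NodeType -> "type="
    # and EventIdCode -> "eventid=".
    event = None
    if "EventCode" in jd:
        event = f"event={int(jd['EventCode'], 0):#x}"
    elif "ConfigCode" in jd:
        event = f"config={int(jd['ConfigCode'], 0):#x}"
    for key, prefix in (("NodeType", "type="), ("EventIdCode", "eventid=")):
        if key in jd and jd[key] != "0":
            term = prefix + jd[key]
            event = f"{event},{term}" if event else term
    return event

# A CMN event no longer picks up a spurious "event=0":
assert build_event_string({"NodeType": "0x05",
                           "EventIdCode": "0x01"}) == "type=0x05,eventid=0x01"
assert build_event_string({"EventCode": "0x2b"}) == "event=0x2b"
```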

* Re: [PATCH v7 1/8] perf pmu: "Compat" supports matching multiple identifiers
  2023-08-25  4:11     ` Ian Rogers
@ 2023-08-25  6:12       ` Jing Zhang
  -1 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-08-25  6:12 UTC (permalink / raw)
  To: Ian Rogers
  Cc: John Garry, Will Deacon, James Clark, Arnaldo Carvalho de Melo,
	Mark Rutland, Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	linux-kernel, linux-arm-kernel, linux-perf-users, linux-doc,
	Zhuo Song, Shuai Xue



On 2023/8/25 12:11 PM, Ian Rogers wrote:
> On Mon, Aug 21, 2023 at 1:36 AM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>>
>> The jevents "Compat" field is used for uncore PMU alias and metric
>> definitions.
>>
>> The same PMU driver can have different PMU identifiers due to different
>> hardware versions and types, yet the PMUs may share common events.
>> Since a Compat value can only match one identifier, adding the same
>> event alias to PMUs with different identifiers requires a separate
>> definition per identifier, which is not streamlined.
>>
>> So let "Compat" support matching multiple identifiers for uncore PMU
>> aliases. For example, the Compat value {43401;436*} can match the PMU
>> identifier "43401", that is, CMN600_r0p0, as well as any PMU identifier
>> with the prefix "436", that is, all CMN650, where "*" is a wildcard.
>> Tokens in the Compat field are delimited by ';' with no spaces.
>>
>> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
>> Reviewed-by: John Garry <john.g.garry@oracle.com>
>> ---
>>  tools/perf/util/pmu.c | 33 +++++++++++++++++++++++++++++++--
>>  tools/perf/util/pmu.h |  1 +
>>  2 files changed, 32 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
>> index ad209c8..6402423 100644
>> --- a/tools/perf/util/pmu.c
>> +++ b/tools/perf/util/pmu.c
>> @@ -776,6 +776,35 @@ static bool pmu_uncore_alias_match(const char *pmu_name, const char *name)
>>         return res;
>>  }
>>
>> +bool pmu_uncore_identifier_match(const char *id, const char *compat)
> 
> static?
> 

This function needs to be called from util/metricgroup.c, so it cannot be static.

>> +{
>> +       char *tmp = NULL, *tok, *str;
>> +       bool res;
> 
> Initialize to false to avoid the goto.
> 

ok, no problem.

>> +       int n;
> 
> Move into the scope of the for loop, to reduce the scope.
> 

ok

>> +
>> +       /*
>> +        * The strdup() call is necessary here because "compat" is a const str*
>> +        * type and cannot be used as an argument to strtok_r().
>> +        */
>> +       str = strdup(compat);
>> +       if (!str)
>> +               return false;
>> +
>> +       tok = strtok_r(str, ";", &tmp);
>> +       for (; tok; tok = strtok_r(NULL, ";", &tmp)) {
>> +               n = strlen(tok);
>> +               if ((tok[n - 1] == '*' && !strncmp(id, tok, n - 1)) ||
>> +                   !strcmp(id, tok)) {
> 
> We use fnmatch for a similar check:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n1982
> 

ok

>> +                       res = true;
>> +                       goto out;
> 
> With "res=false;" above this can just be a regular break.
> 

ok, thank you!

> Thanks,
> Ian
> 
>> +               }
>> +       }
>> +       res = false;
>> +out:
>> +       free(str);
>> +       return res;
>> +}
>> +
>>  struct pmu_add_cpu_aliases_map_data {
>>         struct list_head *head;
>>         const char *name;
>> @@ -847,8 +876,8 @@ static int pmu_add_sys_aliases_iter_fn(const struct pmu_event *pe,
>>         if (!pe->compat || !pe->pmu)
>>                 return 0;
>>
>> -       if (!strcmp(pmu->id, pe->compat) &&
>> -           pmu_uncore_alias_match(pe->pmu, pmu->name)) {
>> +       if (pmu_uncore_alias_match(pe->pmu, pmu->name) &&
>> +           pmu_uncore_identifier_match(pmu->id, pe->compat)) {
>>                 __perf_pmu__new_alias(idata->head, -1,
>>                                       (char *)pe->name,
>>                                       (char *)pe->desc,
>> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
>> index b9a02de..9d4385d 100644
>> --- a/tools/perf/util/pmu.h
>> +++ b/tools/perf/util/pmu.h
>> @@ -241,6 +241,7 @@ void pmu_add_cpu_aliases_table(struct list_head *head, struct perf_pmu *pmu,
>>  char *perf_pmu__getcpuid(struct perf_pmu *pmu);
>>  const struct pmu_events_table *pmu_events_table__find(void);
>>  const struct pmu_metrics_table *pmu_metrics_table__find(void);
>> +bool pmu_uncore_identifier_match(const char *id, const char *compat);
>>  void perf_pmu_free_alias(struct perf_pmu_alias *alias);
>>
>>  int perf_pmu__convert_scale(const char *scale, char **end, double *sval);
>> --
>> 1.8.3.1
>>

^ permalink raw reply	[flat|nested] 57+ messages in thread
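Folding in the three review suggestions above (initialize res to false, use fnmatch, replace the goto with a break), the revised routine could look like this Python model (illustrative only; the real code is C in util/pmu.c):

```python
import fnmatch

def pmu_uncore_identifier_match(identifier, compat):
    # Revised per review: no out label needed; an early return (the
    # analogue of 'break' with res pre-initialized to false) suffices,
    # and fnmatch handles both exact tokens and trailing-'*' prefixes.
    for tok in compat.split(";"):
        if fnmatch.fnmatch(identifier, tok):
            return True
    return False

assert pmu_uncore_identifier_match("43401", "43401;436*")      # exact token
assert pmu_uncore_identifier_match("43602", "43401;436*")      # "436*" prefix
assert not pmu_uncore_identifier_match("43c03", "43401;436*")  # no token fits
```

Python strings have no strdup/strtok_r concerns, so the C version additionally needs the strdup of the const compat string and the free on exit, as in the patch.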

* Re: [PATCH v7 1/8] perf pmu: "Compat" supports matching multiple identifiers
@ 2023-08-25  6:12       ` Jing Zhang
  0 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-08-25  6:12 UTC (permalink / raw)
  To: Ian Rogers
  Cc: John Garry, Will Deacon, James Clark, Arnaldo Carvalho de Melo,
	Mark Rutland, Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	linux-kernel, linux-arm-kernel, linux-perf-users, linux-doc,
	Zhuo Song, Shuai Xue



在 2023/8/25 下午12:11, Ian Rogers 写道:
> On Mon, Aug 21, 2023 at 1:36 AM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>>
>> The jevent "Compat" is used for uncore PMU alias or metric definitions.
>>
>> The same PMU driver has different PMU identifiers due to different
>> hardware versions and types, but they may have some common PMU event.
>> Since a Compat value can only match one identifier, when adding the
>> same event alias to PMUs with different identifiers, each identifier
>> needs to be defined once, which is not streamlined enough.
>>
>> So let "Compat" supports matching multiple identifiers for uncore PMU
>> alias. For example, the Compat value {43401;436*} can match the PMU
>> identifier "43401", that is, CMN600_r0p0, and the PMU identifier with
>> the prefix "436", that is, all CMN650, where "*" is a wildcard.
>> Tokens in Unit field are delimited by ';' with no spaces.
>>
>> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
>> Reviewed-by: John Garry <john.g.garry@oracle.com>
>> ---
>>  tools/perf/util/pmu.c | 33 +++++++++++++++++++++++++++++++--
>>  tools/perf/util/pmu.h |  1 +
>>  2 files changed, 32 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
>> index ad209c8..6402423 100644
>> --- a/tools/perf/util/pmu.c
>> +++ b/tools/perf/util/pmu.c
>> @@ -776,6 +776,35 @@ static bool pmu_uncore_alias_match(const char *pmu_name, const char *name)
>>         return res;
>>  }
>>
>> +bool pmu_uncore_identifier_match(const char *id, const char *compat)
> 
> static?
> 

This function needs to be called in utils/metricgroup.c, so it cannot be static.

>> +{
>> +       char *tmp = NULL, *tok, *str;
>> +       bool res;
> 
> Initialize to false to avoid the goto.
> 

ok,no problem.

>> +       int n;
> 
> Move into the scope of the for loop, to reduce the scope.
> 

ok

>> +
>> +       /*
>> +        * The strdup() call is necessary here because "compat" is a const str*
>> +        * type and cannot be used as an argument to strtok_r().
>> +        */
>> +       str = strdup(compat);
>> +       if (!str)
>> +               return false;
>> +
>> +       tok = strtok_r(str, ";", &tmp);
>> +       for (; tok; tok = strtok_r(NULL, ";", &tmp)) {
>> +               n = strlen(tok);
>> +               if ((tok[n - 1] == '*' && !strncmp(id, tok, n - 1)) ||
>> +                   !strcmp(id, tok)) {
> 
> We use fnmatch for a similar check:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n1982
> 

ok

>> +                       res = true;
>> +                       goto out;
> 
> With "res=false;" above this can just be a regular break.
> 

ok, thank you!

> Thanks,
> Ian
> 
>> +               }
>> +       }
>> +       res = false;
>> +out:
>> +       free(str);
>> +       return res;
>> +}
>> +
>>  struct pmu_add_cpu_aliases_map_data {
>>         struct list_head *head;
>>         const char *name;
>> @@ -847,8 +876,8 @@ static int pmu_add_sys_aliases_iter_fn(const struct pmu_event *pe,
>>         if (!pe->compat || !pe->pmu)
>>                 return 0;
>>
>> -       if (!strcmp(pmu->id, pe->compat) &&
>> -           pmu_uncore_alias_match(pe->pmu, pmu->name)) {
>> +       if (pmu_uncore_alias_match(pe->pmu, pmu->name) &&
>> +           pmu_uncore_identifier_match(pmu->id, pe->compat)) {
>>                 __perf_pmu__new_alias(idata->head, -1,
>>                                       (char *)pe->name,
>>                                       (char *)pe->desc,
>> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
>> index b9a02de..9d4385d 100644
>> --- a/tools/perf/util/pmu.h
>> +++ b/tools/perf/util/pmu.h
>> @@ -241,6 +241,7 @@ void pmu_add_cpu_aliases_table(struct list_head *head, struct perf_pmu *pmu,
>>  char *perf_pmu__getcpuid(struct perf_pmu *pmu);
>>  const struct pmu_events_table *pmu_events_table__find(void);
>>  const struct pmu_metrics_table *pmu_metrics_table__find(void);
>> +bool pmu_uncore_identifier_match(const char *id, const char *compat);
>>  void perf_pmu_free_alias(struct perf_pmu_alias *alias);
>>
>>  int perf_pmu__convert_scale(const char *scale, char **end, double *sval);
>> --
>> 1.8.3.1
>>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH v7 3/8] perf vendor events: Supplement the omitted EventCode
  2023-08-25  4:24   ` Ian Rogers
@ 2023-08-25  6:28     ` Jing Zhang
  0 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-08-25  6:28 UTC (permalink / raw)
  To: Ian Rogers
  Cc: John Garry, Will Deacon, James Clark, Arnaldo Carvalho de Melo,
	Mark Rutland, Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	linux-kernel, linux-arm-kernel, linux-perf-users, linux-doc,
	Zhuo Song, Shuai Xue



On 2023/8/25 12:24 PM, Ian Rogers wrote:
> On Mon, Aug 21, 2023 at 1:36 AM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>>
>> If there is an "event=0" in the event description, the EventCode can
>> be omitted in the JSON file, and jevents.py will automatically fill in
>> "event=0" during parsing.
>>
>> However, for some events where EventCode and ConfigCode are missing,
>> it is not necessary to automatically fill in "event=0", such as the
>> CMN event description which is typically "type=xxx, eventid=xxx".
>>
>> Therefore, before modifying jevents.py to prevent it from automatically
>> adding "event=0" by default, it is necessary to fill in all omitted
>> EventCodes first.
>>
>> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
> 
> As these files are generated, the generator script needs updating.
> However, I don't think this change makes sense as the event=0 is
> overwritten in the case of an arch_std event:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/jevents.py?h=perf-tools-next#n369
> So yes event=0 was filled in, but it was then overwritten.
> 

Yes, an arch_std_event will indeed be overwritten, but the events I added an
EventCode to are not arch_std events, and the x86 architecture does not define
any arch_std events. I compiled with JEVENTS_ARCH=all and checked; there is no
problem. In patch 4, I modify jevents.py so that it no longer adds "event=0" by
default, and the generated pmu-events.c is identical before and after.

Thanks,
Jing

> Thanks,
> Ian
> 
>> ---
>>  tools/perf/pmu-events/arch/x86/alderlake/pipeline.json     |  9 +++++++++
>>  tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json    |  3 +++
>>  tools/perf/pmu-events/arch/x86/broadwell/pipeline.json     |  4 ++++
>>  tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json   |  4 ++++
>>  .../perf/pmu-events/arch/x86/broadwellde/uncore-cache.json |  2 ++
>>  .../arch/x86/broadwellde/uncore-interconnect.json          |  1 +
>>  .../pmu-events/arch/x86/broadwellde/uncore-memory.json     |  1 +
>>  .../perf/pmu-events/arch/x86/broadwellde/uncore-power.json |  1 +
>>  tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json    |  4 ++++
>>  .../perf/pmu-events/arch/x86/broadwellx/uncore-cache.json  |  2 ++
>>  .../arch/x86/broadwellx/uncore-interconnect.json           | 13 +++++++++++++
>>  .../perf/pmu-events/arch/x86/broadwellx/uncore-memory.json |  2 ++
>>  .../perf/pmu-events/arch/x86/broadwellx/uncore-power.json  |  1 +
>>  tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json  |  4 ++++
>>  .../pmu-events/arch/x86/cascadelakex/uncore-cache.json     |  2 ++
>>  .../arch/x86/cascadelakex/uncore-interconnect.json         |  1 +
>>  tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json |  1 +
>>  .../pmu-events/arch/x86/cascadelakex/uncore-memory.json    |  1 +
>>  .../pmu-events/arch/x86/cascadelakex/uncore-power.json     |  1 +
>>  tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json   |  2 ++
>>  tools/perf/pmu-events/arch/x86/goldmont/pipeline.json      |  3 +++
>>  tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json  |  3 +++
>>  tools/perf/pmu-events/arch/x86/grandridge/pipeline.json    |  3 +++
>>  tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json |  4 ++++
>>  tools/perf/pmu-events/arch/x86/haswell/pipeline.json       |  4 ++++
>>  tools/perf/pmu-events/arch/x86/haswellx/pipeline.json      |  4 ++++
>>  tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json  |  2 ++
>>  .../pmu-events/arch/x86/haswellx/uncore-interconnect.json  | 14 ++++++++++++++
>>  tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json |  2 ++
>>  tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json  |  1 +
>>  tools/perf/pmu-events/arch/x86/icelake/pipeline.json       |  4 ++++
>>  tools/perf/pmu-events/arch/x86/icelakex/pipeline.json      |  4 ++++
>>  tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json  |  1 +
>>  .../pmu-events/arch/x86/icelakex/uncore-interconnect.json  |  1 +
>>  tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json |  1 +
>>  tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json  |  1 +
>>  tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json     |  3 +++
>>  tools/perf/pmu-events/arch/x86/ivytown/pipeline.json       |  4 ++++
>>  tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json   |  2 ++
>>  .../pmu-events/arch/x86/ivytown/uncore-interconnect.json   | 11 +++++++++++
>>  tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json  |  1 +
>>  tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json   |  1 +
>>  tools/perf/pmu-events/arch/x86/jaketown/pipeline.json      |  4 ++++
>>  tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json  |  2 ++
>>  .../pmu-events/arch/x86/jaketown/uncore-interconnect.json  | 12 ++++++++++++
>>  tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json |  1 +
>>  tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json  |  2 ++
>>  .../perf/pmu-events/arch/x86/knightslanding/pipeline.json  |  3 +++
>>  .../pmu-events/arch/x86/knightslanding/uncore-cache.json   |  1 +
>>  .../pmu-events/arch/x86/knightslanding/uncore-memory.json  |  4 ++++
>>  tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json    |  8 ++++++++
>>  tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json   |  4 ++++
>>  .../perf/pmu-events/arch/x86/sapphirerapids/pipeline.json  |  5 +++++
>>  tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json  |  4 ++++
>>  tools/perf/pmu-events/arch/x86/silvermont/pipeline.json    |  3 +++
>>  tools/perf/pmu-events/arch/x86/skylake/pipeline.json       |  4 ++++
>>  tools/perf/pmu-events/arch/x86/skylakex/pipeline.json      |  4 ++++
>>  tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json  |  2 ++
>>  .../pmu-events/arch/x86/skylakex/uncore-interconnect.json  |  1 +
>>  tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json     |  1 +
>>  tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json |  1 +
>>  tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json  |  1 +
>>  tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json    |  2 ++
>>  .../perf/pmu-events/arch/x86/snowridgex/uncore-cache.json  |  1 +
>>  .../arch/x86/snowridgex/uncore-interconnect.json           |  1 +
>>  .../perf/pmu-events/arch/x86/snowridgex/uncore-memory.json |  1 +
>>  .../perf/pmu-events/arch/x86/snowridgex/uncore-power.json  |  1 +
>>  tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json     |  5 +++++
>>  68 files changed, 211 insertions(+)
>>
>> diff --git a/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json b/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json
>> index cb5b861..7054426 100644
>> --- a/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json
>> @@ -489,6 +489,7 @@
>>      },
>>      {
>>          "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.CORE",
>>          "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. This event uses fixed counter 1.",
>>          "SampleAfterValue": "2000003",
>> @@ -550,6 +551,7 @@
>>      },
>>      {
>>          "BriefDescription": "Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event)",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is not affected by core frequency changes and increments at a fixed frequency that is also used for the Time Stamp Counter (TSC). This event uses fixed counter 2.",
>>          "SampleAfterValue": "2000003",
>> @@ -558,6 +560,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when the core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>>          "SampleAfterValue": "2000003",
>> @@ -584,6 +587,7 @@
>>      },
>>      {
>>          "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "PublicDescription": "Counts the number of core cycles while the core is not in a halt state.  The core enters the halt state when it is running the HLT instruction. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time.  This event uses fixed counter 1.",
>>          "SampleAfterValue": "2000003",
>> @@ -592,6 +596,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when the thread is not in halt state",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
>>          "SampleAfterValue": "2000003",
>> @@ -743,6 +748,7 @@
>>      },
>>      {
>>          "BriefDescription": "Counts the total number of instructions retired. (Fixed event)",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PEBS": "1",
>>          "PublicDescription": "Counts the total number of instructions that retired. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. This event continues counting during hardware interrupts, traps, and inside interrupt handlers. This event uses fixed counter 0.",
>> @@ -752,6 +758,7 @@
>>      },
>>      {
>>          "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PEBS": "1",
>>          "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
>> @@ -796,6 +803,7 @@
>>      },
>>      {
>>          "BriefDescription": "Precise instruction retired with PEBS precise-distribution",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.PREC_DIST",
>>          "PEBS": "1",
>>          "PublicDescription": "A version of INST_RETIRED that allows for a precise distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR++) feature to fix bias in how retired instructions get sampled. Use on Fixed Counter 0.",
>> @@ -1160,6 +1168,7 @@
>>      },
>>      {
>>          "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
>> +        "EventCode": "0x0",
>>          "EventName": "TOPDOWN.SLOTS",
>>          "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
>>          "SampleAfterValue": "10000003",
>> diff --git a/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json b/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json
>> index fa53ff1..345d1c8 100644
>> --- a/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json
>> @@ -211,6 +211,7 @@
>>      },
>>      {
>>          "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.CORE",
>>          "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. This event uses fixed counter 1.",
>>          "SampleAfterValue": "2000003",
>> @@ -225,6 +226,7 @@
>>      },
>>      {
>>          "BriefDescription": "Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event)",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is not affected by core frequency changes and increments at a fixed frequency that is also used for the Time Stamp Counter (TSC). This event uses fixed counter 2.",
>>          "SampleAfterValue": "2000003",
>> @@ -240,6 +242,7 @@
>>      },
>>      {
>>          "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "PublicDescription": "Counts the number of core cycles while the core is not in a halt state.  The core enters the halt state when it is running the HLT instruction. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time.  This event uses fixed counter 1.",
>>          "SampleAfterValue": "2000003",
>> diff --git a/tools/perf/pmu-events/arch/x86/broadwell/pipeline.json b/tools/perf/pmu-events/arch/x86/broadwell/pipeline.json
>> index 9a902d2..b114d0d 100644
>> --- a/tools/perf/pmu-events/arch/x86/broadwell/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/broadwell/pipeline.json
>> @@ -336,6 +336,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when the core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. \nNote: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  This event is clocked by base clock (100 Mhz) on Sandy Bridge. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>>          "SampleAfterValue": "2000003",
>> @@ -359,6 +360,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when the thread is not in halt state",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
>>          "SampleAfterValue": "2000003",
>> @@ -366,6 +368,7 @@
>>      },
>>      {
>>          "AnyThread": "1",
>> +        "EventCode": "0x0",
>>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>>          "SampleAfterValue": "2000003",
>> @@ -514,6 +517,7 @@
>>      },
>>      {
>>          "BriefDescription": "Instructions retired from execution.",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. \nNotes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. \nCounting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
>>          "SampleAfterValue": "2000003",
>> diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json b/tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json
>> index 9a902d2..ce90d058 100644
>> --- a/tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json
>> @@ -336,6 +336,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when the core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. \nNote: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  This event is clocked by base clock (100 Mhz) on Sandy Bridge. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>>          "SampleAfterValue": "2000003",
>> @@ -359,6 +360,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when the thread is not in halt state",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
>>          "SampleAfterValue": "2000003",
>> @@ -367,6 +369,7 @@
>>      {
>>          "AnyThread": "1",
>>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x2"
>> @@ -514,6 +517,7 @@
>>      },
>>      {
>>          "BriefDescription": "Instructions retired from execution.",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. \nNotes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. \nCounting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
>>          "SampleAfterValue": "2000003",
>> diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.json b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.json
>> index 56bba6d..117be19 100644
>> --- a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.json
>> +++ b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.json
>> @@ -8,6 +8,7 @@
>>      },
>>      {
>>          "BriefDescription": "Uncore Clocks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_C_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "CBOX"
>> @@ -1501,6 +1502,7 @@
>>      },
>>      {
>>          "BriefDescription": "uclks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_H_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of uclks in the HA.  This will be slightly different than the count in the Ubox because of enable/freeze delays.  The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the QPI Agent.",
>> diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-interconnect.json
>> index 8a327e0..ce54bd3 100644
>> --- a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-interconnect.json
>> +++ b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-interconnect.json
>> @@ -19,6 +19,7 @@
>>      },
>>      {
>>          "BriefDescription": "Clocks in the IRP",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_I_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Number of clocks in the IRP.",
>> diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.json b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.json
>> index a764234..32c46bd 100644
>> --- a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.json
>> +++ b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.json
>> @@ -131,6 +131,7 @@
>>      },
>>      {
>>          "BriefDescription": "DRAM Clockticks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_M_DCLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "iMC"
>> diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.json b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.json
>> index 83d2013..f57eb8e 100644
>> --- a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.json
>> +++ b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.json
>> @@ -1,6 +1,7 @@
>>  [
>>      {
>>          "BriefDescription": "pclk Cycles",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_P_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "The PCU runs off a fixed 1 GHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
>> diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json b/tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json
>> index 9a902d2..ce90d058 100644
>> --- a/tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json
>> @@ -336,6 +336,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when the core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. \nNote: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  This event is clocked by base clock (100 Mhz) on Sandy Bridge. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>>          "SampleAfterValue": "2000003",
>> @@ -359,6 +360,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when the thread is not in halt state",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
>>          "SampleAfterValue": "2000003",
>> @@ -367,6 +369,7 @@
>>      {
>>          "AnyThread": "1",
>>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x2"
>> @@ -514,6 +517,7 @@
>>      },
>>      {
>>          "BriefDescription": "Instructions retired from execution.",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. \nNotes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. \nCounting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
>>          "SampleAfterValue": "2000003",
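
Not part of the patch — just a minimal, hypothetical sketch of how one could audit these JSON files for fixed-counter events that still omit "EventCode" (the pattern this series supplements). The sample data below is illustrative, not copied from any real file:

```python
import json

# Two illustrative entries mirroring the pmu-events layout: fixed-counter
# events historically omitted "EventCode"; this series adds "0x0" explicitly.
sample = json.loads("""
[
    {
        "BriefDescription": "Instructions retired from execution.",
        "EventName": "INST_RETIRED.ANY",
        "SampleAfterValue": "2000003",
        "UMask": "0x1"
    },
    {
        "BriefDescription": "Core cycles when the thread is not in halt state",
        "EventCode": "0x0",
        "EventName": "CPU_CLK_UNHALTED.THREAD",
        "SampleAfterValue": "2000003",
        "UMask": "0x2"
    }
]
""")

# Collect event names still lacking an explicit EventCode.
missing = [e["EventName"] for e in sample if "EventCode" not in e]
print(missing)
```

Pointed at the real `tools/perf/pmu-events/arch/` tree (e.g. via `pathlib.Path.rglob("*.json")`), the same check would flag any remaining entries this series missed.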
>> diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.json b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.json
>> index 400d784..346f5cf 100644
>> --- a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.json
>> +++ b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.json
>> @@ -183,6 +183,7 @@
>>      },
>>      {
>>          "BriefDescription": "Uncore Clocks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_C_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "CBOX"
>> @@ -1689,6 +1690,7 @@
>>      },
>>      {
>>          "BriefDescription": "uclks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_H_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of uclks in the HA.  This will be slightly different than the count in the Ubox because of enable/freeze delays.  The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the QPI Agent.",
>> diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-interconnect.json
>> index e61a23f..df96e41 100644
>> --- a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-interconnect.json
>> +++ b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-interconnect.json
>> @@ -1,6 +1,7 @@
>>  [
>>      {
>>          "BriefDescription": "Number of non data (control) flits transmitted . Derived from unc_q_txl_flits_g0.non_data",
>> +        "EventCode": "0x0",
>>          "EventName": "QPI_CTL_BANDWIDTH_TX",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI.  This basically tracks the protocol overhead on the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This includes the header flits for data packets.",
>> @@ -10,6 +11,7 @@
>>      },
>>      {
>>          "BriefDescription": "Number of data flits transmitted . Derived from unc_q_txl_flits_g0.data",
>> +        "EventCode": "0x0",
>>          "EventName": "QPI_DATA_BANDWIDTH_TX",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI.  Each flit contains 64b of data.  This includes both DRS and NCB data flits (coherent and non-coherent).  This can be used to calculate the data bandwidth of the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This does not include the header flits that go in data packets.",
>> @@ -37,6 +39,7 @@
>>      },
>>      {
>>          "BriefDescription": "Clocks in the IRP",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_I_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Number of clocks in the IRP.",
>> @@ -1400,6 +1403,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 0; Data Tx Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G0.DATA",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI.  Each flit contains 64b of data.  This includes both DRS and NCB data flits (coherent and non-coherent).  This can be used to calculate the data bandwidth of the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This does not include the header flits that go in data packets.",
>> @@ -1408,6 +1412,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 0; Non-Data protocol Tx Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G0.NON_DATA",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI.  This basically tracks the protocol overhead on the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This includes the header flits for data packets.",
>> @@ -1416,6 +1421,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; DRS Flits (both Header and Data)",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.",
>> @@ -1424,6 +1430,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; DRS Data Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS_DATA",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of data flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.  This does not count data flits transmitted over the NCB channel which transmits non-coherent data.  This includes only the data flits (not the header).",
>> @@ -1432,6 +1439,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; DRS Header Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS_NONDATA",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of protocol flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.  This does not count data flits transmitted over the NCB channel which transmits non-coherent data.  This includes only the header flits (not the data).  This includes extended headers.",
>> @@ -1440,6 +1448,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; HOM Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of flits transmitted over QPI on the home channel.",
>> @@ -1448,6 +1457,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; HOM Non-Request Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM_NONREQ",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of non-request flits transmitted over QPI on the home channel.  These are most commonly snoop responses, and this event can be used as a proxy for that.",
>> @@ -1456,6 +1466,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; HOM Request Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM_REQ",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of data request transmitted over QPI on the home channel.  This basically counts the number of remote memory requests transmitted over QPI.  In conjunction with the local read count in the Home Agent, one can calculate the number of LLC Misses.",
>> @@ -1464,6 +1475,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; SNP Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.SNP",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of snoop request flits transmitted over QPI.  These requests are contained in the snoop channel.  This does not include snoop responses, which are transmitted on the home channel.",
>> @@ -3162,6 +3174,7 @@
>>      },
>>      {
>>          "BriefDescription": "Uncore Clocks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_S_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "SBOX"
>> diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json
>> index b5a33e7a..0c5888d 100644
>> --- a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json
>> +++ b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json
>> @@ -158,12 +158,14 @@
>>      },
>>      {
>>          "BriefDescription": "Clockticks in the Memory Controller using one of the programmable counters",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_M_CLOCKTICKS_P",
>>          "PerPkg": "1",
>>          "Unit": "iMC"
>>      },
>>      {
>>          "BriefDescription": "This event is deprecated. Refer to new event UNC_M_CLOCKTICKS_P",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_M_DCLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "iMC"
>> diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json
>> index 83d2013..f57eb8e 100644
>> --- a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json
>> +++ b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json
>> @@ -1,6 +1,7 @@
>>  [
>>      {
>>          "BriefDescription": "pclk Cycles",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_P_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "The PCU runs off a fixed 1 GHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
>> diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json b/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json
>> index 0f06e31..99346e1 100644
>> --- a/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json
>> @@ -191,6 +191,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when the core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>>          "SampleAfterValue": "2000003",
>> @@ -222,6 +223,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when the thread is not in halt state",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
>>          "SampleAfterValue": "2000003",
>> @@ -230,6 +232,7 @@
>>      {
>>          "AnyThread": "1",
>>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x2"
>> @@ -369,6 +372,7 @@
>>      },
>>      {
>>          "BriefDescription": "Instructions retired from execution.",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PublicDescription": "Counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, Counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
>>          "SampleAfterValue": "2000003",
>> diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json
>> index 2c88053..ba7a6f6 100644
>> --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json
>> +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json
>> @@ -512,6 +512,7 @@
>>      },
>>      {
>>          "BriefDescription": "Uncore cache clock ticks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_CHA_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts clockticks of the clock controlling the uncore caching and home agent (CHA).",
>> @@ -5792,6 +5793,7 @@
>>      },
>>      {
>>          "BriefDescription": "This event is deprecated. Refer to new event UNC_CHA_CLOCKTICKS",
>>          "Deprecated": "1",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_C_CLOCKTICKS",
>>          "PerPkg": "1",
>> diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json
>> index 725780f..43d7b24 100644
>> --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json
>> +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json
>> @@ -1090,6 +1090,7 @@
>>      },
>>      {
>>          "BriefDescription": "Cycles - at UCLK",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_M2M_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "M2M"
>> diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json
>> index 743c91f..377d54f 100644
>> --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json
>> +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json
>> @@ -1271,6 +1271,7 @@
>>      },
>>      {
>>          "BriefDescription": "Counting disabled",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_IIO_NOTHING",
>>          "PerPkg": "1",
>>          "Unit": "IIO"
>> diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json
>> index f761856..77bb0ea 100644
>> --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json
>> +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json
>> @@ -167,6 +167,7 @@
>>      },
>>      {
>>          "BriefDescription": "Memory controller clock ticks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_M_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts clockticks of the fixed frequency clock of the memory controller using one of the programmable counters.",
>> diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-power.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-power.json
>> index c6254af..a01b279 100644
>> --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-power.json
>> +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-power.json
>> @@ -1,6 +1,7 @@
>>  [
>>      {
>>          "BriefDescription": "pclk Cycles",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_P_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "The PCU runs off a fixed 1 GHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
>> diff --git a/tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json b/tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json
>> index 9dd8c90..3388cd5 100644
>> --- a/tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json
>> @@ -150,6 +150,7 @@
>>      },
>>      {
>>          "BriefDescription": "Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event)",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is not affected by core frequency changes and increments at a fixed frequency that is also used for the Time Stamp Counter (TSC). This event uses fixed counter 2.",
>>          "SampleAfterValue": "2000003",
>> @@ -179,6 +180,7 @@
>>      },
>>      {
>>          "BriefDescription": "Counts the total number of instructions retired. (Fixed event)",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PEBS": "1",
>>          "PublicDescription": "Counts the total number of instructions that retired. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. This event continues counting during hardware interrupts, traps, and inside interrupt handlers. This event uses fixed counter 0.",
>> diff --git a/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json b/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json
>> index acb8974..79806e7 100644
>> --- a/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json
>> @@ -143,6 +143,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when core is not halted  (Fixed event)",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.CORE",
>>          "PublicDescription": "Counts the number of core cycles while the core is not in a halt state.  The core enters the halt state when it is running the HLT instruction. In mobile systems the core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time.  This event uses fixed counter 1.  You cannot collect a PEBs record for this event.",
>>          "SampleAfterValue": "2000003",
>> @@ -165,6 +166,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when core is not halted  (Fixed event)",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction.  In mobile systems the core frequency may change from time.  This event is not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time.  This event uses fixed counter 2.  You cannot collect a PEBs record for this event.",
>>          "SampleAfterValue": "2000003",
>> @@ -187,6 +189,7 @@
>>      },
>>      {
>>          "BriefDescription": "Instructions retired (Fixed event)",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PublicDescription": "Counts the number of instructions that retire execution. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. The counter continues counting during hardware interrupts, traps, and inside interrupt handlers.  This event uses fixed counter 0.  You cannot collect a PEBs record for this event.",
>>          "SampleAfterValue": "2000003",
>> diff --git a/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json b/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json
>> index 33ef331..1be1b50 100644
>> --- a/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json
>> @@ -143,6 +143,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when core is not halted  (Fixed event)",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.CORE",
>>          "PublicDescription": "Counts the number of core cycles while the core is not in a halt state.  The core enters the halt state when it is running the HLT instruction. In mobile systems the core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time.  This event uses fixed counter 1.  You cannot collect a PEBs record for this event.",
>>          "SampleAfterValue": "2000003",
>> @@ -165,6 +166,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when core is not halted  (Fixed event)",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction.  In mobile systems the core frequency may change from time.  This event is not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time.  This event uses fixed counter 2.  You cannot collect a PEBs record for this event.",
>>          "SampleAfterValue": "2000003",
>> @@ -187,6 +189,7 @@
>>      },
>>      {
>>          "BriefDescription": "Instructions retired (Fixed event)",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PEBS": "2",
>>          "PublicDescription": "Counts the number of instructions that retire execution. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. The counter continues counting during hardware interrupts, traps, and inside interrupt handlers.  This event uses fixed counter 0.  You cannot collect a PEBs record for this event.",
>> diff --git a/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json b/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json
>> index 4121295..5335a7b 100644
>> --- a/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json
>> @@ -29,6 +29,7 @@
>>      },
>>      {
>>          "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x3"
>> @@ -43,6 +44,7 @@
>>      },
>>      {
>>          "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x2"
>> @@ -55,6 +57,7 @@
>>      },
>>      {
>>          "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PEBS": "1",
>>          "SampleAfterValue": "2000003",
>> diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json b/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
>> index 764c043..6ca34b9 100644
>> --- a/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
>> @@ -17,6 +17,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when the core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>>          "SampleAfterValue": "2000003",
>> @@ -32,6 +33,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when the thread is not in halt state",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
>>          "SampleAfterValue": "2000003",
>> @@ -46,6 +48,7 @@
>>      },
>>      {
>>          "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PEBS": "1",
>>          "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
>> @@ -78,6 +81,7 @@
>>      },
>>      {
>>          "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
>> +        "EventCode": "0x0",
>>          "EventName": "TOPDOWN.SLOTS",
>>          "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
>>          "SampleAfterValue": "10000003",
>> diff --git a/tools/perf/pmu-events/arch/x86/haswell/pipeline.json b/tools/perf/pmu-events/arch/x86/haswell/pipeline.json
>> index 540f437..0d5eafd 100644
>> --- a/tools/perf/pmu-events/arch/x86/haswell/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/haswell/pipeline.json
>> @@ -303,6 +303,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when the core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state.",
>>          "SampleAfterValue": "2000003",
>> @@ -327,6 +328,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when the thread is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "PublicDescription": "This event counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling.",
>>          "SampleAfterValue": "2000003",
>> @@ -335,6 +337,7 @@
>>      {
>>          "AnyThread": "1",
>>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x2"
>> @@ -436,6 +439,7 @@
>>      },
>>      {
>>          "BriefDescription": "Instructions retired from execution.",
>>          "Errata": "HSD140, HSD143",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. INST_RETIRED.ANY is counted by a designated fixed counter, leaving the programmable counters available for other events. Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
>> diff --git a/tools/perf/pmu-events/arch/x86/haswellx/pipeline.json b/tools/perf/pmu-events/arch/x86/haswellx/pipeline.json
>> index 540f437..0d5eafd 100644
>> --- a/tools/perf/pmu-events/arch/x86/haswellx/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/haswellx/pipeline.json
>> @@ -303,6 +303,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when the core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state.",
>>          "SampleAfterValue": "2000003",
>> @@ -327,6 +328,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when the thread is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "PublicDescription": "This event counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling.",
>>          "SampleAfterValue": "2000003",
>> @@ -335,6 +337,7 @@
>>      {
>>          "AnyThread": "1",
>>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x2"
>> @@ -436,6 +439,7 @@
>>      },
>>      {
>>          "BriefDescription": "Instructions retired from execution.",
>>          "Errata": "HSD140, HSD143",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. INST_RETIRED.ANY is counted by a designated fixed counter, leaving the programmable counters available for other events. Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
>> diff --git a/tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json b/tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json
>> index 9227cc2..64e2fb4 100644
>> --- a/tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json
>> +++ b/tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json
>> @@ -183,6 +183,7 @@
>>      },
>>      {
>>          "BriefDescription": "Uncore Clocks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_C_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "CBOX"
>> @@ -1698,6 +1699,7 @@
>>      },
>>      {
>>          "BriefDescription": "uclks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_H_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of uclks in the HA.  This will be slightly different than the count in the Ubox because of enable/freeze delays.  The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the QPI Agent.",
>> diff --git a/tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json
>> index 954e8198..7c4fc13 100644
>> --- a/tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json
>> +++ b/tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json
>> @@ -1,6 +1,7 @@
>>  [
>>      {
>>          "BriefDescription": "Number of non data (control) flits transmitted . Derived from unc_q_txl_flits_g0.non_data",
>> +        "EventCode": "0x0",
>>          "EventName": "QPI_CTL_BANDWIDTH_TX",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI.  This basically tracks the protocol overhead on the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This includes the header flits for data packets.",
>> @@ -10,6 +11,7 @@
>>      },
>>      {
>>          "BriefDescription": "Number of data flits transmitted . Derived from unc_q_txl_flits_g0.data",
>> +        "EventCode": "0x0",
>>          "EventName": "QPI_DATA_BANDWIDTH_TX",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI.  Each flit contains 64b of data.  This includes both DRS and NCB data flits (coherent and non-coherent).  This can be used to calculate the data bandwidth of the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This does not include the header flits that go in data packets.",
>> @@ -37,6 +39,7 @@
>>      },
>>      {
>>          "BriefDescription": "Clocks in the IRP",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_I_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Number of clocks in the IRP.",
>> @@ -1401,6 +1404,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 0; Data Tx Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G0.DATA",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI.  Each flit contains 64b of data.  This includes both DRS and NCB data flits (coherent and non-coherent).  This can be used to calculate the data bandwidth of the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This does not include the header flits that go in data packets.",
>> @@ -1409,6 +1413,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 0; Non-Data protocol Tx Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G0.NON_DATA",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI.  This basically tracks the protocol overhead on the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This includes the header flits for data packets.",
>> @@ -1417,6 +1422,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; DRS Flits (both Header and Data)",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.",
>> @@ -1425,6 +1431,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; DRS Data Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS_DATA",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of data flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.  This does not count data flits transmitted over the NCB channel which transmits non-coherent data.  This includes only the data flits (not the header).",
>> @@ -1433,6 +1440,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; DRS Header Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS_NONDATA",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of protocol flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.  This does not count data flits transmitted over the NCB channel which transmits non-coherent data.  This includes only the header flits (not the data).  This includes extended headers.",
>> @@ -1441,6 +1449,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; HOM Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of flits transmitted over QPI on the home channel.",
>> @@ -1449,6 +1458,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; HOM Non-Request Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM_NONREQ",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of non-request flits transmitted over QPI on the home channel.  These are most commonly snoop responses, and this event can be used as a proxy for that.",
>> @@ -1457,6 +1467,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; HOM Request Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM_REQ",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of data request transmitted over QPI on the home channel.  This basically counts the number of remote memory requests transmitted over QPI.  In conjunction with the local read count in the Home Agent, one can calculate the number of LLC Misses.",
>> @@ -1465,6 +1476,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; SNP Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.SNP",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of snoop request flits transmitted over QPI.  These requests are contained in the snoop channel.  This does not include snoop responses, which are transmitted on the home channel.",
>> @@ -3136,6 +3148,7 @@
>>      },
>>      {
>>          "BriefDescription": "Uncore Clocks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_S_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "SBOX"
>> @@ -3823,6 +3836,7 @@
>>      },
>>      {
>>          "BriefDescription": "UNC_U_CLOCKTICKS",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_U_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "UBOX"
>> diff --git a/tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json b/tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json
>> index c005f51..124c3ae 100644
>> --- a/tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json
>> +++ b/tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json
>> @@ -151,12 +151,14 @@
>>      },
>>      {
>>          "BriefDescription": "DRAM Clockticks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_M_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "iMC"
>>      },
>>      {
>>          "BriefDescription": "DRAM Clockticks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_M_DCLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "iMC"
>> diff --git a/tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json b/tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json
>> index daebf10..9276058 100644
>> --- a/tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json
>> +++ b/tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json
>> @@ -1,6 +1,7 @@
>>  [
>>      {
>>          "BriefDescription": "pclk Cycles",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_P_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "The PCU runs off a fixed 800 MHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
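Since the pclk counts at a constant rate, the description above makes UNC_P_CLOCKTICKS usable as a wall-clock proxy. A minimal sketch of that conversion, assuming the fixed 800 MHz pclk stated for this platform (Icelake-X's PCU, per its description later in the patch, runs at 1 GHz instead):

```python
PCLK_HZ = 800_000_000  # fixed 800 MHz pclk on this platform (per the description)

def pclk_ticks_to_seconds(ticks: int) -> float:
    """Convert a UNC_P_CLOCKTICKS reading into elapsed wall time."""
    return ticks / PCLK_HZ

# e.g. 1.6e9 pclk ticks measured while the counter was enabled = 2s of wall time
assert pclk_ticks_to_seconds(1_600_000_000) == 2.0
```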
>> diff --git a/tools/perf/pmu-events/arch/x86/icelake/pipeline.json b/tools/perf/pmu-events/arch/x86/icelake/pipeline.json
>> index 154fee4..0789412 100644
>> --- a/tools/perf/pmu-events/arch/x86/icelake/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/icelake/pipeline.json
>> @@ -193,6 +193,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when the core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>>          "SampleAfterValue": "2000003",
>> @@ -208,6 +209,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when the thread is not in halt state",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
>>          "SampleAfterValue": "2000003",
>> @@ -359,6 +361,7 @@
>>      },
>>      {
>>          "BriefDescription": "Precise instruction retired event with a reduced effect of PEBS shadow in IP distribution",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.PREC_DIST",
>>          "PEBS": "1",
>>          "PublicDescription": "A version of INST_RETIRED that allows for a more unbiased distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR) feature to mitigate some bias in how retired instructions get sampled. Use on Fixed Counter 0.",
>> @@ -562,6 +565,7 @@
>>      },
>>      {
>>          "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
>> +        "EventCode": "0x0",
>>          "EventName": "TOPDOWN.SLOTS",
>>          "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
>>          "SampleAfterValue": "10000003",
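As the TOPDOWN.SLOTS description says, software uses this event as the common denominator for the top-level TMA metrics. A hedged sketch of that level-1 breakdown; the per-category slot counts below are invented for illustration:

```python
# TOPDOWN.SLOTS reading: total pipeline slots available to this logical processor.
slots = 10_000_000

# Made-up slot counts for the four level-1 TMA categories.
counts = {
    "retiring": 4_500_000,
    "bad_speculation": 500_000,
    "frontend_bound": 2_000_000,
    "backend_bound": 3_000_000,
}

# Each top-level metric is that category's slots divided by TOPDOWN.SLOTS.
fractions = {name: n / slots for name, n in counts.items()}

# The four categories partition the slot budget, so the fractions sum to 1.
assert abs(sum(fractions.values()) - 1.0) < 1e-9
```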
>> diff --git a/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json b/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json
>> index 442a4c7..9cfb341 100644
>> --- a/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json
>> @@ -193,6 +193,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when the core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>>          "SampleAfterValue": "2000003",
>> @@ -208,6 +209,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when the thread is not in halt state",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
>>          "SampleAfterValue": "2000003",
>> @@ -359,6 +361,7 @@
>>      },
>>      {
>>          "BriefDescription": "Precise instruction retired event with a reduced effect of PEBS shadow in IP distribution",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.PREC_DIST",
>>          "PEBS": "1",
>>          "PublicDescription": "A version of INST_RETIRED that allows for a more unbiased distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR) feature to mitigate some bias in how retired instructions get sampled. Use on Fixed Counter 0.",
>> @@ -544,6 +547,7 @@
>>      },
>>      {
>>          "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
>> +        "EventCode": "0x0",
>>          "EventName": "TOPDOWN.SLOTS",
>>          "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
>>          "SampleAfterValue": "10000003",
>> diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json b/tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json
>> index b6ce14e..ae57663 100644
>> --- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json
>> +++ b/tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json
>> @@ -892,6 +892,7 @@
>>      },
>>      {
>>          "BriefDescription": "Clockticks of the uncore caching and home agent (CHA)",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_CHA_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "CHA"
>> diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json
>> index 8ac5907..1b821b6 100644
>> --- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json
>> +++ b/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json
>> @@ -1419,6 +1419,7 @@
>>      },
>>      {
>>          "BriefDescription": "Clockticks of the mesh to memory (M2M)",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_M2M_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "M2M"
>> diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json b/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json
>> index 814d959..b0b2f27 100644
>> --- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json
>> +++ b/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json
>> @@ -100,6 +100,7 @@
>>      },
>>      {
>>          "BriefDescription": "DRAM Clockticks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_M_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "iMC"
>> diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json b/tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json
>> index ee4dac6..9c4cd59 100644
>> --- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json
>> +++ b/tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json
>> @@ -1,6 +1,7 @@
>>  [
>>      {
>>          "BriefDescription": "Clockticks of the power control unit (PCU)",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_P_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Clockticks of the power control unit (PCU) : The PCU runs off a fixed 1 GHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
>> diff --git a/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json b/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json
>> index 30a3da9..2df2d21 100644
>> --- a/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json
>> @@ -326,6 +326,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when the core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x3"
>> @@ -348,6 +349,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when the thread is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x2"
>> @@ -355,6 +357,7 @@
>>      {
>>          "AnyThread": "1",
>>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>>          "PublicDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
>>          "SampleAfterValue": "2000003",
>> diff --git a/tools/perf/pmu-events/arch/x86/ivytown/pipeline.json b/tools/perf/pmu-events/arch/x86/ivytown/pipeline.json
>> index 30a3da9..6f6f281 100644
>> --- a/tools/perf/pmu-events/arch/x86/ivytown/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/ivytown/pipeline.json
>> @@ -326,6 +326,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when the core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x3"
>> @@ -348,6 +349,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when the thread is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x2"
>> @@ -355,6 +357,7 @@
>>      {
>>          "AnyThread": "1",
>>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>>          "PublicDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
>>          "SampleAfterValue": "2000003",
>> @@ -510,6 +513,7 @@
>>      },
>>      {
>>          "BriefDescription": "Instructions retired from execution.",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x1"
>> diff --git a/tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json b/tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json
>> index 8bf2706..31e58fb 100644
>> --- a/tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json
>> +++ b/tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json
>> @@ -1,6 +1,7 @@
>>  [
>>      {
>>          "BriefDescription": "Uncore Clocks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_C_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "CBOX"
>> @@ -1533,6 +1534,7 @@
>>      },
>>      {
>>          "BriefDescription": "uclks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_H_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of uclks in the HA.  This will be slightly different than the count in the Ubox because of enable/freeze delays.  The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the QPI Agent.",
>> diff --git a/tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.json
>> index ccf45153..f2492ec7 100644
>> --- a/tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.json
>> +++ b/tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.json
>> @@ -109,6 +109,7 @@
>>      },
>>      {
>>          "BriefDescription": "Clocks in the IRP",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_I_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Number of clocks in the IRP.",
>> @@ -1522,6 +1523,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 0; Data Tx Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G0.DATA",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI.  Each flit contains 64b of data.  This includes both DRS and NCB data flits (coherent and non-coherent).  This can be used to calculate the data bandwidth of the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This does not include the header flits that go in data packets.",
>> @@ -1530,6 +1532,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 0; Non-Data protocol Tx Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G0.NON_DATA",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI.  This basically tracks the protocol overhead on the QPI link.  One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.  This includes the header flits for data packets.",
>> @@ -1538,6 +1541,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; DRS Flits (both Header and Data)",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.",
>> @@ -1546,6 +1550,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; DRS Data Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS_DATA",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of data flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.  This does not count data flits transmitted over the NCB channel which transmits non-coherent data.  This includes only the data flits (not the header).",
>> @@ -1554,6 +1559,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; DRS Header Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS_NONDATA",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of protocol flits transmitted over QPI on the DRS (Data Response) channel.  DRS flits are used to transmit data with coherency.  This does not count data flits transmitted over the NCB channel which transmits non-coherent data.  This includes only the header flits (not the data).  This includes extended headers.",
>> @@ -1562,6 +1568,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; HOM Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of flits transmitted over QPI on the home channel.",
>> @@ -1570,6 +1577,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; HOM Non-Request Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM_NONREQ",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of non-request flits transmitted over QPI on the home channel.  These are most commonly snoop responses, and this event can be used as a proxy for that.",
>> @@ -1578,6 +1586,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; HOM Request Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM_REQ",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of data request transmitted over QPI on the home channel.  This basically counts the number of remote memory requests transmitted over QPI.  In conjunction with the local read count in the Home Agent, one can calculate the number of LLC Misses.",
>> @@ -1586,6 +1595,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; SNP Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.SNP",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three groups that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each flit is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits.  Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as data bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.  To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of snoop request flits transmitted over QPI.  These requests are contained in the snoop channel.  This does not include snoop responses, which are transmitted on the home channel.",
>> @@ -3104,6 +3114,7 @@
>>      },
>>      {
>>          "EventName": "UNC_U_CLOCKTICKS",
>> +        "EventCode": "0x0",
>>          "PerPkg": "1",
>>          "Unit": "UBOX"
>>      },
>> diff --git a/tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json b/tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json
>> index 6550934..869a320 100644
>> --- a/tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json
>> +++ b/tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json
>> @@ -131,6 +131,7 @@
>>      },
>>      {
>>          "BriefDescription": "DRAM Clockticks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_M_DCLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "iMC"
>> diff --git a/tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json b/tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json
>> index 5df1ebf..0a5d0c3 100644
>> --- a/tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json
>> +++ b/tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json
>> @@ -1,6 +1,7 @@
>>  [
>>      {
>>          "BriefDescription": "pclk Cycles",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_P_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "The PCU runs off a fixed 800 MHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
>> diff --git a/tools/perf/pmu-events/arch/x86/jaketown/pipeline.json b/tools/perf/pmu-events/arch/x86/jaketown/pipeline.json
>> index d0edfde..76b515d 100644
>> --- a/tools/perf/pmu-events/arch/x86/jaketown/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/jaketown/pipeline.json
>> @@ -329,6 +329,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when the core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
>>          "SampleAfterValue": "2000003",
>> @@ -351,6 +352,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when the thread is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
>>          "SampleAfterValue": "2000003",
>> @@ -359,6 +361,7 @@
>>      {
>>          "AnyThread": "1",
>>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x2"
>> @@ -432,6 +435,7 @@
>>      },
>>      {
>>          "BriefDescription": "Instructions retired from execution.",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers.",
>>          "SampleAfterValue": "2000003",
>> diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json b/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json
>> index 63395e7e..160f1c4 100644
>> --- a/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json
>> +++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json
>> @@ -1,6 +1,7 @@
>>  [
>>      {
>>          "BriefDescription": "Uncore Clocks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_C_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "CBOX"
>> @@ -863,6 +864,7 @@
>>      },
>>      {
>>          "BriefDescription": "uclks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_H_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of uclks in the HA.  This will be slightly different than the count in the Ubox because of enable/freeze delays.  The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the QPI Agent.",
>> diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/jaketown/uncore-interconnect.json
>> index 874f15e..45f2966 100644
>> --- a/tools/perf/pmu-events/arch/x86/jaketown/uncore-interconnect.json
>> +++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-interconnect.json
>> @@ -109,6 +109,7 @@
>>      },
>>      {
>>          "BriefDescription": "Clocks in the IRP",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_I_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Number of clocks in the IRP.",
>> @@ -847,6 +848,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 0; Data Tx Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G0.DATA",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.",
>> @@ -855,6 +857,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 0; Idle and Null Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G0.IDLE",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.",
>> @@ -863,6 +866,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 0; Non-Data protocol Tx Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G0.NON_DATA",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  It includes filters for Idle, protocol, and Data Flits.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.",
>> @@ -871,6 +875,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; DRS Flits (both Header and Data)",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
>> @@ -879,6 +884,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; DRS Data Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS_DATA",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
>> @@ -887,6 +893,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; DRS Header Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.DRS_NONDATA",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
>> @@ -895,6 +902,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; HOM Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
>> @@ -903,6 +911,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; HOM Non-Request Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM_NONREQ",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
>> @@ -911,6 +920,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; HOM Request Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.HOM_REQ",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
>> @@ -919,6 +929,7 @@
>>      },
>>      {
>>          "BriefDescription": "Flits Transferred - Group 1; SNP Flits",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_Q_TxL_FLITS_G1.SNP",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of flits transmitted across the QPI Link.  This is one of three 'groups' that allow us to track flits.  It includes filters for SNP, HOM, and DRS message classes.  Each 'flit' is made up of 80 bits of information (in addition to some ECC data).  In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data).   In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.  When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'.  Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.  One can calculate the bandwidth of the link by taking: flits*80b/time.  Note that this is not the same as 'data' bandwidth.  For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information.  To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
>> @@ -1576,6 +1587,7 @@
>>      },
>>      {
>>          "EventName": "UNC_U_CLOCKTICKS",
>> +        "EventCode": "0x0",
>>          "PerPkg": "1",
>>          "Unit": "UBOX"
>>      },
>> diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json b/tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json
>> index 6dcc9415..2385b0a 100644
>> --- a/tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json
>> +++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json
>> @@ -65,6 +65,7 @@
>>      },
>>      {
>>          "BriefDescription": "uclks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_M_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Uncore Fixed Counter - uclks",
>> diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json b/tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json
>> index b3ee5d7..f453afd 100644
>> --- a/tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json
>> +++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json
>> @@ -1,6 +1,7 @@
>>  [
>>      {
>>          "BriefDescription": "pclk Cycles",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_P_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "The PCU runs off a fixed 800 MHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
>> @@ -216,6 +217,7 @@
>>      },
>>      {
>>          "BriefDescription": "Cycles spent changing Frequency",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_P_FREQ_TRANS_CYCLES",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts the number of cycles when the system is changing frequency.  This can not be filtered by thread ID.  One can also use it with the occupancy counter that monitors number of threads in C0 to estimate the performance impact that frequency transitions had on the system.",
>> diff --git a/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json b/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json
>> index 3dc5321..a74d45a 100644
>> --- a/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json
>> @@ -150,12 +150,14 @@
>>      },
>>      {
>>          "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x3"
>>      },
>>      {
>>          "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter",
>>          "SampleAfterValue": "2000003",
>> @@ -177,6 +179,7 @@
>>      },
>>      {
>>          "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PublicDescription": "This event counts the number of instructions that retire.  For instructions that consist of multiple micro-ops, this event counts exactly once, as the last micro-op of the instruction retires.  The event continues counting while instructions retire, including during interrupt service routines caused by hardware interrupts, faults or traps.",
>>          "SampleAfterValue": "2000003",
>> diff --git a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-cache.json b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-cache.json
>> index 1b8dcfa..c062253 100644
>> --- a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-cache.json
>> +++ b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-cache.json
>> @@ -3246,6 +3246,7 @@
>>      },
>>      {
>>          "BriefDescription": "Uncore Clocks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_H_U_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "CHA"
>> diff --git a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json
>> index fb75297..3575baa 100644
>> --- a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json
>> +++ b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json
>> @@ -41,6 +41,7 @@
>>      },
>>      {
>>          "BriefDescription": "ECLK count",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_E_E_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "EDC_ECLK"
>> @@ -55,6 +56,7 @@
>>      },
>>      {
>>          "BriefDescription": "UCLK count",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_E_U_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "EDC_UCLK"
>> @@ -93,12 +95,14 @@
>>      },
>>      {
>>          "BriefDescription": "DCLK count",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_M_D_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "iMC_DCLK"
>>      },
>>      {
>>          "BriefDescription": "UCLK count",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_M_U_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "iMC_UCLK"
>> diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json b/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
>> index 6397894..0de3572 100644
>> --- a/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
>> @@ -37,6 +37,7 @@
>>      },
>>      {
>>          "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.CORE",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x2",
>> @@ -51,6 +52,7 @@
>>      },
>>      {
>>          "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x3",
>> @@ -58,6 +60,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when the core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>>          "SampleAfterValue": "2000003",
>> @@ -75,6 +78,7 @@
>>      },
>>      {
>>          "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x2",
>> @@ -82,6 +86,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when the thread is not in halt state",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
>>          "SampleAfterValue": "2000003",
>> @@ -105,6 +110,7 @@
>>      },
>>      {
>>          "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PEBS": "1",
>>          "SampleAfterValue": "2000003",
>> @@ -113,6 +119,7 @@
>>      },
>>      {
>>          "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PEBS": "1",
>>          "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
>> @@ -157,6 +164,7 @@
>>      },
>>      {
>>          "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
>> +        "EventCode": "0x0",
>>          "EventName": "TOPDOWN.SLOTS",
>>          "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
>>          "SampleAfterValue": "10000003",
>> diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json b/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json
>> index ecaf94c..973a5f4 100644
>> --- a/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json
>> @@ -337,6 +337,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when the core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
>>          "SampleAfterValue": "2000003",
>> @@ -359,6 +360,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when the thread is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
>>          "SampleAfterValue": "2000003",
>> @@ -367,6 +369,7 @@
>>      {
>>          "AnyThread": "1",
>>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x2"
>> @@ -440,6 +443,7 @@
>>      },
>>      {
>>          "BriefDescription": "Instructions retired from execution.",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers.",
>>          "SampleAfterValue": "2000003",
>> diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json b/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json
>> index 72e9bdfa..ada2c34 100644
>> --- a/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json
>> @@ -284,6 +284,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when the core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>>          "SampleAfterValue": "2000003",
>> @@ -299,6 +300,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when the thread is not in halt state",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
>>          "SampleAfterValue": "2000003",
>> @@ -426,6 +428,7 @@
>>      },
>>      {
>>          "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PEBS": "1",
>>          "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
>> @@ -457,6 +460,7 @@
>>      },
>>      {
>>          "BriefDescription": "Precise instruction retired with PEBS precise-distribution",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.PREC_DIST",
>>          "PEBS": "1",
>>          "PublicDescription": "A version of INST_RETIRED that allows for a precise distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR++) feature to fix bias in how retired instructions get sampled. Use on Fixed Counter 0.",
>> @@ -719,6 +723,7 @@
>>      },
>>      {
>>          "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
>> +        "EventCode": "0x0",
>>          "EventName": "TOPDOWN.SLOTS",
>>          "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
>>          "SampleAfterValue": "10000003",
>> diff --git a/tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json b/tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json
>> index 4121295..67be689 100644
>> --- a/tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json
>> @@ -17,6 +17,7 @@
>>      },
>>      {
>>          "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.CORE",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x2"
>> @@ -29,6 +30,7 @@
>>      },
>>      {
>>          "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x3"
>> @@ -43,6 +45,7 @@
>>      },
>>      {
>>          "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x2"
>> @@ -55,6 +58,7 @@
>>      },
>>      {
>>          "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PEBS": "1",
>>          "SampleAfterValue": "2000003",
>> diff --git a/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json b/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
>> index 2d4214b..6423c01 100644
>> --- a/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
>> @@ -143,6 +143,7 @@
>>      },
>>      {
>>          "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.CORE",
>>          "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios.  The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. In systems with a constant core frequency, this event can give you a measurement of the elapsed time while the core was not in halt state by dividing the event count by the core frequency. This event is architecturally defined and is a designated fixed counter.  CPU_CLK_UNHALTED.CORE and CPU_CLK_UNHALTED.CORE_P use the core frequency which may change from time to time.  CPU_CLK_UNHALTE.REF_TSC and CPU_CLK_UNHALTED.REF are not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time.  The fixed events are CPU_CLK_UNHALTED.CORE and CPU_CLK_UNHALTED.REF_TSC and the programmable events are CPU_CLK_UNHALTED.CORE_P and CPU_CLK_UNHALTED.REF.",
>>          "SampleAfterValue": "2000003",
>> @@ -165,6 +166,7 @@
>>      },
>>      {
>>          "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "Counts the number of reference cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios.  The core frequency may change from time. This event is not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time.  Divide this event count by core frequency to determine the elapsed time while the core was not in halt state.  Divide this event count by core frequency to determine the elapsed time while the core was not in halt state.  This event is architecturally defined and is a designated fixed counter.  CPU_CLK_UNHALTED.CORE and CPU_CLK_UNHALTED.CORE_P use the core frequency which may change from time to time.  CPU_CLK_UNHALTE.REF_TSC and CPU_CLK_UNHALTED.REF are not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time.  The fixed events are CPU_CLK_UNHALTED.CORE and CPU_CLK_UNHALTED.REF_TSC and the programmable events are CPU_CLK_UNHALTED.CORE_P and CPU_CLK_UNHALTED.REF.",
>>          "SampleAfterValue": "2000003",
>> @@ -180,6 +182,7 @@
>>      },
>>      {
>>          "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PublicDescription": "This event counts the number of instructions that retire.  For instructions that consist of multiple micro-ops, this event counts exactly once, as the last micro-op of the instruction retires.  The event continues counting while instructions retire, including during interrupt service routines caused by hardware interrupts, faults or traps.  Background: Modern microprocessors employ extensive pipelining and speculative techniques.  Since sometimes an instruction is started but never completed, the notion of \"retirement\" is introduced.  A retired instruction is one that commits its states. Or stated differently, an instruction might be abandoned at some point. No instruction is truly finished until it retires.  This counter measures the number of completed instructions.  The fixed event is INST_RETIRED.ANY and the programmable event is INST_RETIRED.ANY_P.",
>>          "SampleAfterValue": "2000003",
>> diff --git a/tools/perf/pmu-events/arch/x86/skylake/pipeline.json b/tools/perf/pmu-events/arch/x86/skylake/pipeline.json
>> index 2dfc3af..53f1381 100644
>> --- a/tools/perf/pmu-events/arch/x86/skylake/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/skylake/pipeline.json
>> @@ -182,6 +182,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when the core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>>          "SampleAfterValue": "2000003",
>> @@ -213,6 +214,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when the thread is not in halt state",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
>>          "SampleAfterValue": "2000003",
>> @@ -221,6 +223,7 @@
>>      {
>>          "AnyThread": "1",
>>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x2"
>> @@ -360,6 +363,7 @@
>>      },
>>      {
>>          "BriefDescription": "Instructions retired from execution.",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PublicDescription": "Counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, Counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
>>          "SampleAfterValue": "2000003",
>> diff --git a/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json b/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json
>> index 0f06e31..99346e1 100644
>> --- a/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json
>> @@ -191,6 +191,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when the core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>>          "SampleAfterValue": "2000003",
>> @@ -222,6 +223,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when the thread is not in halt state",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
>>          "SampleAfterValue": "2000003",
>> @@ -230,6 +232,7 @@
>>      {
>>          "AnyThread": "1",
>>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
>>          "SampleAfterValue": "2000003",
>>          "UMask": "0x2"
>> @@ -369,6 +372,7 @@
>>      },
>>      {
>>          "BriefDescription": "Instructions retired from execution.",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PublicDescription": "Counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, Counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
>>          "SampleAfterValue": "2000003",
>> diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json
>> index 543dfc1..4df1294 100644
>> --- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json
>> +++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json
>> @@ -460,6 +460,7 @@
>>      },
>>      {
>>          "BriefDescription": "Clockticks of the uncore caching & home agent (CHA)",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_CHA_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts clockticks of the clock controlling the uncore caching and home agent (CHA).",
>> @@ -5678,6 +5679,7 @@
>>      {
>>          "BriefDescription": "This event is deprecated. Refer to new event UNC_CHA_CLOCKTICKS",
>>          "Deprecated": "1",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_C_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "CHA"
>> diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json
>> index 26a5a20..40f609c 100644
>> --- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json
>> +++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json
>> @@ -1090,6 +1090,7 @@
>>      },
>>      {
>>          "BriefDescription": "Cycles - at UCLK",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_M2M_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "M2M"
>> diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json
>> index 2a3a709..21a6a0f 100644
>> --- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json
>> +++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json
>> @@ -1271,6 +1271,7 @@
>>      },
>>      {
>>          "BriefDescription": "Counting disabled",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_IIO_NOTHING",
>>          "PerPkg": "1",
>>          "Unit": "IIO"
>> diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json
>> index 6f8ff22..a7ce916 100644
>> --- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json
>> +++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json
>> @@ -167,6 +167,7 @@
>>      },
>>      {
>>          "BriefDescription": "Memory controller clock ticks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_M_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Counts clockticks of the fixed frequency clock of the memory controller using one of the programmable counters.",
>> diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json
>> index c6254af..a01b279 100644
>> --- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json
>> +++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json
>> @@ -1,6 +1,7 @@
>>  [
>>      {
>>          "BriefDescription": "pclk Cycles",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_P_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "The PCU runs off a fixed 1 GHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
>> diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json b/tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json
>> index 9dd8c90..3388cd5 100644
>> --- a/tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json
>> @@ -150,6 +150,7 @@
>>      },
>>      {
>>          "BriefDescription": "Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event)",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is not affected by core frequency changes and increments at a fixed frequency that is also used for the Time Stamp Counter (TSC). This event uses fixed counter 2.",
>>          "SampleAfterValue": "2000003",
>> @@ -179,6 +180,7 @@
>>      },
>>      {
>>          "BriefDescription": "Counts the total number of instructions retired. (Fixed event)",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PEBS": "1",
>>          "PublicDescription": "Counts the total number of instructions that retired. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. This event continues counting during hardware interrupts, traps, and inside interrupt handlers. This event uses fixed counter 0.",
>> diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json
>> index a68a5bb..279381b 100644
>> --- a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json
>> +++ b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json
>> @@ -872,6 +872,7 @@
>>      },
>>      {
>>          "BriefDescription": "Uncore cache clock ticks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_CHA_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "CHA"
>> diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json
>> index de38400..399536f 100644
>> --- a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json
>> +++ b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json
>> @@ -1419,6 +1419,7 @@
>>      },
>>      {
>>          "BriefDescription": "Clockticks of the mesh to memory (M2M)",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_M2M_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "M2M"
>> diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json
>> index 530e9b71..b24ba35 100644
>> --- a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json
>> +++ b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json
>> @@ -120,6 +120,7 @@
>>      },
>>      {
>>          "BriefDescription": "Memory controller clock ticks",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_M_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "PublicDescription": "Clockticks of the integrated memory controller (IMC)",
>> diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json
>> index 27fc155..5c04d6e 100644
>> --- a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json
>> +++ b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json
>> @@ -1,6 +1,7 @@
>>  [
>>      {
>>          "BriefDescription": "Clockticks of the power control unit (PCU)",
>> +        "EventCode": "0x0",
>>          "EventName": "UNC_P_CLOCKTICKS",
>>          "PerPkg": "1",
>>          "Unit": "PCU"
>> diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json b/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json
>> index a0aeeb8..54a81f9 100644
>> --- a/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json
>> @@ -193,6 +193,7 @@
>>      },
>>      {
>>          "BriefDescription": "Reference cycles when the core is not in halt state.",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
>>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
>>          "SampleAfterValue": "2000003",
>> @@ -208,6 +209,7 @@
>>      },
>>      {
>>          "BriefDescription": "Core cycles when the thread is not in halt state",
>> +        "EventCode": "0x0",
>>          "EventName": "CPU_CLK_UNHALTED.THREAD",
>>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
>>          "SampleAfterValue": "2000003",
>> @@ -352,6 +354,7 @@
>>      },
>>      {
>>          "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.ANY",
>>          "PEBS": "1",
>>          "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
>> @@ -377,6 +380,7 @@
>>      },
>>      {
>>          "BriefDescription": "Precise instruction retired event with a reduced effect of PEBS shadow in IP distribution",
>> +        "EventCode": "0x0",
>>          "EventName": "INST_RETIRED.PREC_DIST",
>>          "PEBS": "1",
>>          "PublicDescription": "A version of INST_RETIRED that allows for a more unbiased distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR) feature to mitigate some bias in how retired instructions get sampled. Use on Fixed Counter 0.",
>> @@ -569,6 +573,7 @@
>>      },
>>      {
>>          "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
>> +        "EventCode": "0x0",
>>          "EventName": "TOPDOWN.SLOTS",
>>          "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
>>          "SampleAfterValue": "10000003",
>> --
>> 1.8.3.1
>>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v7 5/8] perf test: Make matching_pmu effective
  2023-08-25  4:27     ` Ian Rogers
@ 2023-08-25  6:30       ` Jing Zhang
  -1 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-08-25  6:30 UTC (permalink / raw)
  To: Ian Rogers
  Cc: John Garry, Will Deacon, James Clark, Arnaldo Carvalho de Melo,
	Mark Rutland, Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	linux-kernel, linux-arm-kernel, linux-perf-users, linux-doc,
	Zhuo Song, Shuai Xue



On 2023/8/25 at 12:27 PM, Ian Rogers wrote:
> On Mon, Aug 21, 2023 at 1:36 AM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>>
>> The perf_pmu_test_event.matching_pmu field didn't work: no matter what its
>> value was, it did not affect the test results. So make matching_pmu actually
>> be used for matching against perf_pmu_test_pmu.pmu.name.
> 
> Could you rebase this onto the latest perf-tools-next, I'd like to test this.
> 

Ok, I will rebase it onto the latest perf-tools-next in next version.

Thanks,
Jing

> Thanks,
> Ian
> 
>> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
>> Reviewed-by: John Garry <john.g.garry@oracle.com>
>> ---
>>  tools/perf/tests/pmu-events.c | 9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
>> index 1dff863b..3204252 100644
>> --- a/tools/perf/tests/pmu-events.c
>> +++ b/tools/perf/tests/pmu-events.c
>> @@ -238,7 +238,7 @@ struct perf_pmu_test_pmu {
>>         },
>>         .alias_str = "event=0x2b",
>>         .alias_long_desc = "ddr write-cycles event. Unit: uncore_sys_ddr_pmu ",
>> -       .matching_pmu = "uncore_sys_ddr_pmu",
>> +       .matching_pmu = "uncore_sys_ddr_pmu0",
>>  };
>>
>>  static const struct perf_pmu_test_event sys_ccn_pmu_read_cycles = {
>> @@ -252,7 +252,7 @@ struct perf_pmu_test_pmu {
>>         },
>>         .alias_str = "config=0x2c",
>>         .alias_long_desc = "ccn read-cycles event. Unit: uncore_sys_ccn_pmu ",
>> -       .matching_pmu = "uncore_sys_ccn_pmu",
>> +       .matching_pmu = "uncore_sys_ccn_pmu4",
>>  };
>>
>>  static const struct perf_pmu_test_event *sys_events[] = {
>> @@ -599,6 +599,11 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
>>                         struct pmu_event const *event = &test_event->event;
>>
>>                         if (!strcmp(event->name, alias->name)) {
>> +                               if (strcmp(pmu_name, test_event->matching_pmu)) {
>> +                                       pr_debug("testing aliases uncore PMU %s: mismatched matching_pmu, %s vs %s\n",
>> +                                                       pmu_name, test_event->matching_pmu, pmu_name);
>> +                                       continue;
>> +                               }
>>                                 if (compare_alias_to_test_event(alias,
>>                                                         test_event,
>>                                                         pmu_name)) {
>> --
>> 1.8.3.1
>>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH v7 6/8] perf test: Add pmu-event test for "Compat" and new event_field.
  2023-08-25  4:30     ` Ian Rogers
@ 2023-08-25  6:30       ` Jing Zhang
  -1 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-08-25  6:30 UTC (permalink / raw)
  To: Ian Rogers
  Cc: John Garry, Will Deacon, James Clark, Arnaldo Carvalho de Melo,
	Mark Rutland, Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	linux-kernel, linux-arm-kernel, linux-perf-users, linux-doc,
	Zhuo Song, Shuai Xue



On 2023/8/25 at 12:30 PM, Ian Rogers wrote:
> On Mon, Aug 21, 2023 at 1:36 AM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>>
>> Add a new event test for uncore system events, used to verify the
>> functionality of "Compat" matching multiple identifiers and the new event
>> fields "EventIdCode" and "Type".
>>
>> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
> 
> Thanks for the tests! I've no issue with them beside the already
> mentioned ';'. This will need updating for:
> https://lore.kernel.org/lkml/20230824183212.374787-1-irogers@google.com/
> https://lore.kernel.org/lkml/20230825024002.801955-1-irogers@google.com/
> 

Ok, will do.

Thanks,
Jing

> Thanks,
> Ian
> 
>> ---
>>  .../pmu-events/arch/test/test_soc/sys/uncore.json  |  8 ++++
>>  tools/perf/pmu-events/empty-pmu-events.c           |  8 ++++
>>  tools/perf/tests/pmu-events.c                      | 55 ++++++++++++++++++++++
>>  3 files changed, 71 insertions(+)
>>
>> diff --git a/tools/perf/pmu-events/arch/test/test_soc/sys/uncore.json b/tools/perf/pmu-events/arch/test/test_soc/sys/uncore.json
>> index c7e7528..06b886d 100644
>> --- a/tools/perf/pmu-events/arch/test/test_soc/sys/uncore.json
>> +++ b/tools/perf/pmu-events/arch/test/test_soc/sys/uncore.json
>> @@ -12,5 +12,13 @@
>>             "EventName": "sys_ccn_pmu.read_cycles",
>>             "Unit": "sys_ccn_pmu",
>>             "Compat": "0x01"
>> +   },
>> +   {
>> +           "BriefDescription": "Counts total cache misses in first lookup result (high priority).",
>> +           "NodeType": "0x05",
>> +           "EventIdCode": "0x01",
>> +           "EventName": "sys_cmn_pmu.hnf_cache_miss",
>> +           "Unit": "sys_cmn_pmu",
>> +           "Compat": "434*;436*;43c*;43a01"
>>     }
>>  ]
>> diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
>> index e74defb..25be18a 100644
>> --- a/tools/perf/pmu-events/empty-pmu-events.c
>> +++ b/tools/perf/pmu-events/empty-pmu-events.c
>> @@ -245,6 +245,14 @@ struct pmu_events_map {
>>                 .pmu = "uncore_sys_ccn_pmu",
>>         },
>>         {
>> +               .name = "sys_cmn_pmu.hnf_cache_miss",
>> +               .event = "type=0x05,eventid=0x01",
>> +               .desc = "Counts total cache misses in first lookup result (high priority). Unit: uncore_sys_cmn_pmu ",
>> +               .compat = "434*;436*;43c*;43a01",
>> +               .topic = "uncore",
>> +               .pmu = "uncore_sys_cmn_pmu",
>> +       },
>> +       {
>>                 .name = 0,
>>                 .event = 0,
>>                 .desc = 0,
>> diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
>> index 3204252..79fb3e2 100644
>> --- a/tools/perf/tests/pmu-events.c
>> +++ b/tools/perf/tests/pmu-events.c
>> @@ -255,9 +255,24 @@ struct perf_pmu_test_pmu {
>>         .matching_pmu = "uncore_sys_ccn_pmu4",
>>  };
>>
>> +static const struct perf_pmu_test_event sys_cmn_pmu_hnf_cache_miss = {
>> +       .event = {
>> +               .name = "sys_cmn_pmu.hnf_cache_miss",
>> +               .event = "type=0x05,eventid=0x01",
>> +               .desc = "Counts total cache misses in first lookup result (high priority). Unit: uncore_sys_cmn_pmu ",
>> +               .topic = "uncore",
>> +               .pmu = "uncore_sys_cmn_pmu",
>> +               .compat = "434*;436*;43c*;43a01",
>> +       },
>> +       .alias_str = "type=0x5,eventid=0x1",
>> +       .alias_long_desc = "Counts total cache misses in first lookup result (high priority). Unit: uncore_sys_cmn_pmu ",
>> +       .matching_pmu = "uncore_sys_cmn_pmu0",
>> +};
>> +
>>  static const struct perf_pmu_test_event *sys_events[] = {
>>         &sys_ddr_pmu_write_cycles,
>>         &sys_ccn_pmu_read_cycles,
>> +       &sys_cmn_pmu_hnf_cache_miss,
>>         NULL
>>  };
>>
>> @@ -704,6 +719,46 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
>>                         &sys_ccn_pmu_read_cycles,
>>                 },
>>         },
>> +       {
>> +               .pmu = {
>> +                       .name = (char *)"uncore_sys_cmn_pmu0",
>> +                       .is_uncore = 1,
>> +                       .id = (char *)"43401",
>> +               },
>> +               .aliases = {
>> +                       &sys_cmn_pmu_hnf_cache_miss,
>> +               },
>> +       },
>> +       {
>> +               .pmu = {
>> +                       .name = (char *)"uncore_sys_cmn_pmu0",
>> +                       .is_uncore = 1,
>> +                       .id = (char *)"43602",
>> +               },
>> +               .aliases = {
>> +                       &sys_cmn_pmu_hnf_cache_miss,
>> +               },
>> +       },
>> +       {
>> +               .pmu = {
>> +                       .name = (char *)"uncore_sys_cmn_pmu0",
>> +                       .is_uncore = 1,
>> +                       .id = (char *)"43c03",
>> +               },
>> +               .aliases = {
>> +                       &sys_cmn_pmu_hnf_cache_miss,
>> +               },
>> +       },
>> +       {
>> +               .pmu = {
>> +                       .name = (char *)"uncore_sys_cmn_pmu0",
>> +                       .is_uncore = 1,
>> +                       .id = (char *)"43a01",
>> +               },
>> +               .aliases = {
>> +                       &sys_cmn_pmu_hnf_cache_miss,
>> +               },
>> +       }
>>  };
>>
>>  /* Test that aliases generated are as expected */
>> --
>> 1.8.3.1
>>


* Re: [PATCH v7 8/8] perf vendor events: Add JSON metrics for Arm CMN
  2023-08-25  4:13     ` Ian Rogers
@ 2023-08-25  6:47       ` Jing Zhang
  -1 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-08-25  6:47 UTC (permalink / raw)
  To: Ian Rogers
  Cc: John Garry, Will Deacon, James Clark, Arnaldo Carvalho de Melo,
	Mark Rutland, Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	linux-kernel, linux-arm-kernel, linux-perf-users, linux-doc,
	Zhuo Song, Shuai Xue



On 2023/8/25 at 12:13 PM, Ian Rogers wrote:
> On Mon, Aug 21, 2023 at 1:36 AM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>>
>> Add JSON metrics for Arm CMN. For now this adds only a subset of the CMN PMU
>> metrics, ones that are general and compatible with any SoC with CMN-ANY.
>>
>> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
>> Reviewed-by: John Garry <john.g.garry@oracle.com>
>> ---
>>  .../pmu-events/arch/arm64/arm/cmn/sys/metric.json  | 74 ++++++++++++++++++++++
>>  1 file changed, 74 insertions(+)
>>  create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json
>>
>> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json
>> new file mode 100644
>> index 0000000..64db534
>> --- /dev/null
>> +++ b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json
>> @@ -0,0 +1,74 @@
>> +[
>> +       {
>> +               "MetricName": "slc_miss_rate",
>> +               "BriefDescription": "The system level cache miss rate.",
>> +               "MetricGroup": "cmn",
>> +               "MetricExpr": "hnf_cache_miss / hnf_slc_sf_cache_access",
>> +               "ScaleUnit": "100%",
>> +               "Unit": "arm_cmn",
>> +               "Compat": "434*;436*;43c*;43a*"
> 
> Here a ';' is used as a separator, but for "Unit" ',' is used as a
> separator. Is there a reason for the inconsistency?
> 

John and I discussed this issue previously; in fact, I used ';' deliberately.

I chose a semicolon rather than a comma to distinguish it from the role the
comma plays in "Unit" and to avoid confusion between the two fields. In "Unit"
the comma acts as a separator for wildcard matching, whereas in "Compat" the
semicolon means "or", so I think semicolons are more appropriate here.

Thanks,
Jing

> Thanks,
> Ian
> 
>> +       },
>> +       {
>> +               "MetricName": "hnf_message_retry_rate",
>> +               "BriefDescription": "HN-F message retry rate indicates whether a lack of credits is causing the bottlenecks.",
>> +               "MetricGroup": "cmn",
>> +               "MetricExpr": "hnf_pocq_retry / hnf_pocq_reqs_recvd",
>> +               "ScaleUnit": "100%",
>> +               "Unit": "arm_cmn",
>> +               "Compat": "434*;436*;43c*;43a*"
>> +       },
>> +       {
>> +               "MetricName": "sf_hit_rate",
>> +               "BriefDescription": "Snoop filter hit rate can be used to measure the snoop filter efficiency.",
>> +               "MetricGroup": "cmn",
>> +               "MetricExpr": "hnf_sf_hit / hnf_slc_sf_cache_access",
>> +               "ScaleUnit": "100%",
>> +               "Unit": "arm_cmn",
>> +               "Compat": "434*;436*;43c*;43a*"
>> +       },
>> +       {
>> +               "MetricName": "mc_message_retry_rate",
>> +               "BriefDescription": "The memory controller request retries rate indicates whether the memory controller is the bottleneck.",
>> +               "MetricGroup": "cmn",
>> +               "MetricExpr": "hnf_mc_retries / hnf_mc_reqs",
>> +               "ScaleUnit": "100%",
>> +               "Unit": "arm_cmn",
>> +               "Compat": "434*;436*;43c*;43a*"
>> +       },
>> +       {
>> +               "MetricName": "rni_actual_read_bandwidth.all",
>> +               "BriefDescription": "This event measure the actual bandwidth that RN-I bridge sends to the interconnect.",
>> +               "MetricGroup": "cmn",
>> +               "MetricExpr": "rnid_rxdat_flits * 32 / 1e6 / duration_time",
>> +               "ScaleUnit": "1MB/s",
>> +               "Unit": "arm_cmn",
>> +               "Compat": "434*;436*;43c*;43a*"
>> +       },
>> +       {
>> +               "MetricName": "rni_actual_write_bandwidth.all",
>> +               "BriefDescription": "This event measures the actual write bandwidth at RN-I bridges.",
>> +               "MetricGroup": "cmn",
>> +               "MetricExpr": "rnid_txdat_flits * 32 / 1e6 / duration_time",
>> +               "ScaleUnit": "1MB/s",
>> +               "Unit": "arm_cmn",
>> +               "Compat": "434*;436*;43c*;43a*"
>> +       },
>> +       {
>> +               "MetricName": "rni_retry_rate",
>> +               "BriefDescription": "RN-I bridge retry rate indicates whether the memory controller is the bottleneck.",
>> +               "MetricGroup": "cmn",
>> +               "MetricExpr": "rnid_txreq_flits_retried / rnid_txreq_flits_total",
>> +               "ScaleUnit": "100%",
>> +               "Unit": "arm_cmn",
>> +               "Compat": "434*;436*;43c*;43a*"
>> +       },
>> +       {
>> +               "MetricName": "sbsx_actual_write_bandwidth.all",
>> +               "BriefDescription": "sbsx actual write bandwidth.",
>> +               "MetricGroup": "cmn",
>> +               "MetricExpr": "sbsx_txdat_flitv * 32 / 1e6 / duration_time",
>> +               "ScaleUnit": "1MB/s",
>> +               "Unit": "arm_cmn",
>> +               "Compat": "434*;436*;43c*;43a*"
>> +       }
>> +]
>> --
>> 1.8.3.1
>>


* Re: [PATCH v7 8/8] perf vendor events: Add JSON metrics for Arm CMN
@ 2023-08-25  6:47       ` Jing Zhang
  0 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-08-25  6:47 UTC (permalink / raw)
  To: Ian Rogers
  Cc: John Garry, Will Deacon, James Clark, Arnaldo Carvalho de Melo,
	Mark Rutland, Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	linux-kernel, linux-arm-kernel, linux-perf-users, linux-doc,
	Zhuo Song, Shuai Xue



在 2023/8/25 下午12:13, Ian Rogers 写道:
> On Mon, Aug 21, 2023 at 1:36 AM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>>
>> Add JSON metrics for Arm CMN. Currently just add part of CMN PMU
>> metrics which are general and compatible for any SoC with CMN-ANY.
>>
>> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
>> Reviewed-by: John Garry <john.g.garry@oracle.com>
>> ---
>>  .../pmu-events/arch/arm64/arm/cmn/sys/metric.json  | 74 ++++++++++++++++++++++
>>  1 file changed, 74 insertions(+)
>>  create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json
>>
>> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json
>> new file mode 100644
>> index 0000000..64db534
>> --- /dev/null
>> +++ b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json
>> @@ -0,0 +1,74 @@
>> +[
>> +       {
>> +               "MetricName": "slc_miss_rate",
>> +               "BriefDescription": "The system level cache miss rate.",
>> +               "MetricGroup": "cmn",
>> +               "MetricExpr": "hnf_cache_miss / hnf_slc_sf_cache_access",
>> +               "ScaleUnit": "100%",
>> +               "Unit": "arm_cmn",
>> +               "Compat": "434*;436*;43c*;43a*"
> 
> Here a ';' is used as a separator, but for "Unit" ',' is used as a
> separator. Is there a reason for the inconsistency?
> 

John and I have previously discussed this issue, and in fact, I deliberately used ';'.

I use a semicolon instead of a comma because I want to distinguish it from the function
of the comma in "Unit" and avoid confusion between the use of commas in "Unit" and "Compat".
Because in Unit, commas act as wildcards, and in “Compat”, the semicolon means "or". So
I think semicolons are more appropriate.

Thanks,
Jing

> Thanks,
> Ian
> 
>> +       },
>> +       {
>> +               "MetricName": "hnf_message_retry_rate",
>> +               "BriefDescription": "HN-F message retry rate indicates whether a lack of credits is causing the bottlenecks.",
>> +               "MetricGroup": "cmn",
>> +               "MetricExpr": "hnf_pocq_retry / hnf_pocq_reqs_recvd",
>> +               "ScaleUnit": "100%",
>> +               "Unit": "arm_cmn",
>> +               "Compat": "434*;436*;43c*;43a*"
>> +       },
>> +       {
>> +               "MetricName": "sf_hit_rate",
>> +               "BriefDescription": "Snoop filter hit rate can be used to measure the snoop filter efficiency.",
>> +               "MetricGroup": "cmn",
>> +               "MetricExpr": "hnf_sf_hit / hnf_slc_sf_cache_access",
>> +               "ScaleUnit": "100%",
>> +               "Unit": "arm_cmn",
>> +               "Compat": "434*;436*;43c*;43a*"
>> +       },
>> +       {
>> +               "MetricName": "mc_message_retry_rate",
>> +               "BriefDescription": "The memory controller request retries rate indicates whether the memory controller is the bottleneck.",
>> +               "MetricGroup": "cmn",
>> +               "MetricExpr": "hnf_mc_retries / hnf_mc_reqs",
>> +               "ScaleUnit": "100%",
>> +               "Unit": "arm_cmn",
>> +               "Compat": "434*;436*;43c*;43a*"
>> +       },
>> +       {
>> +               "MetricName": "rni_actual_read_bandwidth.all",
>> +               "BriefDescription": "This event measure the actual bandwidth that RN-I bridge sends to the interconnect.",
>> +               "MetricGroup": "cmn",
>> +               "MetricExpr": "rnid_rxdat_flits * 32 / 1e6 / duration_time",
>> +               "ScaleUnit": "1MB/s",
>> +               "Unit": "arm_cmn",
>> +               "Compat": "434*;436*;43c*;43a*"
>> +       },
>> +       {
>> +               "MetricName": "rni_actual_write_bandwidth.all",
>> +               "BriefDescription": "This event measures the actual write bandwidth at RN-I bridges.",
>> +               "MetricGroup": "cmn",
>> +               "MetricExpr": "rnid_txdat_flits * 32 / 1e6 / duration_time",
>> +               "ScaleUnit": "1MB/s",
>> +               "Unit": "arm_cmn",
>> +               "Compat": "434*;436*;43c*;43a*"
>> +       },
>> +       {
>> +               "MetricName": "rni_retry_rate",
>> +               "BriefDescription": "RN-I bridge retry rate indicates whether the memory controller is the bottleneck.",
>> +               "MetricGroup": "cmn",
>> +               "MetricExpr": "rnid_txreq_flits_retried / rnid_txreq_flits_total",
>> +               "ScaleUnit": "100%",
>> +               "Unit": "arm_cmn",
>> +               "Compat": "434*;436*;43c*;43a*"
>> +       },
>> +       {
>> +               "MetricName": "sbsx_actual_write_bandwidth.all",
>> +               "BriefDescription": "SBSX actual write bandwidth.",
>> +               "MetricGroup": "cmn",
>> +               "MetricExpr": "sbsx_txdat_flitv * 32 / 1e6 / duration_time",
>> +               "ScaleUnit": "1MB/s",
>> +               "Unit": "arm_cmn",
>> +               "Compat": "434*;436*;43c*;43a*"
>> +       }
>> +]
>> --
>> 1.8.3.1
>>
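As an aside (not part of the patch), the MetricExpr and ScaleUnit fields above combine raw event counts into derived values. A minimal sketch of that evaluation, with made-up event counts and names mirroring the JSON metrics:

```python
# Illustration of how the MetricExpr/ScaleUnit fields above are evaluated.
# The event counts and duration passed in below are hypothetical.

def sf_hit_rate(hnf_sf_hit, hnf_slc_sf_cache_access):
    # "hnf_sf_hit / hnf_slc_sf_cache_access" with ScaleUnit "100%"
    return 100.0 * hnf_sf_hit / hnf_slc_sf_cache_access

def rni_read_bandwidth(rnid_rxdat_flits, duration_time):
    # Each RXDAT flit carries 32 bytes; "* 32 / 1e6 / duration_time"
    # converts a flit count into MB/s, matching ScaleUnit "1MB/s"
    return rnid_rxdat_flits * 32 / 1e6 / duration_time

print(sf_hit_rate(900, 1000))               # 90.0 (%)
print(rni_read_bandwidth(4_000_000, 1.0))   # 128.0 (MB/s)
```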

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v7 1/8] perf pmu: "Compat" supports matching multiple identifiers
  2023-08-24 15:05     ` Robin Murphy
@ 2023-08-25  8:40       ` Jing Zhang
  -1 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-08-25  8:40 UTC (permalink / raw)
  To: Robin Murphy, John Garry, Ian Rogers
  Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Shuai Xue



On 2023/8/24 11:05 PM, Robin Murphy wrote:
> On 21/08/2023 9:36 am, Jing Zhang wrote:
>> The jevent "Compat" is used for uncore PMU alias or metric definitions.
>>
>> The same PMU driver has different PMU identifiers due to different
>> hardware versions and types, but these PMUs may share common events.
>> Since a Compat value can only match one identifier, adding the same
>> event alias to PMUs with different identifiers requires one definition
>> per identifier, which is repetitive.
>>
>> So let "Compat" support matching multiple identifiers for uncore PMU
>> aliases. For example, the Compat value {43401;436*} matches both the
>> PMU identifier "43401", that is, CMN600_r0p0, and any PMU identifier
>> with the prefix "436", that is, all CMN650, where "*" is a wildcard.
>> Tokens in the Compat field are delimited by ';' with no spaces.
> 
> I wonder is there any possibility of supporting multiple values as a JSON array, rather than a single delimited string? Otherwise, if we're putting restrictions on what characters a driver can expose as an identifier, then I think that really wants explicitly documenting. AFAICT there's currently not even any documentation of the de-facto ABI that it's expected to be a free-form string rather than completely arbitrary binary data.
> 

I'm sorry I almost missed this message, as it was in my spam folder.

If we put multiple values in a JSON array, parsing it in jevents.py will become more complicated.
I agree that we need to document the character restrictions for driver identifier composition.
Both Unit and Compat have the same problem, so certain characters need to be restricted in
the identifiers and names exposed by drivers. However, it seems that no such document exists yet.
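For illustration only (not part of the patch), the ';'-delimited matching described above, with a trailing '*' acting as a prefix wildcard, behaves like this Python sketch:

```python
# Python equivalent of the ';'-delimited Compat matching: each token is
# either an exact identifier or, with a trailing '*', a prefix wildcard.
def compat_matches(pmu_id: str, compat: str) -> bool:
    for tok in compat.split(";"):
        if tok.endswith("*"):
            if pmu_id.startswith(tok[:-1]):
                return True
        elif pmu_id == tok:
            return True
    return False

print(compat_matches("43401", "43401;436*"))  # True: exact match (CMN600_r0p0)
print(compat_matches("43602", "43401;436*"))  # True: prefix "436" (CMN650)
print(compat_matches("43a01", "43401;436*"))  # False: no token matches
```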

Thanks,
Jing

> Thanks,
> Robin.
> 
>> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
>> Reviewed-by: John Garry <john.g.garry@oracle.com>
>> ---
>>   tools/perf/util/pmu.c | 33 +++++++++++++++++++++++++++++++--
>>   tools/perf/util/pmu.h |  1 +
>>   2 files changed, 32 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
>> index ad209c8..6402423 100644
>> --- a/tools/perf/util/pmu.c
>> +++ b/tools/perf/util/pmu.c
>> @@ -776,6 +776,35 @@ static bool pmu_uncore_alias_match(const char *pmu_name, const char *name)
>>       return res;
>>   }
>>   +bool pmu_uncore_identifier_match(const char *id, const char *compat)
>> +{
>> +    char *tmp = NULL, *tok, *str;
>> +    bool res;
>> +    int n;
>> +
>> +    /*
>> +     * The strdup() call is necessary here because "compat" is a const char *
>> +     * type and cannot be used as an argument to strtok_r().
>> +     */
>> +    str = strdup(compat);
>> +    if (!str)
>> +        return false;
>> +
>> +    tok = strtok_r(str, ";", &tmp);
>> +    for (; tok; tok = strtok_r(NULL, ";", &tmp)) {
>> +        n = strlen(tok);
>> +        if ((tok[n - 1] == '*' && !strncmp(id, tok, n - 1)) ||
>> +            !strcmp(id, tok)) {
>> +            res = true;
>> +            goto out;
>> +        }
>> +    }
>> +    res = false;
>> +out:
>> +    free(str);
>> +    return res;
>> +}
>> +
>>   struct pmu_add_cpu_aliases_map_data {
>>       struct list_head *head;
>>       const char *name;
>> @@ -847,8 +876,8 @@ static int pmu_add_sys_aliases_iter_fn(const struct pmu_event *pe,
>>       if (!pe->compat || !pe->pmu)
>>           return 0;
>>   -    if (!strcmp(pmu->id, pe->compat) &&
>> -        pmu_uncore_alias_match(pe->pmu, pmu->name)) {
>> +    if (pmu_uncore_alias_match(pe->pmu, pmu->name) &&
>> +        pmu_uncore_identifier_match(pmu->id, pe->compat)) {
>>           __perf_pmu__new_alias(idata->head, -1,
>>                         (char *)pe->name,
>>                         (char *)pe->desc,
>> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
>> index b9a02de..9d4385d 100644
>> --- a/tools/perf/util/pmu.h
>> +++ b/tools/perf/util/pmu.h
>> @@ -241,6 +241,7 @@ void pmu_add_cpu_aliases_table(struct list_head *head, struct perf_pmu *pmu,
>>   char *perf_pmu__getcpuid(struct perf_pmu *pmu);
>>   const struct pmu_events_table *pmu_events_table__find(void);
>>   const struct pmu_metrics_table *pmu_metrics_table__find(void);
>> +bool pmu_uncore_identifier_match(const char *id, const char *compat);
>>   void perf_pmu_free_alias(struct perf_pmu_alias *alias);
>>     int perf_pmu__convert_scale(const char *scale, char **end, double *sval);

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v7 0/8] perf vendor events: Add JSON metrics for Arm CMN
  2023-08-21  8:36 ` Jing Zhang
@ 2023-09-06 16:05   ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 57+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-09-06 16:05 UTC (permalink / raw)
  To: Jing Zhang
  Cc: John Garry, Ian Rogers, Will Deacon, James Clark, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Shuai Xue

On Mon, Aug 21, 2023 at 04:36:09PM +0800, Jing Zhang wrote:
> Changes since v6:
> - Supplement the omitted EventCode;
> - Keep the original way of ConfigCode;
> - Supplement the test in empty-pmu-events.c, so that the pmu event test
>   can succeed when build with NO_JEVENT=1.
> - Link: https://lore.kernel.org/all/1691394685-61240-1-git-send-email-renyu.zj@linux.alibaba.com/

Waiting for a v8, from what I saw there are several more review comments
to address, right?

- Arnaldo
 
> Jing Zhang (8):
>   perf pmu: "Compat" supports matching multiple identifiers
>   perf metric: "Compat" supports matching multiple identifiers
>   perf vendor events: Supplement the omitted EventCode
>   perf jevents: Support more event fields
>   perf test: Make matching_pmu effective
>   perf test: Add pmu-event test for "Compat" and new event_field.
>   perf jevents: Add support for Arm CMN PMU aliasing
>   perf vendor events: Add JSON metrics for Arm CMN
> 
>  .../pmu-events/arch/arm64/arm/cmn/sys/cmn.json     | 266 +++++++++++++++++++++
>  .../pmu-events/arch/arm64/arm/cmn/sys/metric.json  |  74 ++++++
>  .../pmu-events/arch/test/test_soc/sys/uncore.json  |   8 +
>  .../pmu-events/arch/x86/alderlake/pipeline.json    |   9 +
>  .../pmu-events/arch/x86/alderlaken/pipeline.json   |   3 +
>  .../pmu-events/arch/x86/broadwell/pipeline.json    |   4 +
>  .../pmu-events/arch/x86/broadwellde/pipeline.json  |   4 +
>  .../arch/x86/broadwellde/uncore-cache.json         |   2 +
>  .../arch/x86/broadwellde/uncore-interconnect.json  |   1 +
>  .../arch/x86/broadwellde/uncore-memory.json        |   1 +
>  .../arch/x86/broadwellde/uncore-power.json         |   1 +
>  .../pmu-events/arch/x86/broadwellx/pipeline.json   |   4 +
>  .../arch/x86/broadwellx/uncore-cache.json          |   2 +
>  .../arch/x86/broadwellx/uncore-interconnect.json   |  13 +
>  .../arch/x86/broadwellx/uncore-memory.json         |   2 +
>  .../arch/x86/broadwellx/uncore-power.json          |   1 +
>  .../pmu-events/arch/x86/cascadelakex/pipeline.json |   4 +
>  .../arch/x86/cascadelakex/uncore-cache.json        |   2 +
>  .../arch/x86/cascadelakex/uncore-interconnect.json |   1 +
>  .../arch/x86/cascadelakex/uncore-io.json           |   1 +
>  .../arch/x86/cascadelakex/uncore-memory.json       |   1 +
>  .../arch/x86/cascadelakex/uncore-power.json        |   1 +
>  .../pmu-events/arch/x86/elkhartlake/pipeline.json  |   2 +
>  .../pmu-events/arch/x86/goldmont/pipeline.json     |   3 +
>  .../pmu-events/arch/x86/goldmontplus/pipeline.json |   3 +
>  .../pmu-events/arch/x86/grandridge/pipeline.json   |   3 +
>  .../arch/x86/graniterapids/pipeline.json           |   4 +
>  .../perf/pmu-events/arch/x86/haswell/pipeline.json |   4 +
>  .../pmu-events/arch/x86/haswellx/pipeline.json     |   4 +
>  .../pmu-events/arch/x86/haswellx/uncore-cache.json |   2 +
>  .../arch/x86/haswellx/uncore-interconnect.json     |  14 ++
>  .../arch/x86/haswellx/uncore-memory.json           |   2 +
>  .../pmu-events/arch/x86/haswellx/uncore-power.json |   1 +
>  .../perf/pmu-events/arch/x86/icelake/pipeline.json |   4 +
>  .../pmu-events/arch/x86/icelakex/pipeline.json     |   4 +
>  .../pmu-events/arch/x86/icelakex/uncore-cache.json |   1 +
>  .../arch/x86/icelakex/uncore-interconnect.json     |   1 +
>  .../arch/x86/icelakex/uncore-memory.json           |   1 +
>  .../pmu-events/arch/x86/icelakex/uncore-power.json |   1 +
>  .../pmu-events/arch/x86/ivybridge/pipeline.json    |   3 +
>  .../perf/pmu-events/arch/x86/ivytown/pipeline.json |   4 +
>  .../pmu-events/arch/x86/ivytown/uncore-cache.json  |   2 +
>  .../arch/x86/ivytown/uncore-interconnect.json      |  11 +
>  .../pmu-events/arch/x86/ivytown/uncore-memory.json |   1 +
>  .../pmu-events/arch/x86/ivytown/uncore-power.json  |   1 +
>  .../pmu-events/arch/x86/jaketown/pipeline.json     |   4 +
>  .../pmu-events/arch/x86/jaketown/uncore-cache.json |   2 +
>  .../arch/x86/jaketown/uncore-interconnect.json     |  12 +
>  .../arch/x86/jaketown/uncore-memory.json           |   1 +
>  .../pmu-events/arch/x86/jaketown/uncore-power.json |   2 +
>  .../arch/x86/knightslanding/pipeline.json          |   3 +
>  .../arch/x86/knightslanding/uncore-cache.json      |   1 +
>  .../arch/x86/knightslanding/uncore-memory.json     |   4 +
>  .../pmu-events/arch/x86/meteorlake/pipeline.json   |   8 +
>  .../pmu-events/arch/x86/sandybridge/pipeline.json  |   4 +
>  .../arch/x86/sapphirerapids/pipeline.json          |   5 +
>  .../pmu-events/arch/x86/sierraforest/pipeline.json |   4 +
>  .../pmu-events/arch/x86/silvermont/pipeline.json   |   3 +
>  .../perf/pmu-events/arch/x86/skylake/pipeline.json |   4 +
>  .../pmu-events/arch/x86/skylakex/pipeline.json     |   4 +
>  .../pmu-events/arch/x86/skylakex/uncore-cache.json |   2 +
>  .../arch/x86/skylakex/uncore-interconnect.json     |   1 +
>  .../pmu-events/arch/x86/skylakex/uncore-io.json    |   1 +
>  .../arch/x86/skylakex/uncore-memory.json           |   1 +
>  .../pmu-events/arch/x86/skylakex/uncore-power.json |   1 +
>  .../pmu-events/arch/x86/snowridgex/pipeline.json   |   2 +
>  .../arch/x86/snowridgex/uncore-cache.json          |   1 +
>  .../arch/x86/snowridgex/uncore-interconnect.json   |   1 +
>  .../arch/x86/snowridgex/uncore-memory.json         |   1 +
>  .../arch/x86/snowridgex/uncore-power.json          |   1 +
>  .../pmu-events/arch/x86/tigerlake/pipeline.json    |   5 +
>  tools/perf/pmu-events/empty-pmu-events.c           |   8 +
>  tools/perf/pmu-events/jevents.py                   |  21 +-
>  tools/perf/tests/pmu-events.c                      |  64 ++++-
>  tools/perf/util/metricgroup.c                      |   2 +-
>  tools/perf/util/pmu.c                              |  33 ++-
>  tools/perf/util/pmu.h                              |   1 +
>  77 files changed, 679 insertions(+), 9 deletions(-)
>  create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json
> 
> -- 
> 1.8.3.1
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v7 0/8] perf vendor events: Add JSON metrics for Arm CMN
  2023-09-06 16:05   ` Arnaldo Carvalho de Melo
@ 2023-09-07  2:42     ` Jing Zhang
  -1 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-09-07  2:42 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: John Garry, Ian Rogers, Will Deacon, James Clark, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Shuai Xue



On 2023/9/7 12:05 AM, Arnaldo Carvalho de Melo wrote:
> On Mon, Aug 21, 2023 at 04:36:09PM +0800, Jing Zhang wrote:
>> Changes since v6:
>> - Supplement the omitted EventCode;
>> - Keep the original way of ConfigCode;
>> - Supplement the test in empty-pmu-events.c, so that the pmu event test
>>   can succeed when build with NO_JEVENT=1.
>> - Link: https://lore.kernel.org/all/1691394685-61240-1-git-send-email-renyu.zj@linux.alibaba.com/
> 
> Waiting for a v8, from what I saw there are several more review comments
> to address, right?
> 

Yes, I will send v8 soon.

Thanks,
Jing

> - Arnaldo
>  
>> Jing Zhang (8):
>>   perf pmu: "Compat" supports matching multiple identifiers
>>   perf metric: "Compat" supports matching multiple identifiers
>>   perf vendor events: Supplement the omitted EventCode
>>   perf jevents: Support more event fields
>>   perf test: Make matching_pmu effective
>>   perf test: Add pmu-event test for "Compat" and new event_field.
>>   perf jevents: Add support for Arm CMN PMU aliasing
>>   perf vendor events: Add JSON metrics for Arm CMN
>>
>>  .../pmu-events/arch/arm64/arm/cmn/sys/cmn.json     | 266 +++++++++++++++++++++
>>  .../pmu-events/arch/arm64/arm/cmn/sys/metric.json  |  74 ++++++
>>  .../pmu-events/arch/test/test_soc/sys/uncore.json  |   8 +
>>  .../pmu-events/arch/x86/alderlake/pipeline.json    |   9 +
>>  .../pmu-events/arch/x86/alderlaken/pipeline.json   |   3 +
>>  .../pmu-events/arch/x86/broadwell/pipeline.json    |   4 +
>>  .../pmu-events/arch/x86/broadwellde/pipeline.json  |   4 +
>>  .../arch/x86/broadwellde/uncore-cache.json         |   2 +
>>  .../arch/x86/broadwellde/uncore-interconnect.json  |   1 +
>>  .../arch/x86/broadwellde/uncore-memory.json        |   1 +
>>  .../arch/x86/broadwellde/uncore-power.json         |   1 +
>>  .../pmu-events/arch/x86/broadwellx/pipeline.json   |   4 +
>>  .../arch/x86/broadwellx/uncore-cache.json          |   2 +
>>  .../arch/x86/broadwellx/uncore-interconnect.json   |  13 +
>>  .../arch/x86/broadwellx/uncore-memory.json         |   2 +
>>  .../arch/x86/broadwellx/uncore-power.json          |   1 +
>>  .../pmu-events/arch/x86/cascadelakex/pipeline.json |   4 +
>>  .../arch/x86/cascadelakex/uncore-cache.json        |   2 +
>>  .../arch/x86/cascadelakex/uncore-interconnect.json |   1 +
>>  .../arch/x86/cascadelakex/uncore-io.json           |   1 +
>>  .../arch/x86/cascadelakex/uncore-memory.json       |   1 +
>>  .../arch/x86/cascadelakex/uncore-power.json        |   1 +
>>  .../pmu-events/arch/x86/elkhartlake/pipeline.json  |   2 +
>>  .../pmu-events/arch/x86/goldmont/pipeline.json     |   3 +
>>  .../pmu-events/arch/x86/goldmontplus/pipeline.json |   3 +
>>  .../pmu-events/arch/x86/grandridge/pipeline.json   |   3 +
>>  .../arch/x86/graniterapids/pipeline.json           |   4 +
>>  .../perf/pmu-events/arch/x86/haswell/pipeline.json |   4 +
>>  .../pmu-events/arch/x86/haswellx/pipeline.json     |   4 +
>>  .../pmu-events/arch/x86/haswellx/uncore-cache.json |   2 +
>>  .../arch/x86/haswellx/uncore-interconnect.json     |  14 ++
>>  .../arch/x86/haswellx/uncore-memory.json           |   2 +
>>  .../pmu-events/arch/x86/haswellx/uncore-power.json |   1 +
>>  .../perf/pmu-events/arch/x86/icelake/pipeline.json |   4 +
>>  .../pmu-events/arch/x86/icelakex/pipeline.json     |   4 +
>>  .../pmu-events/arch/x86/icelakex/uncore-cache.json |   1 +
>>  .../arch/x86/icelakex/uncore-interconnect.json     |   1 +
>>  .../arch/x86/icelakex/uncore-memory.json           |   1 +
>>  .../pmu-events/arch/x86/icelakex/uncore-power.json |   1 +
>>  .../pmu-events/arch/x86/ivybridge/pipeline.json    |   3 +
>>  .../perf/pmu-events/arch/x86/ivytown/pipeline.json |   4 +
>>  .../pmu-events/arch/x86/ivytown/uncore-cache.json  |   2 +
>>  .../arch/x86/ivytown/uncore-interconnect.json      |  11 +
>>  .../pmu-events/arch/x86/ivytown/uncore-memory.json |   1 +
>>  .../pmu-events/arch/x86/ivytown/uncore-power.json  |   1 +
>>  .../pmu-events/arch/x86/jaketown/pipeline.json     |   4 +
>>  .../pmu-events/arch/x86/jaketown/uncore-cache.json |   2 +
>>  .../arch/x86/jaketown/uncore-interconnect.json     |  12 +
>>  .../arch/x86/jaketown/uncore-memory.json           |   1 +
>>  .../pmu-events/arch/x86/jaketown/uncore-power.json |   2 +
>>  .../arch/x86/knightslanding/pipeline.json          |   3 +
>>  .../arch/x86/knightslanding/uncore-cache.json      |   1 +
>>  .../arch/x86/knightslanding/uncore-memory.json     |   4 +
>>  .../pmu-events/arch/x86/meteorlake/pipeline.json   |   8 +
>>  .../pmu-events/arch/x86/sandybridge/pipeline.json  |   4 +
>>  .../arch/x86/sapphirerapids/pipeline.json          |   5 +
>>  .../pmu-events/arch/x86/sierraforest/pipeline.json |   4 +
>>  .../pmu-events/arch/x86/silvermont/pipeline.json   |   3 +
>>  .../perf/pmu-events/arch/x86/skylake/pipeline.json |   4 +
>>  .../pmu-events/arch/x86/skylakex/pipeline.json     |   4 +
>>  .../pmu-events/arch/x86/skylakex/uncore-cache.json |   2 +
>>  .../arch/x86/skylakex/uncore-interconnect.json     |   1 +
>>  .../pmu-events/arch/x86/skylakex/uncore-io.json    |   1 +
>>  .../arch/x86/skylakex/uncore-memory.json           |   1 +
>>  .../pmu-events/arch/x86/skylakex/uncore-power.json |   1 +
>>  .../pmu-events/arch/x86/snowridgex/pipeline.json   |   2 +
>>  .../arch/x86/snowridgex/uncore-cache.json          |   1 +
>>  .../arch/x86/snowridgex/uncore-interconnect.json   |   1 +
>>  .../arch/x86/snowridgex/uncore-memory.json         |   1 +
>>  .../arch/x86/snowridgex/uncore-power.json          |   1 +
>>  .../pmu-events/arch/x86/tigerlake/pipeline.json    |   5 +
>>  tools/perf/pmu-events/empty-pmu-events.c           |   8 +
>>  tools/perf/pmu-events/jevents.py                   |  21 +-
>>  tools/perf/tests/pmu-events.c                      |  64 ++++-
>>  tools/perf/util/metricgroup.c                      |   2 +-
>>  tools/perf/util/pmu.c                              |  33 ++-
>>  tools/perf/util/pmu.h                              |   1 +
>>  77 files changed, 679 insertions(+), 9 deletions(-)
>>  create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
>>  create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json
>>
>> -- 
>> 1.8.3.1
>>
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v7 0/8] perf vendor events: Add JSON metrics for Arm CMN
@ 2023-09-07  2:42     ` Jing Zhang
  0 siblings, 0 replies; 57+ messages in thread
From: Jing Zhang @ 2023-09-07  2:42 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: John Garry, Ian Rogers, Will Deacon, James Clark, Mark Rutland,
	Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel,
	linux-arm-kernel, linux-perf-users, linux-doc, Zhuo Song,
	Shuai Xue



在 2023/9/7 上午12:05, Arnaldo Carvalho de Melo 写道:
> Em Mon, Aug 21, 2023 at 04:36:09PM +0800, Jing Zhang escreveu:
>> Changes since v6:
>> - Supplement the omitted EventCode;
>> - Keep the original way of ConfigCode;
>> - Supplement the test in empty-pmu-events.c, so that the pmu event test
>>   can succeed when built with NO_JEVENTS=1.
>> - Link: https://lore.kernel.org/all/1691394685-61240-1-git-send-email-renyu.zj@linux.alibaba.com/
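
[For readers outside the thread: the '"Compat" supports matching multiple
identifiers' change lets a single JSON entry apply to several PMU
identifiers instead of exactly one. A minimal Python sketch of the idea
follows; the pattern and identifiers here are made up for illustration,
and the real matching is implemented in C in tools/perf/util/pmu.c.]

```python
import re

def pmu_compat_match(compat: str, pmu_id: str) -> bool:
    # Treat the JSON "Compat" value as a regular expression and require it
    # to match the whole PMU identifier, so one event/metric table entry
    # can cover several CMN product variants at once.
    return re.fullmatch(compat, pmu_id) is not None

# Hypothetical CMN-style identifiers; the real values live in the
# patch's cmn.json.
print(pmu_compat_match("(434|436|43c)[0-9a-f]*", "434fff"))  # True
print(pmu_compat_match("(434|436|43c)[0-9a-f]*", "999fff"))  # False
```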
> 
> Waiting for a v8, from what I saw there are several more review comments
> to address, right?
> 

Yes, I will send v8 soon.

Thanks,
Jing

> - Arnaldo
>  
>> Jing Zhang (8):
>>   perf pmu: "Compat" supports matching multiple identifiers
>>   perf metric: "Compat" supports matching multiple identifiers
>>   perf vendor events: Supplement the omitted EventCode
>>   perf jevents: Support more event fields
>>   perf test: Make matching_pmu effective
>>   perf test: Add pmu-event test for "Compat" and new event_field.
>>   perf jevents: Add support for Arm CMN PMU aliasing
>>   perf vendor events: Add JSON metrics for Arm CMN
>>
>>  .../pmu-events/arch/arm64/arm/cmn/sys/cmn.json     | 266 +++++++++++++++++++++
>>  .../pmu-events/arch/arm64/arm/cmn/sys/metric.json  |  74 ++++++
>>  .../pmu-events/arch/test/test_soc/sys/uncore.json  |   8 +
>>  .../pmu-events/arch/x86/alderlake/pipeline.json    |   9 +
>>  .../pmu-events/arch/x86/alderlaken/pipeline.json   |   3 +
>>  .../pmu-events/arch/x86/broadwell/pipeline.json    |   4 +
>>  .../pmu-events/arch/x86/broadwellde/pipeline.json  |   4 +
>>  .../arch/x86/broadwellde/uncore-cache.json         |   2 +
>>  .../arch/x86/broadwellde/uncore-interconnect.json  |   1 +
>>  .../arch/x86/broadwellde/uncore-memory.json        |   1 +
>>  .../arch/x86/broadwellde/uncore-power.json         |   1 +
>>  .../pmu-events/arch/x86/broadwellx/pipeline.json   |   4 +
>>  .../arch/x86/broadwellx/uncore-cache.json          |   2 +
>>  .../arch/x86/broadwellx/uncore-interconnect.json   |  13 +
>>  .../arch/x86/broadwellx/uncore-memory.json         |   2 +
>>  .../arch/x86/broadwellx/uncore-power.json          |   1 +
>>  .../pmu-events/arch/x86/cascadelakex/pipeline.json |   4 +
>>  .../arch/x86/cascadelakex/uncore-cache.json        |   2 +
>>  .../arch/x86/cascadelakex/uncore-interconnect.json |   1 +
>>  .../arch/x86/cascadelakex/uncore-io.json           |   1 +
>>  .../arch/x86/cascadelakex/uncore-memory.json       |   1 +
>>  .../arch/x86/cascadelakex/uncore-power.json        |   1 +
>>  .../pmu-events/arch/x86/elkhartlake/pipeline.json  |   2 +
>>  .../pmu-events/arch/x86/goldmont/pipeline.json     |   3 +
>>  .../pmu-events/arch/x86/goldmontplus/pipeline.json |   3 +
>>  .../pmu-events/arch/x86/grandridge/pipeline.json   |   3 +
>>  .../arch/x86/graniterapids/pipeline.json           |   4 +
>>  .../perf/pmu-events/arch/x86/haswell/pipeline.json |   4 +
>>  .../pmu-events/arch/x86/haswellx/pipeline.json     |   4 +
>>  .../pmu-events/arch/x86/haswellx/uncore-cache.json |   2 +
>>  .../arch/x86/haswellx/uncore-interconnect.json     |  14 ++
>>  .../arch/x86/haswellx/uncore-memory.json           |   2 +
>>  .../pmu-events/arch/x86/haswellx/uncore-power.json |   1 +
>>  .../perf/pmu-events/arch/x86/icelake/pipeline.json |   4 +
>>  .../pmu-events/arch/x86/icelakex/pipeline.json     |   4 +
>>  .../pmu-events/arch/x86/icelakex/uncore-cache.json |   1 +
>>  .../arch/x86/icelakex/uncore-interconnect.json     |   1 +
>>  .../arch/x86/icelakex/uncore-memory.json           |   1 +
>>  .../pmu-events/arch/x86/icelakex/uncore-power.json |   1 +
>>  .../pmu-events/arch/x86/ivybridge/pipeline.json    |   3 +
>>  .../perf/pmu-events/arch/x86/ivytown/pipeline.json |   4 +
>>  .../pmu-events/arch/x86/ivytown/uncore-cache.json  |   2 +
>>  .../arch/x86/ivytown/uncore-interconnect.json      |  11 +
>>  .../pmu-events/arch/x86/ivytown/uncore-memory.json |   1 +
>>  .../pmu-events/arch/x86/ivytown/uncore-power.json  |   1 +
>>  .../pmu-events/arch/x86/jaketown/pipeline.json     |   4 +
>>  .../pmu-events/arch/x86/jaketown/uncore-cache.json |   2 +
>>  .../arch/x86/jaketown/uncore-interconnect.json     |  12 +
>>  .../arch/x86/jaketown/uncore-memory.json           |   1 +
>>  .../pmu-events/arch/x86/jaketown/uncore-power.json |   2 +
>>  .../arch/x86/knightslanding/pipeline.json          |   3 +
>>  .../arch/x86/knightslanding/uncore-cache.json      |   1 +
>>  .../arch/x86/knightslanding/uncore-memory.json     |   4 +
>>  .../pmu-events/arch/x86/meteorlake/pipeline.json   |   8 +
>>  .../pmu-events/arch/x86/sandybridge/pipeline.json  |   4 +
>>  .../arch/x86/sapphirerapids/pipeline.json          |   5 +
>>  .../pmu-events/arch/x86/sierraforest/pipeline.json |   4 +
>>  .../pmu-events/arch/x86/silvermont/pipeline.json   |   3 +
>>  .../perf/pmu-events/arch/x86/skylake/pipeline.json |   4 +
>>  .../pmu-events/arch/x86/skylakex/pipeline.json     |   4 +
>>  .../pmu-events/arch/x86/skylakex/uncore-cache.json |   2 +
>>  .../arch/x86/skylakex/uncore-interconnect.json     |   1 +
>>  .../pmu-events/arch/x86/skylakex/uncore-io.json    |   1 +
>>  .../arch/x86/skylakex/uncore-memory.json           |   1 +
>>  .../pmu-events/arch/x86/skylakex/uncore-power.json |   1 +
>>  .../pmu-events/arch/x86/snowridgex/pipeline.json   |   2 +
>>  .../arch/x86/snowridgex/uncore-cache.json          |   1 +
>>  .../arch/x86/snowridgex/uncore-interconnect.json   |   1 +
>>  .../arch/x86/snowridgex/uncore-memory.json         |   1 +
>>  .../arch/x86/snowridgex/uncore-power.json          |   1 +
>>  .../pmu-events/arch/x86/tigerlake/pipeline.json    |   5 +
>>  tools/perf/pmu-events/empty-pmu-events.c           |   8 +
>>  tools/perf/pmu-events/jevents.py                   |  21 +-
>>  tools/perf/tests/pmu-events.c                      |  64 ++++-
>>  tools/perf/util/metricgroup.c                      |   2 +-
>>  tools/perf/util/pmu.c                              |  33 ++-
>>  tools/perf/util/pmu.h                              |   1 +
>>  77 files changed, 679 insertions(+), 9 deletions(-)
>>  create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
>>  create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json
>>
>> -- 
>> 1.8.3.1
>>
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


end of thread, other threads:[~2023-09-07  2:42 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-21  8:36 [PATCH v7 0/8] perf vendor events: Add JSON metrics for Arm CMN Jing Zhang
2023-08-21  8:36 ` [PATCH v7 1/8] perf pmu: "Compat" supports matching multiple identifiers Jing Zhang
2023-08-24 15:05   ` Robin Murphy
2023-08-25  8:40     ` Jing Zhang
2023-08-25  4:11   ` Ian Rogers
2023-08-25  6:12     ` Jing Zhang
2023-08-21  8:36 ` [PATCH v7 2/8] perf metric: " Jing Zhang
2023-08-25  4:16   ` Ian Rogers
2023-08-21  8:36 ` [PATCH v7 3/8] perf vendor events: Supplement the omitted EventCode Jing Zhang
2023-08-25  4:24   ` Ian Rogers
2023-08-25  6:28     ` Jing Zhang
2023-08-21  8:36 ` [PATCH v7 4/8] perf jevents: Support more event fields Jing Zhang
2023-08-23  9:12   ` Robin Murphy
2023-08-25  4:42     ` Ian Rogers
2023-08-21  8:36 ` [PATCH v7 5/8] perf test: Make matching_pmu effective Jing Zhang
2023-08-25  4:27   ` Ian Rogers
2023-08-25  6:30     ` Jing Zhang
2023-08-21  8:36 ` [PATCH v7 6/8] perf test: Add pmu-event test for "Compat" and new event_field Jing Zhang
2023-08-25  4:30   ` Ian Rogers
2023-08-25  6:30     ` Jing Zhang
2023-08-21  8:36 ` [PATCH v7 7/8] perf jevents: Add support for Arm CMN PMU aliasing Jing Zhang
2023-08-23  9:33   ` Robin Murphy
2023-08-24  2:12     ` Jing Zhang
2023-08-21  8:36 ` [PATCH v7 8/8] perf vendor events: Add JSON metrics for Arm CMN Jing Zhang
2023-08-25  4:13   ` Ian Rogers
2023-08-25  6:47     ` Jing Zhang
2023-08-23  8:12 ` [PATCH v7 0/8] " John Garry
2023-08-24  2:33   ` Jing Zhang
2023-09-06 16:05 ` Arnaldo Carvalho de Melo
2023-09-07  2:42   ` Jing Zhang
