All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ian Rogers <irogers@google.com>
To: John Garry <john.garry@huawei.com>, Will Deacon <will@kernel.org>,
	James Clark <james.clark@arm.com>,
	Mike Leach <mike.leach@linaro.org>, Leo Yan <leo.yan@linaro.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Andi Kleen <ak@linux.intel.com>,
	Zhengjun Xing <zhengjun.xing@linux.intel.com>,
	Ravi Bangoria <ravi.bangoria@amd.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-perf-users@vger.kernel.org
Cc: Stephane Eranian <eranian@google.com>, Ian Rogers <irogers@google.com>
Subject: [PATCH v3 17/17] perf jevents: Fold strings optimization
Date: Fri, 29 Jul 2022 00:43:51 -0700	[thread overview]
Message-ID: <20220729074351.138260-18-irogers@google.com> (raw)
In-Reply-To: <20220729074351.138260-1-irogers@google.com>

If a shorter string ends a longer string then the shorter string may
reuse the longer string at an offset. For example, on x86 the event
arith.cycles_div_busy and cycles_div_busy can be folded, even though
they have difference names the strings are identical after 6
characters. cycles_div_busy can reuse the arith.cycles_div_busy string
at an offset of 6.

In pmu-events.c this looks like the following where the 'also:' lists
folded strings:
/* offset=177541 */ "arith.cycles_div_busy\000\000pipeline\000Cycles the divider is busy\000\000\000event=0x14,period=2000000,umask=0x1\000\000\000\000\000\000\000\000\000" /* also: cycles_div_busy\000\000pipeline\000Cycles the divider is busy\000\000\000event=0x14,period=2000000,umask=0x1\000\000\000\000\000\000\000\000\000 */

As jevents.py combines multiple strings for an event into a larger
string, the amount of folding is minimal as all parts of the event must
align. Other organizations can benefit more from folding, but lose space
by say recording more offsets.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/jevents.py | 55 ++++++++++++++++++++++++++++----
 1 file changed, 48 insertions(+), 7 deletions(-)

diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index a5e162558994..e5545758c92d 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -80,7 +80,9 @@ class BigCString:
   are all the other C strings (to avoid memory issues the string
   itself is held as a list of strings). The offsets within the big
   string are recorded and when stored to disk these don't need
-  relocation.
+  relocation. To reduce the size of the string further, identical
+  strings are merged. If a longer string ends-with the same value as a
+  shorter string, these entries are also merged.
   """
   strings: Set[str]
   big_string: Sequence[str]
@@ -96,6 +98,33 @@ class BigCString:
   def compute(self) -> None:
     """Called once all strings are added to compute the string and offsets."""
 
+    folded_strings = {}
+    # Determine if two strings can be folded, ie. let 1 string use the
+    # end of another. First reverse all strings and sort them.
+    sorted_reversed_strings = sorted([x[::-1] for x in self.strings])
+
+    # Strings 'xyz' and 'yz' will now be [ 'zy', 'zyx' ]. Scan forward
+    # for each string to see if there is a better candidate to fold it
+    # into, in the example rather than using 'yz' we can use'xyz' at
+    # an offset of 1. We record which string can be folded into which
+    # in folded_strings, we don't need to record the offset as it is
+    # trivially computed from the string lengths.
+    for pos,s in enumerate(sorted_reversed_strings):
+      best_pos = pos
+      for check_pos in range(pos + 1, len(sorted_reversed_strings)):
+        if sorted_reversed_strings[check_pos].startswith(s):
+          best_pos = check_pos
+        else:
+          break
+      if pos != best_pos:
+        folded_strings[s[::-1]] = sorted_reversed_strings[best_pos][::-1]
+
+    # Compute reverse mappings for debugging.
+    fold_into_strings = collections.defaultdict(set)
+    for key, val in folded_strings.items():
+      if key != val:
+        fold_into_strings[val].add(key)
+
     # big_string_offset is the current location within the C string
     # being appended to - comments, etc. don't count. big_string is
     # the string contents represented as a list. Strings are immutable
@@ -104,13 +133,25 @@ class BigCString:
     big_string_offset = 0
     self.big_string = []
     self.offsets = {}
-    # Emit all strings in a sorted manner.
+
+    # Emit all strings that aren't folded in a sorted manner.
     for s in sorted(self.strings):
-      self.offsets[s] = big_string_offset
-      self.big_string.append(f'/* offset={big_string_offset} */ "')
-      self.big_string.append(s)
-      self.big_string.append('"\n')
-      big_string_offset += c_len(s)
+      if s not in folded_strings:
+        self.offsets[s] = big_string_offset
+        self.big_string.append(f'/* offset={big_string_offset} */ "')
+        self.big_string.append(s)
+        self.big_string.append('"')
+        if s in fold_into_strings:
+          self.big_string.append(' /* also: ' + ', '.join(fold_into_strings[s]) + ' */')
+        self.big_string.append('\n')
+        big_string_offset += c_len(s)
+        continue
+
+    # Compute the offsets of the folded strings.
+    for s in folded_strings.keys():
+      assert s not in self.offsets
+      folded_s = folded_strings[s]
+      self.offsets[s] = self.offsets[folded_s] + c_len(folded_s) - c_len(s)
 
 _bcs = BigCString()
 
-- 
2.37.1.455.g008518b4e5-goog


WARNING: multiple messages have this Message-ID (diff)
From: Ian Rogers <irogers@google.com>
To: John Garry <john.garry@huawei.com>, Will Deacon <will@kernel.org>,
	 James Clark <james.clark@arm.com>,
	Mike Leach <mike.leach@linaro.org>,  Leo Yan <leo.yan@linaro.org>,
	Peter Zijlstra <peterz@infradead.org>,
	 Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	 Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>,  Namhyung Kim <namhyung@kernel.org>,
	Andi Kleen <ak@linux.intel.com>,
	 Zhengjun Xing <zhengjun.xing@linux.intel.com>,
	Ravi Bangoria <ravi.bangoria@amd.com>,
	 Kan Liang <kan.liang@linux.intel.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	 linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	 linux-perf-users@vger.kernel.org
Cc: Stephane Eranian <eranian@google.com>, Ian Rogers <irogers@google.com>
Subject: [PATCH v3 17/17] perf jevents: Fold strings optimization
Date: Fri, 29 Jul 2022 00:43:51 -0700	[thread overview]
Message-ID: <20220729074351.138260-18-irogers@google.com> (raw)
In-Reply-To: <20220729074351.138260-1-irogers@google.com>

If a shorter string ends a longer string then the shorter string may
reuse the longer string at an offset. For example, on x86 the event
arith.cycles_div_busy and cycles_div_busy can be folded, even though
they have difference names the strings are identical after 6
characters. cycles_div_busy can reuse the arith.cycles_div_busy string
at an offset of 6.

In pmu-events.c this looks like the following where the 'also:' lists
folded strings:
/* offset=177541 */ "arith.cycles_div_busy\000\000pipeline\000Cycles the divider is busy\000\000\000event=0x14,period=2000000,umask=0x1\000\000\000\000\000\000\000\000\000" /* also: cycles_div_busy\000\000pipeline\000Cycles the divider is busy\000\000\000event=0x14,period=2000000,umask=0x1\000\000\000\000\000\000\000\000\000 */

As jevents.py combines multiple strings for an event into a larger
string, the amount of folding is minimal as all parts of the event must
align. Other organizations can benefit more from folding, but lose space
by say recording more offsets.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/jevents.py | 55 ++++++++++++++++++++++++++++----
 1 file changed, 48 insertions(+), 7 deletions(-)

diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index a5e162558994..e5545758c92d 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -80,7 +80,9 @@ class BigCString:
   are all the other C strings (to avoid memory issues the string
   itself is held as a list of strings). The offsets within the big
   string are recorded and when stored to disk these don't need
-  relocation.
+  relocation. To reduce the size of the string further, identical
+  strings are merged. If a longer string ends-with the same value as a
+  shorter string, these entries are also merged.
   """
   strings: Set[str]
   big_string: Sequence[str]
@@ -96,6 +98,33 @@ class BigCString:
   def compute(self) -> None:
     """Called once all strings are added to compute the string and offsets."""
 
+    folded_strings = {}
+    # Determine if two strings can be folded, ie. let 1 string use the
+    # end of another. First reverse all strings and sort them.
+    sorted_reversed_strings = sorted([x[::-1] for x in self.strings])
+
+    # Strings 'xyz' and 'yz' will now be [ 'zy', 'zyx' ]. Scan forward
+    # for each string to see if there is a better candidate to fold it
+    # into, in the example rather than using 'yz' we can use'xyz' at
+    # an offset of 1. We record which string can be folded into which
+    # in folded_strings, we don't need to record the offset as it is
+    # trivially computed from the string lengths.
+    for pos,s in enumerate(sorted_reversed_strings):
+      best_pos = pos
+      for check_pos in range(pos + 1, len(sorted_reversed_strings)):
+        if sorted_reversed_strings[check_pos].startswith(s):
+          best_pos = check_pos
+        else:
+          break
+      if pos != best_pos:
+        folded_strings[s[::-1]] = sorted_reversed_strings[best_pos][::-1]
+
+    # Compute reverse mappings for debugging.
+    fold_into_strings = collections.defaultdict(set)
+    for key, val in folded_strings.items():
+      if key != val:
+        fold_into_strings[val].add(key)
+
     # big_string_offset is the current location within the C string
     # being appended to - comments, etc. don't count. big_string is
     # the string contents represented as a list. Strings are immutable
@@ -104,13 +133,25 @@ class BigCString:
     big_string_offset = 0
     self.big_string = []
     self.offsets = {}
-    # Emit all strings in a sorted manner.
+
+    # Emit all strings that aren't folded in a sorted manner.
     for s in sorted(self.strings):
-      self.offsets[s] = big_string_offset
-      self.big_string.append(f'/* offset={big_string_offset} */ "')
-      self.big_string.append(s)
-      self.big_string.append('"\n')
-      big_string_offset += c_len(s)
+      if s not in folded_strings:
+        self.offsets[s] = big_string_offset
+        self.big_string.append(f'/* offset={big_string_offset} */ "')
+        self.big_string.append(s)
+        self.big_string.append('"')
+        if s in fold_into_strings:
+          self.big_string.append(' /* also: ' + ', '.join(fold_into_strings[s]) + ' */')
+        self.big_string.append('\n')
+        big_string_offset += c_len(s)
+        continue
+
+    # Compute the offsets of the folded strings.
+    for s in folded_strings.keys():
+      assert s not in self.offsets
+      folded_s = folded_strings[s]
+      self.offsets[s] = self.offsets[folded_s] + c_len(folded_s) - c_len(s)
 
 _bcs = BigCString()
 
-- 
2.37.1.455.g008518b4e5-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2022-07-29  7:45 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-29  7:43 [PATCH v3 00/17] Compress the pmu_event tables Ian Rogers
2022-07-29  7:43 ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 01/17] perf jevents: Clean up pytype warnings Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 02/17] perf jevents: Simplify generation of C-string Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 03/17] perf jevents: Add JEVENTS_ARCH make option Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 04/17] perf jevent: Add an 'all' architecture argument Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 05/17] perf jevents: Remove the type/version variables Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  8:29   ` John Garry
2022-07-29  8:29     ` John Garry
2022-07-29 14:24     ` Ian Rogers
2022-07-29 14:24       ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 06/17] perf jevents: Provide path to json file on error Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 07/17] perf jevents: Sort json files entries Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 08/17] perf pmu-events: Hide pmu_sys_event_tables Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 09/17] perf pmu-events: Avoid passing pmu_events_map Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 10/17] perf pmu-events: Hide pmu_events_map Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 11/17] perf test: Use full metric resolution Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 12/17] perf pmu-events: Move test events/metrics to json Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 13/17] perf pmu-events: Don't assume pmu_event is an array Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 14/17] perf pmu-events: Hide the pmu_events Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 15/17] perf metrics: Copy entire pmu_event in find metric Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 16/17] perf jevents: Compress the pmu_events_table Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` Ian Rogers [this message]
2022-07-29  7:43   ` [PATCH v3 17/17] perf jevents: Fold strings optimization Ian Rogers
2022-07-29 15:03 ` [PATCH v3 00/17] Compress the pmu_event tables John Garry
2022-07-29 15:03   ` John Garry
2022-07-29 17:27   ` Ian Rogers
2022-07-29 17:27     ` Ian Rogers
2022-08-02  9:08     ` John Garry
2022-08-02  9:08       ` John Garry
2022-08-05  8:11       ` John Garry
2022-08-05  8:11         ` John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220729074351.138260-18-irogers@google.com \
    --to=irogers@google.com \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=ak@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=eranian@google.com \
    --cc=james.clark@arm.com \
    --cc=john.garry@huawei.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=leo.yan@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mike.leach@linaro.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=ravi.bangoria@amd.com \
    --cc=will@kernel.org \
    --cc=zhengjun.xing@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.