From: Kris Van Hees <kris.van.hees@oracle.com>
To: linux-kernel@vger.kernel.org, linux-kbuild@vger.kernel.org,
	linux-modules@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org
Cc: Kris Van Hees <kris.van.hees@oracle.com>,
	Nick Alcock <nick.alcock@oracle.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Jiri Olsa <olsajiri@gmail.com>
Subject: [PATCH 4/6] module: script to generate offset ranges for builtin modules
Date: Fri,  8 Dec 2023 00:07:50 -0500
Message-ID: <20231208050752.2787575-5-kris.van.hees@oracle.com>
In-Reply-To: <20231208050752.2787575-1-kris.van.hees@oracle.com>

The offset range data for builtin modules is generated using:
 - modules.builtin.objs: associates object files with module names
 - vmlinux.map: provides load order of sections and offset of first member
    per section
 - vmlinux.o.map: provides offset of object file content per section

The generated data will look like:

.text 00000000-00000000 = _text
.text 0000baf0-0000cb10 amd_uncore
.text 0009bd10-0009c8e0 iosf_mbi
...
.text 008e6660-008e9630 snd_soc_wcd_mbhc
.text 008e9630-008ea610 snd_soc_wcd9335 snd_soc_wcd934x snd_soc_wcd938x
.text 008ea610-008ea780 snd_soc_wcd9335
...
.data 00000000-00000000 = _sdata
.data 0000f020-0000f680 amd_uncore

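For each ELF section, the data first lists the offset of the first symbol
in that section (the record marked with "=").  Combined with the address of
that symbol in the running kernel, this can be used to determine the base
address of the section at runtime.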
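
As a rough illustration of that calculation (not part of this patch), the
following Python sketch derives a section base from an anchor record.  The
function name and the runtime addresses are hypothetical; in practice the
symbol address would have to come from the running kernel.

```python
def section_base(anchor_line, runtime_addr):
    """Return (section, symbol, base) for one anchor record.

    anchor_line  -- a record such as ".text 00000000-00000000 = _text"
    runtime_addr -- address of the anchor symbol in the running kernel
                    (hypothetical input, supplied by the caller)
    """
    section, span, marker, symbol = anchor_line.split()
    if marker != "=":
        raise ValueError("not an anchor record: " + anchor_line)
    # Both ends of the span are the same offset for an anchor record.
    offset = int(span.split("-")[0], 16)
    # The section base is the symbol address minus the symbol's offset.
    return section, symbol, runtime_addr - offset


# Anchor at offset 0: the symbol address is the section base itself.
print(section_base(".text 00000000-00000000 = _text", 0xffffffff81000000))

# Made-up record with a first symbol at a non-zero offset.
section, symbol, base = section_base(".data 00000040-00000040 = first_sym",
                                     0xffffffff82000040)
print(section, symbol, hex(base))
```

The second call shows why the offset is recorded at all: when no symbol sits
at offset 0, the offset has to be subtracted to recover the base.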

Next, it lists (in strict ascending order) offset ranges in that section
that cover the symbols of one or more builtin modules.  Multiple ranges
can apply to a single module, and ranges can be shared between modules.
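
A consumer of this data could resolve a section offset to module names
roughly as sketched below (Python, for illustration only; not part of this
patch).  The helper names are hypothetical, the sample records are taken
from the example output above, and the end of each range is treated as
exclusive, which is an assumption based on adjacent ranges sharing a
boundary.

```python
import bisect

SAMPLE = """\
.text 00000000-00000000 = _text
.text 0000baf0-0000cb10 amd_uncore
.text 0009bd10-0009c8e0 iosf_mbi
.text 008e9630-008ea610 snd_soc_wcd9335 snd_soc_wcd934x snd_soc_wcd938x
.text 008ea610-008ea780 snd_soc_wcd9335
"""


def load_ranges(text):
    """Parse range records into {section: [(start, end, [modules]), ...]}."""
    ranges = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[2] == "=":
            continue  # skip blank lines and anchor records
        start, end = (int(x, 16) for x in fields[1].split("-"))
        ranges.setdefault(fields[0], []).append((start, end, fields[2:]))
    for entries in ranges.values():
        entries.sort(key=lambda e: e[0])  # do not rely on input order
    return ranges


def modules_at(ranges, section, offset):
    """Return the modules whose range covers offset (end exclusive)."""
    entries = ranges.get(section, [])
    starts = [e[0] for e in entries]
    i = bisect.bisect_right(starts, offset) - 1
    if i >= 0 and offset < entries[i][1]:
        return entries[i][2]
    return []


ranges = load_ranges(SAMPLE)
print(modules_at(ranges, ".text", 0xbaf0))
print(modules_at(ranges, ".text", 0x8e9700))
```

An offset that falls in a shared range yields all module names listed for
it; an offset outside every range yields an empty list.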

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
---
 scripts/generate_builtin_ranges.awk | 149 ++++++++++++++++++++++++++++
 1 file changed, 149 insertions(+)
 create mode 100755 scripts/generate_builtin_ranges.awk

diff --git a/scripts/generate_builtin_ranges.awk b/scripts/generate_builtin_ranges.awk
new file mode 100755
index 000000000000..d5d668c97bd7
--- /dev/null
+++ b/scripts/generate_builtin_ranges.awk
@@ -0,0 +1,149 @@
+#!/usr/bin/gawk -f
+
+FNR == 1 {
+	FC++;
+}
+
+# (1) Build a mapping to associate object files with built-in module names.
+#
+# The first file argument is used as input (modules.builtin.objs).
+#
+FC == 1 {
+	sub(/:/, "");
+	mod = $1;
+	sub(/([^/]*\/)+/, "", mod);
+	sub(/\.o$/, "", mod);
+	gsub(/-/, "_", mod);
+
+	if (NF > 1) {
+		for (i = 2; i <= NF; i++) {
+			if ($i in mods)
+				mods[$i] = mods[$i] " " mod;
+			else
+				mods[$i] = mod;
+		}
+	} else
+		mods[$1] = mod;
+
+	next;
+}
+
+# (2) Determine the load address for each section.
+#
+# The second file argument is used as input (vmlinux.map).
+# Since some AWK implementations cannot handle large integers, we strip off the
+# first 4 hex digits from the address.  This is safe because the kernel space
+# is not large enough for addresses to extend into those digits.
+#
+FC == 2 && /^\./ && NF > 2 {
+	if (type)
+		delete sect_addend[type];
+
+	if ($1 ~ /percpu/)
+		next;
+
+	raw_addr = $2;
+	addr_prefix = "^" substr($2, 1, 6);
+	sub(addr_prefix, "0x", $2);
+	base = strtonum($2);
+	type = $1;
+	anchor = 0;
+	sect_base[type] = base;
+
+	next;
+}
+
+!type {
+	next;
+}
+
+# (3) We need to determine the base address of the section so that ranges can
+# be expressed based on offsets from the base address.  This accommodates the
+# kernel sections getting loaded at different addresses than what is recorded
+# in vmlinux.map.
+#
+# At runtime, we will need to determine the base address of each section we are
+# interested in.  We do that by recording the offset of the first symbol in the
+# section.  Once we know the address of this symbol in the running kernel, we
+# can calculate the base address of the section.
+#
+# If possible, we use an explicit anchor symbol (sym = .) listed at the base
+# address (offset 0).
+#
+# If there is no such symbol, we record the first symbol in the section along
+# with its offset.
+#
+# We also determine the offset of the first member in the section in case the
+# final linking inserts some content between the start of the section and the
+# first member.  I.e. in that case, vmlinux.map will list the first member at
+# a non-zero offset whereas vmlinux.o.map will list it at offset 0.  We record
+# the addend so we can apply it when processing vmlinux.o.map (next).
+#
+FC == 2 && !anchor && raw_addr == $1 && $3 == "=" && $4 == "." {
+	anchor = sprintf("%s %08x-%08x = %s", type, 0, 0, $2);
+	sect_anchor[type] = anchor;
+
+	next;
+}
+
+FC == 2 && !anchor && $1 ~ /^0x/ && $2 !~ /^0x/ && NF <= 4 {
+	sub(addr_prefix, "0x", $1);
+	addr = strtonum($1) - base;
+	anchor = sprintf("%s %08x-%08x = %s", type, addr, addr, $2);
+	sect_anchor[type] = anchor;
+
+	next;
+}
+
+FC == 2 && base && /^ \./ && $1 == type && NF == 4 {
+	sub(addr_prefix, "0x", $2);
+	addr = strtonum($2);
+	sect_addend[type] = addr - base;
+
+	if (anchor) {
+		base = 0;
+		type = 0;
+	}
+
+	next;
+}
+
+# (4) Collect offset ranges (relative to the section base address) for built-in
+# modules.
+#
+FC == 3 && /^ \./ && NF == 4 && $3 != "0x0" {
+	type = $1;
+	if (!(type in sect_addend))
+		next;
+
+	sub(addr_prefix, "0x", $2);
+	addr = strtonum($2) + sect_addend[type];
+
+	if ($4 in mods)
+		mod = mods[$4];
+	else
+		mod = "";
+
+	if (mod == mod_name)
+		next;
+
+	if (mod_name) {
+		idx = mod_start + sect_base[type] + sect_addend[type];
+		entries[idx] = sprintf("%s %08x-%08x %s", type, mod_start, addr, mod_name);
+		count[type]++;
+	}
+
+	mod_name = mod;
+	mod_start = addr;
+}
+
+END {
+	for (type in count) {
+		if (type in sect_anchor)
+			entries[sect_base[type]] = sect_anchor[type];
+	}
+
+	n = asorti(entries, indices);
+	for (i = 1; i <= n; i++)
+		print entries[indices[i]];
+}
-- 
2.42.0


Thread overview:
2023-12-08  5:07 [PATCH 0/6] Generate address range data for built-in modules Kris Van Hees
2023-12-08  5:07 ` [PATCH 1/6] kbuild: add modules.builtin.objs Kris Van Hees
2023-12-08  5:07 ` [PATCH 2/6] module: add CONFIG_BUILTIN_RANGES option Kris Van Hees
2023-12-08 22:59   ` Masami Hiramatsu
2023-12-11 16:29     ` Kris Van Hees
2023-12-08  5:07 ` [PATCH 3/6] kbuild: generate a linker map for vmlinux.o Kris Van Hees
2023-12-08  5:07 ` Kris Van Hees [this message]
2023-12-08  5:07 ` [PATCH 5/6] kbuild: generate modules.builtin.ranges when linking the kernel Kris Van Hees
2023-12-08  5:07 ` [PATCH 6/6] module: add install target for modules.builtin.ranges Kris Van Hees