* [PATCH] kallsyms: new /proc/kallmodsyms with builtin modules and symbol sizes
@ 2019-11-14 22:30 eugene.loh
  2019-11-15 16:47 ` Steven Rostedt
  0 siblings, 1 reply; 21+ messages in thread
From: eugene.loh @ 2019-11-14 22:30 UTC
  To: eugene.loh
  Cc: rostedt, corbet, yamada.masahiro, michal.lkml, jeyu,
	linux-kbuild, maz, songliubraving, tglx, jacob.e.keller

From: Kris Van Hees <kris.van.hees@oracle.com>

/proc/kallsyms is very useful for tracers and other tools that need to
map kernel symbols to addresses.

It would be useful if:

- there were a mapping between kernel symbols and module names
  that changed only when the kernel source code changes.
  This mapping should not change simply because a module
  becomes built into the kernel.

- there were symbol size information to determine whether an
  address is within a symbol or outside it, especially given
  that there could be huge gaps between symbols.

Therefore:

- Introduce a new config parameter CONFIG_KALLMODSYMS.

- Generate a file "modules_thick.builtin" that maps from
  the thin archives that make up built-in modules to their
  constituent object files.

- Generate a linker map ".tmp_vmlinux.map", converting it
  into ".tmp_vmlinux.ranges", mapping address ranges to
  object files.

- Change scripts/kallsyms.c stdin from "nm" to "nm -S" so that
  symbol sizes are available.  Have sort_symbols() incorporate
  size info.  Emit size info in the *.s output file.  Skip the
  .init.scratch section.

- If CONFIG_KALLMODSYMS, have scripts/kallsyms also read
  "modules_thick.builtin" and ".tmp_vmlinux.ranges" to map
  symbol addresses to built-in-module names and then write
  those module names and per-symbol module information to
  the *.s output file.

- Change module_get_kallsym() to return symbol size as well.

- In kernel/kallsyms.c:
  - Use new, accurate symbol size information in get_symbol_pos(),
    both to identify the correct symbol and to return correct size
    information.
  - Introduce a field builtin_module to say if the symbol is in a
    built-in module.
  - If CONFIG_KALLMODSYMS, produce a new /proc/kallmodsyms file,
    akin to /proc/kallsyms but with built-in-module names and symbol
    sizes.
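
As an illustration of the size-aware ordering that get_symbol_pos()
relies on, here is a minimal standalone sketch of a comparator: sort by
address, then zero-size markers first, then larger symbols before the
smaller ones they encompass.  The names struct sym, cmp_sym() and
sort_syms() are hypothetical and exist only for this example; the
comparator in scripts/kallsyms.c goes on to break remaining ties (for
instance by symbol weakness), which is omitted here.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct sym_entry. */
struct sym {
	unsigned long long addr;
	unsigned long long size;
};

/*
 * Order by address.  At equal addresses, zero-size markers come first,
 * then larger symbols precede the smaller symbols they encompass.
 */
static int cmp_sym(const void *a, const void *b)
{
	const struct sym *sa = a, *sb = b;

	if (sa->addr != sb->addr)
		return sa->addr < sb->addr ? -1 : 1;
	if ((sa->size == 0) != (sb->size == 0))
		return sa->size == 0 ? -1 : 1;
	if (sa->size != sb->size)
		return sa->size > sb->size ? -1 : 1;
	return 0;
}

/* Sort an array of n symbols in place using the ordering above. */
static void sort_syms(struct sym *syms, size_t n)
{
	qsort(syms, n, sizeof(*syms), cmp_sym);
}
```

With this ordering, symbols that share both address and size (aliases)
end up adjacent, which is what allows a lookup to back up over them.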

The resulting /proc/kallmodsyms file looks like this:
    ffffffff8b013d20 409 t pt_buffer_setup_aux
    ffffffff8b014130 11f T intel_pt_interrupt
    ffffffff8b014250 2d T cpu_emergency_stop_pt
    ffffffff8b014280 13a t rapl_pmu_event_init      [intel_rapl_perf]
    ffffffff8b0143c0 bb t rapl_event_update [intel_rapl_perf]
    ffffffff8b014480 10 t rapl_pmu_event_read       [intel_rapl_perf]
    ffffffff8b014490 a3 t rapl_cpu_offline  [intel_rapl_perf]
    ffffffff8b014540 24 t __rapl_event_show [intel_rapl_perf]
    ffffffff8b014570 f2 t rapl_pmu_event_stop       [intel_rapl_perf]
This is emitted even if intel_rapl_perf is built into the kernel.
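
For consumers, a line of this form could be parsed roughly as follows.
This is only a sketch under assumptions: struct kms_entry,
kms_parse_line() and the two buffer bounds are made up for the example
and are not part of the patch; the field order (address, size, type,
name, optional [module]) is taken from the sample output above.

```c
#include <stdio.h>

#define KMS_NAME_MAX 256	/* assumed bound, not taken from the patch */
#define KMS_MOD_MAX   64	/* assumed bound, not taken from the patch */

/* Hypothetical container for one parsed /proc/kallmodsyms line. */
struct kms_entry {
	unsigned long long addr;
	unsigned long long size;
	char type;
	char name[KMS_NAME_MAX];
	char module[KMS_MOD_MAX];	/* empty when there is no [module] */
};

/* Returns 0 on success, -1 if the line lacks the four leading fields. */
static int kms_parse_line(const char *line, struct kms_entry *e)
{
	int n;

	e->module[0] = '\0';
	/* The widths 255 and 63 match the buffer sizes minus the NUL. */
	n = sscanf(line, "%llx %llx %c %255s [%63[^]]]",
		   &e->addr, &e->size, &e->type, e->name, e->module);
	return n >= 4 ? 0 : -1;
}
```

A caller would read the file line by line (for example with fgets())
and pass each line to kms_parse_line(), treating an empty module field
as "part of the core kernel".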

As with /proc/kallsyms, non-root usage produces addresses that are
all zero; symbol sizes are likewise reported as zero.

Programs that consume /proc/kallmodsyms should note that, unlike in
/proc/kallsyms, kernel symbols for built-in modules may appear
interspersed with other symbols that are part of different modules or
of the kernel.
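
Because module membership is per symbol and not per contiguous block, a
consumer that resolves an address is probably best off testing each
symbol's own [addr, addr + size) range.  A minimal sketch, assuming the
entries have already been read into an array; struct ksym,
ksym_contains() and ksym_find() are hypothetical names, and preferring
the smallest enclosing symbol is a choice made for this example, not
something the patch prescribes.

```c
#include <stddef.h>

/* Hypothetical in-memory form of one /proc/kallmodsyms entry. */
struct ksym {
	unsigned long long addr;
	unsigned long long size;
	const char *name;
	const char *module;	/* NULL for core kernel symbols */
};

/* True if addr lies within [s->addr, s->addr + s->size). */
static int ksym_contains(const struct ksym *s, unsigned long long addr)
{
	return s->size != 0 && addr >= s->addr && addr - s->addr < s->size;
}

/*
 * Linear scan that returns the smallest symbol containing addr, or
 * NULL if addr falls in a gap between symbols.
 */
static const struct ksym *ksym_find(const struct ksym *tab, size_t n,
				    unsigned long long addr)
{
	const struct ksym *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!ksym_contains(&tab[i], addr))
			continue;
		if (!best || tab[i].size < best->size)
			best = &tab[i];
	}
	return best;
}
```

Note that when the file is read without privileges, addresses and sizes
are all zero, so no address will match any entry in this sketch.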

Orabug: 29891866
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
---
 .gitignore                  |   1 +
 Documentation/dontdiff      |   1 +
 Makefile                    |  41 ++-
 include/linux/module.h      |   7 +-
 init/Kconfig                |   8 +
 kernel/kallsyms.c           | 145 +++++++---
 kernel/module.c             |   4 +-
 scripts/Makefile.modbuiltin |  20 +-
 scripts/kallsyms.c          | 559 +++++++++++++++++++++++++++++++++++-
 scripts/link-vmlinux.sh     |  23 +-
 scripts/namespace.pl        |   6 +
 11 files changed, 748 insertions(+), 67 deletions(-)

diff --git a/.gitignore b/.gitignore
index 70580bdd352c..474491775a1a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -47,6 +47,7 @@
 Module.symvers
 modules.builtin
 modules.order
+modules_thick.builtin
 
 #
 # Top-level generic files
diff --git a/Documentation/dontdiff b/Documentation/dontdiff
index 9f4392876099..32ee05f91410 100644
--- a/Documentation/dontdiff
+++ b/Documentation/dontdiff
@@ -180,6 +180,7 @@ modpost
 modules.builtin
 modules.builtin.modinfo
 modules.order
+modules_thick.builtin
 modversions.h*
 nconf
 nconf-cfg
diff --git a/Makefile b/Makefile
index 49363caa7079..15b4e897cd3e 100644
--- a/Makefile
+++ b/Makefile
@@ -1077,7 +1077,7 @@ cmd_link-vmlinux =                                                 \
 	$(CONFIG_SHELL) $< $(LD) $(KBUILD_LDFLAGS) $(LDFLAGS_vmlinux) ;    \
 	$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
 
-vmlinux: scripts/link-vmlinux.sh autoksyms_recursive $(vmlinux-deps) FORCE
+vmlinux: scripts/link-vmlinux.sh autoksyms_recursive $(vmlinux-deps) modules_thick.builtin FORCE
 	+$(call if_changed,link-vmlinux)
 
 targets := vmlinux
@@ -1292,17 +1292,6 @@ modules: $(if $(KBUILD_BUILTIN),vmlinux) modules.order modules.builtin
 modules.order: descend
 	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
 
-modbuiltin-dirs := $(addprefix _modbuiltin_, $(build-dirs))
-
-modules.builtin: $(modbuiltin-dirs)
-	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
-
-PHONY += $(modbuiltin-dirs)
-# tristate.conf is not included from this Makefile. Add it as a prerequisite
-# here to make it self-healing in case somebody accidentally removes it.
-$(modbuiltin-dirs): include/config/tristate.conf
-	$(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_%,%,$@)
-
 # Target to prepare building external modules
 PHONY += modules_prepare
 modules_prepare: prepare
@@ -1355,6 +1344,33 @@ modules modules_install:
 
 endif # CONFIG_MODULES
 
+# modules.builtin has a 'thick' form which maps from kernel modules (or rather
+# the object file names they would have had had they not been built in) to their
+# constituent object files: kallsyms uses this to determine which modules any
+# given object file is part of.  (We cannot eliminate the slight redundancy
+# here without double-expansion.)
+
+modbuiltin-dirs := $(addprefix _modbuiltin_, $(build-dirs))
+
+modbuiltin-thick-dirs := $(addprefix _modbuiltin_thick_, $(build-dirs))
+
+modules.builtin: $(modbuiltin-dirs)
+	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
+
+modules_thick.builtin: $(modbuiltin-thick-dirs)
+	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
+
+PHONY += $(modbuiltin-dirs) $(modbuiltin-thick-dirs)
+# tristate.conf is not included from this Makefile. Add it as a prerequisite
+# here to make it self-healing in case somebody accidentally removes it.
+$(modbuiltin-dirs): include/config/tristate.conf
+	$(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_%,%,$@) \
+			builtin-file=modules.builtin
+
+$(modbuiltin-thick-dirs): include/config/tristate.conf
+	$(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_thick_%,%,$@) \
+			builtin-file=modules_thick.builtin
+
 ###
 # Cleaning is done on three levels.
 # make clean     Delete most generated files
@@ -1674,6 +1690,7 @@ clean: $(clean-dirs)
 		-o -name '*.asn1.[ch]' \
 		-o -name '*.symtypes' -o -name 'modules.order' \
 		-o -name modules.builtin -o -name '.tmp_*.o.*' \
+		-o -name modules_thick.builtin \
 		-o -name '*.c.[012]*.*' \
 		-o -name '*.ll' \
 		-o -name '*.gcno' \) -type f -print | xargs rm -f
diff --git a/include/linux/module.h b/include/linux/module.h
index 6d20895e7739..b4f3f680a77d 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -592,7 +592,8 @@ bool each_symbol_section(bool (*fn)(const struct symsearch *arr,
 /* Returns 0 and fills in value, defined and namebuf, or -ERANGE if
    symnum out of range. */
 int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
-			char *name, char *module_name, int *exported);
+		       char *name, char *module_name, unsigned long *size,
+		       int *exported);
 
 /* Look for this name: can be of form module:name. */
 unsigned long module_kallsyms_lookup_name(const char *name);
@@ -774,8 +775,8 @@ static inline int lookup_module_symbol_attrs(unsigned long addr, unsigned long *
 }
 
 static inline int module_get_kallsym(unsigned int symnum, unsigned long *value,
-					char *type, char *name,
-					char *module_name, int *exported)
+				     char *type, char *name, char *module_name,
+				     unsigned long *size, int *exported)
 {
 	return -ERANGE;
 }
diff --git a/init/Kconfig b/init/Kconfig
index ff6108deaad2..15294af2a1e9 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1429,6 +1429,14 @@ config POSIX_TIMERS
 
 	  If unsure say y.
 
+config KALLMODSYMS
+	default y
+	bool "Enable support for /proc/kallmodsyms" if EXPERT
+	depends on KALLSYMS
+	help
+	  This option enables the /proc/kallmodsyms file, which maps symbols
+	  to addresses and their associated modules.
+
 config PRINTK
 	default y
 	bool "Enable support for printk" if EXPERT
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 136ce049c4ad..a51fdf73f9b3 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -32,6 +32,7 @@
  */
 extern const unsigned long kallsyms_addresses[] __weak;
 extern const int kallsyms_offsets[] __weak;
+extern const unsigned long kallsyms_sizes[] __weak;
 extern const u8 kallsyms_names[] __weak;
 
 /*
@@ -46,6 +47,8 @@ __attribute__((weak, section(".rodata")));
 
 extern const u8 kallsyms_token_table[] __weak;
 extern const u16 kallsyms_token_index[] __weak;
+extern const char kallsyms_modules[] __weak;
+extern const u32 kallsyms_symbol_modules[] __weak;
 
 extern const unsigned int kallsyms_markers[] __weak;
 
@@ -195,12 +198,24 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
 }
 EXPORT_SYMBOL_GPL(kallsyms_on_each_symbol);
 
+/*
+ * The caller passes in an address, and we return an index to the symbol --
+ * potentially also size and offset information.
+ * But an address might map to multiple symbols because:
+ *   - some symbols might have zero size
+ *   - some symbols might be aliases of one another
+ *   - some symbols might span (encompass) others
+ * The symbols should already be ordered so that, for a particular address,
+ * we first have the zero-size ones, then the biggest, then the smallest.
+ * So we find the index by:
+ *   - finding the last symbol with the target address
+ *   - backing the index up so long as both the address and size are unchanged
+ */
 static unsigned long get_symbol_pos(unsigned long addr,
 				    unsigned long *symbolsize,
 				    unsigned long *offset)
 {
-	unsigned long symbol_start = 0, symbol_end = 0;
-	unsigned long i, low, high, mid;
+	unsigned long low, high, mid;
 
 	/* This kernel should never had been booted. */
 	if (!IS_ENABLED(CONFIG_KALLSYMS_BASE_RELATIVE))
@@ -221,36 +236,17 @@ static unsigned long get_symbol_pos(unsigned long addr,
 	}
 
 	/*
-	 * Search for the first aliased symbol. Aliased
-	 * symbols are symbols with the same address.
+	 * Search for the first aliased symbol.
 	 */
-	while (low && kallsyms_sym_address(low-1) == kallsyms_sym_address(low))
+	while (low
+	    && kallsyms_sym_address(low-1) == kallsyms_sym_address(low)
+	    && kallsyms_sizes[low-1] == kallsyms_sizes[low])
 		--low;
 
-	symbol_start = kallsyms_sym_address(low);
-
-	/* Search for next non-aliased symbol. */
-	for (i = low + 1; i < kallsyms_num_syms; i++) {
-		if (kallsyms_sym_address(i) > symbol_start) {
-			symbol_end = kallsyms_sym_address(i);
-			break;
-		}
-	}
-
-	/* If we found no next symbol, we use the end of the section. */
-	if (!symbol_end) {
-		if (is_kernel_inittext(addr))
-			symbol_end = (unsigned long)_einittext;
-		else if (IS_ENABLED(CONFIG_KALLSYMS_ALL))
-			symbol_end = (unsigned long)_end;
-		else
-			symbol_end = (unsigned long)_etext;
-	}
-
 	if (symbolsize)
-		*symbolsize = symbol_end - symbol_start;
+		*symbolsize = kallsyms_sizes[low];
 	if (offset)
-		*offset = addr - symbol_start;
+		*offset = addr - kallsyms_sym_address(low);
 
 	return low;
 }
@@ -270,6 +266,7 @@ int kallsyms_lookup_size_offset(unsigned long addr, unsigned long *symbolsize,
 	return !!module_address_lookup(addr, symbolsize, offset, NULL, namebuf) ||
 	       !!__bpf_address_lookup(addr, symbolsize, offset, namebuf);
 }
+EXPORT_SYMBOL_GPL(kallsyms_lookup_size_offset);
 
 /*
  * Lookup an address
@@ -440,9 +437,11 @@ struct kallsym_iter {
 	loff_t pos_ftrace_mod_end;
 	unsigned long value;
 	unsigned int nameoff; /* If iterating in core kernel symbols. */
+	unsigned long size;
 	char type;
 	char name[KSYM_NAME_LEN];
 	char module_name[MODULE_NAME_LEN];
+	int builtin_module;
 	int exported;
 	int show_value;
 };
@@ -472,7 +471,9 @@ static int get_ksymbol_mod(struct kallsym_iter *iter)
 	int ret = module_get_kallsym(iter->pos - iter->pos_arch_end,
 				     &iter->value, &iter->type,
 				     iter->name, iter->module_name,
-				     &iter->exported);
+				     &iter->size, &iter->exported);
+	iter->builtin_module = 0;
+
 	if (ret < 0) {
 		iter->pos_mod_end = iter->pos;
 		return 0;
@@ -508,10 +509,22 @@ static int get_ksymbol_bpf(struct kallsym_iter *iter)
 static unsigned long get_ksymbol_core(struct kallsym_iter *iter)
 {
 	unsigned off = iter->nameoff;
+	u32 mod_index = 0;
+
+	if (kallsyms_symbol_modules)
+		mod_index = kallsyms_symbol_modules[iter->pos];
 
-	iter->module_name[0] = '\0';
+	if (mod_index == 0 || kallsyms_modules == NULL) {
+		iter->module_name[0] = '\0';
+		iter->builtin_module = 0;
+	} else {
+		strcpy(iter->module_name, &kallsyms_modules[mod_index]);
+		iter->builtin_module = 1;
+	}
+	iter->exported = 0;
 	iter->value = kallsyms_sym_address(iter->pos);
 
+	iter->size = kallsyms_sizes[iter->pos];
 	iter->type = kallsyms_get_symbol_type(off);
 
 	off = kallsyms_expand_symbol(off, iter->name, ARRAY_SIZE(iter->name));
@@ -556,7 +569,7 @@ static int update_iter_mod(struct kallsym_iter *iter, loff_t pos)
 }
 
 /* Returns false if pos at or past end of file. */
-static int update_iter(struct kallsym_iter *iter, loff_t pos)
+int update_iter(struct kallsym_iter *iter, loff_t pos)
 {
 	/* Module symbols can be accessed randomly. */
 	if (pos >= kallsyms_num_syms)
@@ -592,18 +605,22 @@ static void s_stop(struct seq_file *m, void *p)
 {
 }
 
-static int s_show(struct seq_file *m, void *p)
+static int s_show_internal(struct seq_file *m, void *p, int builtin_modules)
 {
 	void *value;
 	struct kallsym_iter *iter = m->private;
+	unsigned long size;
 
 	/* Some debugging symbols have no name.  Ignore them. */
 	if (!iter->name[0])
 		return 0;
 
 	value = iter->show_value ? (void *)iter->value : NULL;
+	size = iter->show_value ? iter->size : 0;
 
-	if (iter->module_name[0]) {
+	if ((iter->builtin_module == 0 && iter->module_name[0]) ||
+	    (iter->builtin_module != 0 && iter->module_name[0] &&
+	     builtin_modules != 0)) {
 		char type;
 
 		/*
@@ -612,14 +629,34 @@ static int s_show(struct seq_file *m, void *p)
 		 */
 		type = iter->exported ? toupper(iter->type) :
 					tolower(iter->type);
-		seq_printf(m, "%px %c %s\t[%s]\n", value,
-			   type, iter->name, iter->module_name);
-	} else
+		if (builtin_modules)
+			seq_printf(m, "%px %lx %c %s\t[%s]\n", value,
+				   size, type, iter->name,
+				   iter->module_name);
+		else
+			seq_printf(m, "%px %c %s\t[%s]\n", value,
+				   type, iter->name, iter->module_name);
+	} else if (builtin_modules)
+		seq_printf(m, "%px %lx %c %s\n", value, size,
+			   iter->type, iter->name);
+	else
 		seq_printf(m, "%px %c %s\n", value,
 			   iter->type, iter->name);
 	return 0;
 }
 
+static int s_show(struct seq_file *m, void *p)
+{
+	return s_show_internal(m, p, 0);
+}
+
+#ifdef CONFIG_KALLMODSYMS
+static int s_mod_show(struct seq_file *m, void *p)
+{
+	return s_show_internal(m, p, 1);
+}
+#endif
+
 static const struct seq_operations kallsyms_op = {
 	.start = s_start,
 	.next = s_next,
@@ -627,6 +664,15 @@ static const struct seq_operations kallsyms_op = {
 	.show = s_show
 };
 
+#ifdef CONFIG_KALLMODSYMS
+static const struct seq_operations kallmodsyms_op = {
+	.start = s_start,
+	.next = s_next,
+	.stop = s_stop,
+	.show = s_mod_show
+};
+#endif
+
 static inline int kallsyms_for_perf(void)
 {
 #ifdef CONFIG_PERF_EVENTS
@@ -661,7 +707,8 @@ int kallsyms_show_value(void)
 	}
 }
 
-static int kallsyms_open(struct inode *inode, struct file *file)
+static int kallsyms_open_internal(struct inode *inode, struct file *file,
+	const struct seq_operations *ops)
 {
 	/*
 	 * We keep iterator in m->private, since normal case is to
@@ -669,7 +716,7 @@ static int kallsyms_open(struct inode *inode, struct file *file)
 	 * using get_symbol_offset for every symbol.
 	 */
 	struct kallsym_iter *iter;
-	iter = __seq_open_private(file, &kallsyms_op, sizeof(*iter));
+	iter = __seq_open_private(file, ops, sizeof(*iter));
 	if (!iter)
 		return -ENOMEM;
 	reset_iter(iter, 0);
@@ -678,6 +725,18 @@ static int kallsyms_open(struct inode *inode, struct file *file)
 	return 0;
 }
 
+static int kallsyms_open(struct inode *inode, struct file *file)
+{
+	return kallsyms_open_internal(inode, file, &kallsyms_op);
+}
+
+#ifdef CONFIG_KALLMODSYMS
+static int kallmodsyms_open(struct inode *inode, struct file *file)
+{
+	return kallsyms_open_internal(inode, file, &kallmodsyms_op);
+}
+#endif
+
 #ifdef	CONFIG_KGDB_KDB
 const char *kdb_walk_kallsyms(loff_t *pos)
 {
@@ -705,9 +764,21 @@ static const struct file_operations kallsyms_operations = {
 	.release = seq_release_private,
 };
 
+#ifdef CONFIG_KALLMODSYMS
+static const struct file_operations kallmodsyms_operations = {
+	.open = kallmodsyms_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = seq_release_private,
+};
+#endif
+
 static int __init kallsyms_init(void)
 {
 	proc_create("kallsyms", 0444, NULL, &kallsyms_operations);
+#ifdef CONFIG_KALLMODSYMS
+	proc_create("kallmodsyms", 0444, NULL, &kallmodsyms_operations);
+#endif
 	return 0;
 }
 device_initcall(kallsyms_init);
diff --git a/kernel/module.c b/kernel/module.c
index ff2d7359a418..4a31e1e94acd 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -4184,7 +4184,8 @@ int lookup_module_symbol_attrs(unsigned long addr, unsigned long *size,
 }
 
 int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
-			char *name, char *module_name, int *exported)
+		       char *name, char *module_name, unsigned long *size,
+		       int *exported)
 {
 	struct module *mod;
 
@@ -4203,6 +4204,7 @@ int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
 			strlcpy(name, kallsyms_symbol_name(kallsyms, symnum), KSYM_NAME_LEN);
 			strlcpy(module_name, mod->name, MODULE_NAME_LEN);
 			*exported = is_exported(name, *value, mod);
+			*size = kallsyms->symtab[symnum].st_size;
 			preempt_enable();
 			return 0;
 		}
diff --git a/scripts/Makefile.modbuiltin b/scripts/Makefile.modbuiltin
index 7d4711b88656..06f31e58111e 100644
--- a/scripts/Makefile.modbuiltin
+++ b/scripts/Makefile.modbuiltin
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 # ==========================================================================
-# Generating modules.builtin
+# Generating modules.builtin and modules_thick.builtin
 # ==========================================================================
 
 src := $(obj)
@@ -30,19 +30,29 @@ __subdir-Y     := $(patsubst %/,%,$(filter %/, $(obj-Y)))
 subdir-Y       += $(__subdir-Y)
 subdir-ym      := $(sort $(subdir-y) $(subdir-Y) $(subdir-m))
 subdir-ym      := $(addprefix $(obj)/,$(subdir-ym))
-obj-Y          := $(addprefix $(obj)/,$(obj-Y))
+pathobj-Y      := $(addprefix $(obj)/,$(obj-Y))
 
 modbuiltin-subdirs := $(patsubst %,%/modules.builtin, $(subdir-ym))
-modbuiltin-mods    := $(filter %.ko, $(obj-Y:.o=.ko))
+modbuiltin-mods    := $(filter %.ko, $(pathobj-Y:.o=.ko))
 modbuiltin-target  := $(obj)/modules.builtin
+modthickbuiltin-subdirs := $(patsubst %,%/modules_thick.builtin, $(subdir-ym))
+modthickbuiltin-target  := $(obj)/modules_thick.builtin
 
-__modbuiltin: $(modbuiltin-target) $(subdir-ym)
+__modbuiltin: $(obj)/$(builtin-file) $(subdir-ym)
 	@:
 
 $(modbuiltin-target): $(subdir-ym) FORCE
 	$(Q)(for m in $(modbuiltin-mods); do echo $$m; done;	\
 	cat /dev/null $(modbuiltin-subdirs)) > $@
 
+$(modthickbuiltin-target): $(subdir-ym) FORCE
+	$(Q) $(foreach mod-o, $(filter %.o,$(obj-Y)),\
+		printf "%s:" $(addprefix $(obj)/,$(mod-o)) >> $@; \
+		printf " %s" $(sort $(strip $(addprefix $(obj)/,$($(mod-o:.o=-objs)) \
+			$($(mod-o:.o=-y)) $($(mod-o:.o=-Y))))) >> $@; \
+		printf "\n" >> $@; ) \
+	cat /dev/null $(modthickbuiltin-subdirs) >> $@;
+
 PHONY += FORCE
 
 FORCE:
@@ -52,6 +62,6 @@ FORCE:
 
 PHONY += $(subdir-ym)
 $(subdir-ym):
-	$(Q)$(MAKE) $(modbuiltin)=$@
+	$(Q)$(MAKE) $(modbuiltin)=$@ builtin-file=$(builtin-file)
 
 .PHONY: $(PHONY)
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index ae6504d07fd6..e88653b00f36 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -5,7 +5,10 @@
  * This software may be used and distributed according to the terms
  * of the GNU General Public License, incorporated herein by reference.
  *
- * Usage: nm -n vmlinux | scripts/kallsyms [--all-symbols] > symbols.S
+ * Usage: nm -n -S vmlinux
+ *        | scripts/kallsyms [--all-symbols] [--absolute-percpu]
+ *             [--base-relative] [--builtin=modules_thick.builtin]
+ *        > symbols.S
  *
  *      Table compression uses all the unused char codes on the symbols and
  *  maps these to the most used substrings (tokens). For instance, it might
@@ -18,12 +21,19 @@
  *
  */
 
+#define _GNU_SOURCE 1
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <ctype.h>
 #include <limits.h>
 
+#include "../include/generated/autoconf.h"
+
+#ifdef CONFIG_KALLMODSYMS
+#include <errno.h>
+#endif
+
 #ifndef ARRAY_SIZE
 #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0]))
 #endif
@@ -32,10 +42,14 @@
 
 struct sym_entry {
 	unsigned long long addr;
+	unsigned long long size;
 	unsigned int len;
 	unsigned int start_pos;
 	unsigned char *sym;
 	unsigned int percpu_absolute;
+#ifdef CONFIG_KALLMODSYMS
+	unsigned int module;
+#endif
 };
 
 struct addr_range {
@@ -68,11 +82,118 @@ static int token_profit[0x10000];
 static unsigned char best_table[256][2];
 static unsigned char best_table_len[256];
 
+#ifdef CONFIG_KALLMODSYMS
+static unsigned int strhash(const char *s)
+{
+	/* fnv32 hash */
+	unsigned int hash = 2166136261U;
+
+	for (; *s; s++)
+		hash = (hash ^ *s) * 0x01000193;
+	return hash;
+}
+
+#define OBJ2MOD_BITS 10
+#define OBJ2MOD_N (1 << OBJ2MOD_BITS)
+#define OBJ2MOD_MASK (OBJ2MOD_N - 1)
+struct obj2mod_elem {
+	char *obj;
+	int mod;
+	struct obj2mod_elem *next;
+};
+
+static struct obj2mod_elem *obj2mod[OBJ2MOD_N];
+
+static void obj2mod_init(void)
+{
+	memset(obj2mod, 0, sizeof(obj2mod));
+}
+
+static void obj2mod_put(char *obj, int mod)
+{
+	int i = strhash(obj) & OBJ2MOD_MASK;
+	struct obj2mod_elem *elem = malloc(sizeof(struct obj2mod_elem));
+
+	if (!elem) {
+		fprintf(stderr, "kallsyms: out of memory\n");
+		exit(1);
+	}
+
+	elem->obj = strdup(obj);
+	if (!elem->obj) {
+		fprintf(stderr, "kallsyms: out of memory\n");
+		free(elem);
+		exit(1);
+	}
+
+	elem->mod = mod;
+	elem->next = obj2mod[i];
+	obj2mod[i] = elem;
+}
+
+static int obj2mod_get(char *obj)
+{
+	int i = strhash(obj) & OBJ2MOD_MASK;
+	struct obj2mod_elem *elem;
+
+	for (elem = obj2mod[i]; elem; elem = elem->next)
+		if (strcmp(elem->obj, obj) == 0)
+			return elem->mod;
+	return 0;
+}
+
+static void obj2mod_free(void)
+{
+	int i;
+
+	for (i = 0; i < OBJ2MOD_N; i++) {
+		struct obj2mod_elem *elem = obj2mod[i];
+		struct obj2mod_elem *next;
+
+		while (elem) {
+			next = elem->next;
+			free(elem->obj);
+			free(elem);
+			elem = next;
+		}
+	}
+}
+
+/*
+ * The builtin module names.  The "offset" points to the name as if
+ * all builtin module names were concatenated to a single string.
+ */
+static unsigned int builtin_module_size;	/* number allocated */
+static unsigned int builtin_module_len;		/* number assigned */
+static char **builtin_modules;			/* array of module names */
+static unsigned int *builtin_module_offsets;	/* offset */
+
+/*
+ * modules_thick.builtin iteration state.
+ */
+struct modules_thick_iter {
+	FILE *f;
+	char *line;
+	size_t line_size;
+};
+
+/*
+ * An ordered list of address ranges and how they map to built-in modules.
+ */
+struct addrmap_entry {
+	unsigned long long addr;
+	unsigned long long size;
+	unsigned int module;
+};
+static struct addrmap_entry *addrmap;
+static int addrmap_num, addrmap_alloced;
+#endif
 
 static void usage(void)
 {
-	fprintf(stderr, "Usage: kallsyms [--all-symbols] "
-			"[--base-relative] < in.map > out.S\n");
+	fprintf(stderr, "Usage: kallsyms [--all-symbols] [--absolute-percpu] "
+			"[--base-relative] [--builtin=modules_thick.builtin] "
+			"< nm_vmlinux.out > symbols.S\n");
 	exit(1);
 }
 
@@ -107,13 +228,32 @@ static int check_symbol_range(const char *sym, unsigned long long addr,
 	return 1;
 }
 
+#ifdef CONFIG_KALLMODSYMS
+static int addrmap_compare(const void *keyp, const void *rangep)
+{
+	unsigned long long addr = *((const unsigned long long *)keyp);
+	const struct addrmap_entry *range = (const struct addrmap_entry *)rangep;
+
+	if (addr < range->addr)
+		return -1;
+	if (addr < range->addr + range->size)
+		return 0;
+	return 1;
+}
+#endif
+
 static int read_symbol(FILE *in, struct sym_entry *s)
 {
 	char sym[500], stype;
-	int rc;
+	int rc, init_scratch = 0;
+#ifdef CONFIG_KALLMODSYMS
+	struct addrmap_entry *range;
+#endif
 
-	rc = fscanf(in, "%llx %c %499s\n", &s->addr, &stype, sym);
-	if (rc != 3) {
+read_another:
+	rc = fscanf(in, "%llx %llx %c %499s\n",
+		    &s->addr, &s->size, &stype, sym);
+	if (rc != 4) {
 		if (rc != EOF && fgets(sym, 500, in) == NULL)
 			fprintf(stderr, "Read error or end of file.\n");
 		return -1;
@@ -125,6 +265,16 @@ static int read_symbol(FILE *in, struct sym_entry *s)
 		return -1;
 	}
 
+	/* skip the .init.scratch section */
+	if (strcmp(sym, "__init_scratch_end") == 0) {
+		init_scratch = 0;
+		goto read_another;
+	}
+	if (strcmp(sym, "__init_scratch_begin") == 0)
+		init_scratch = 1;
+	if (init_scratch)
+		goto read_another;
+
 	/* Ignore most absolute/undefined (?) symbols. */
 	if (strcmp(sym, "_text") == 0)
 		_text = s->addr;
@@ -154,6 +304,16 @@ static int read_symbol(FILE *in, struct sym_entry *s)
 	else if (!strncmp(sym, ".LASANPC", 8))
 		return -1;
 
+#ifdef CONFIG_KALLMODSYMS
+	/* look up the builtin module this is part of (if any) */
+	range = (struct addrmap_entry *) bsearch(&s->addr,
+	    addrmap, addrmap_num, sizeof(*addrmap), &addrmap_compare);
+	if (range)
+		s->module = builtin_module_offsets[range->module];
+	else
+		s->module = 0;
+#endif
+
 	/* include the type field in the symbol name, so that it gets
 	 * compressed together */
 	s->len = strlen(sym) + 1;
@@ -201,11 +361,14 @@ static int symbol_valid(struct sym_entry *s)
 		"kallsyms_addresses",
 		"kallsyms_offsets",
 		"kallsyms_relative_base",
+		"kallsyms_sizes",
 		"kallsyms_num_syms",
 		"kallsyms_names",
 		"kallsyms_markers",
 		"kallsyms_token_table",
 		"kallsyms_token_index",
+		"kallsyms_symbol_modules",
+		"kallsyms_modules",
 
 	/* Exclude linker generated symbols which vary between passes */
 		"_SDA_BASE_",		/* ppc */
@@ -405,6 +568,11 @@ static void write_src(void)
 		printf("\n");
 	}
 
+	output_label("kallsyms_sizes");
+	for (i = 0; i < table_cnt; i++)
+		printf("\tPTR\t%#llx\n", table[i].size);
+	printf("\n");
+
 	output_label("kallsyms_num_syms");
 	printf("\t.long\t%u\n", table_cnt);
 	printf("\n");
@@ -454,8 +622,22 @@ static void write_src(void)
 	for (i = 0; i < 256; i++)
 		printf("\t.short\t%d\n", best_idx[i]);
 	printf("\n");
-}
 
+#ifdef CONFIG_KALLMODSYMS
+	output_label("kallsyms_modules");
+	for (i = 0; i < builtin_module_len; i++)
+		printf("\t.asciz\t\"%s\"\n", builtin_modules[i]);
+	printf("\n");
+
+	for (i = 0; i < builtin_module_len; i++)
+		free(builtin_modules[i]);
+
+	output_label("kallsyms_symbol_modules");
+	for (i = 0; i < table_cnt; i++)
+		printf("\t.int\t%d\n", table[i].module);
+	printf("\n");
+#endif
+}
 
 /* table lookup compression functions */
 
@@ -683,6 +865,18 @@ static int compare_symbols(const void *a, const void *b)
 	if (sa->addr < sb->addr)
 		return -1;
 
+	/* zero-size markers before nonzero-size symbols */
+	if (sa->size > 0 && sb->size == 0)
+		return 1;
+	if (sa->size == 0 && sb->size > 0)
+		return -1;
+
+	/* sort by size (large size preceding symbols it encompasses) */
+	if (sa->size < sb->size)
+		return 1;
+	if (sa->size > sb->size)
+		return -1;
+
 	/* sort by "weakness" type */
 	wa = (sa->sym[0] == 'w') || (sa->sym[0] == 'W');
 	wb = (sb->sym[0] == 'w') || (sb->sym[0] == 'W');
@@ -738,23 +932,372 @@ static void record_relative_base(void)
 			relative_base = table[i].addr;
 }
 
+#ifdef CONFIG_KALLMODSYMS
+/*
+ * Read a modules_thick.builtin file.
+ */
+
+/*
+ * Construct a modules_thick.builtin iterator.
+ */
+static struct modules_thick_iter *
+modules_thick_iter_new(const char *modules_thick_file)
+{
+	struct modules_thick_iter *i;
+
+	i = calloc(1, sizeof(struct modules_thick_iter));
+	if (i == NULL)
+		return NULL;
+
+	i->f = fopen(modules_thick_file, "r");
+
+	if (i->f == NULL) {
+		fprintf(stderr, "Cannot open builtin module file %s: %s\n",
+			modules_thick_file, strerror(errno));
+		return NULL;
+	}
+
+	return i;
+}
+
+/*
+ * Iterate, returning a new null-terminated array of object file names, and a
+ * new dynamically-allocated module name.  (The module name passed in is freed.)
+ *
+ * The array of object file names should be freed by the caller: the strings it
+ * points to are owned by the iterator, and should not be freed.
+ */
+static char ** __attribute__((__nonnull__))
+modules_thick_iter_next(struct modules_thick_iter *i, char **module_name)
+{
+	size_t npaths = 1;
+	char **module_paths;
+	char *last_slash;
+	char *last_dot;
+	char *trailing_linefeed;
+	char *object_name = i->line;
+	char *dash;
+	int composite = 0;
+
+	/*
+	 * Read in all module entries, computing the suffixless, pathless name
+	 * of the module and building the next arrayful of object file names for
+	 * return.
+	 *
+	 * Modules can consist of multiple files: in this case, the portion
+	 * before the colon is the path to the module (as before): the portion
+	 * after the colon is a space-separated list of files that should be
+	 * considered part of this module.  In this case, the portion before the
+	 * name is an "object file" that does not actually exist: it is merged
+	 * into built-in.a without ever being written out.
+	 *
+	 * All module names have - translated to _, to match what is done to the
+	 * names of the same things when built as modules.
+	 */
+
+	/*
+	 * Reinvocation of exhausted iterator. Return NULL, once.
+	 */
+retry:
+	if (getline(&i->line, &i->line_size, i->f) < 0) {
+		if (ferror(i->f)) {
+			fprintf(stderr,
+				"Error reading from modules_thick file: %s\n",
+				strerror(errno));
+			exit(1);
+		}
+		rewind(i->f);
+		return NULL;
+	}
+
+	if (i->line[0] == '\0')
+		goto retry;
+
+	/*
+	 * Slice the line in two at the colon, if any.  If there is anything
+	 * past the ': ', this is a composite module.  (We allow for no colon
+	 * for robustness, even though one should always be present.)
+	 */
+	if (strchr(i->line, ':') != NULL) {
+		char *name_start;
+
+		object_name = strchr(i->line, ':');
+		*object_name = '\0';
+		object_name++;
+		name_start = object_name + strspn(object_name, " \n");
+		if (*name_start != '\0') {
+			composite = 1;
+			object_name = name_start;
+		}
+	}
+
+	/*
+	 * Figure out the module name.
+	 */
+	last_slash = strrchr(i->line, '/');
+	last_slash = (!last_slash) ? i->line :
+		last_slash + 1;
+	free(*module_name);
+	*module_name = strdup(last_slash);
+	dash = *module_name;
+
+	while (dash != NULL) {
+		dash = strchr(dash, '-');
+		if (dash != NULL)
+			*dash = '_';
+	}
+
+	last_dot = strrchr(*module_name, '.');
+	if (last_dot != NULL)
+		*last_dot = '\0';
+
+	trailing_linefeed = strchr(object_name, '\n');
+	if (trailing_linefeed != NULL)
+		*trailing_linefeed = '\0';
+
+	/*
+	 * Multifile separator? Object file names explicitly stated:
+	 * slice them up and shuffle them in.
+	 *
+	 * The array size may be an overestimate if any object file
+	 * names start or end with spaces (very unlikely) but cannot be
+	 * an underestimate.  (Check for it anyway.)
+	 */
+	if (composite) {
+		char *one_object;
+
+		for (npaths = 0, one_object = object_name;
+		     one_object != NULL;
+		     npaths++, one_object = strchr(one_object + 1, ' '))
+			;
+	}
+
+	module_paths = malloc((npaths + 1) * sizeof(char *));
+	if (!module_paths) {
+		fprintf(stderr, "%s: out of memory on module %s\n", __func__,
+			*module_name);
+		exit(1);
+	}
+
+	if (composite) {
+		char *one_object;
+		size_t i = 0;
+
+		while ((one_object = strsep(&object_name, " ")) != NULL) {
+			if (i >= npaths) {
+				fprintf(stderr, "%s: npaths overflow on module "
+					"%s: this is a bug.\n", __func__,
+					*module_name);
+				exit(1);
+			}
+
+			module_paths[i++] = one_object;
+		}
+	} else
+		module_paths[0] = i->line;	/* untransformed module name */
+
+	module_paths[npaths] = NULL;
+
+	return module_paths;
+}
+
+/*
+ * Free an iterator. Can be called while iteration is underway, so even
+ * state that is freed at the end of iteration must be freed here too.
+ */
+static void
+modules_thick_iter_free(struct modules_thick_iter *i)
+{
+	if (i == NULL)
+		return;
+	fclose(i->f);
+	free(i->line);
+	free(i);
+}
+
+/*
+ * Expand the builtin modules list.
+ */
+static void expand_builtin_modules(void)
+{
+	builtin_module_size += 50;
+
+	builtin_modules = realloc(builtin_modules,
+				  sizeof(*builtin_modules) *
+				  builtin_module_size);
+	builtin_module_offsets = realloc(builtin_module_offsets,
+					 sizeof(*builtin_module_offsets) *
+					 builtin_module_size);
+
+	if (!builtin_modules || !builtin_module_offsets) {
+		fprintf(stderr, "kallsyms failure: out of memory.\n");
+		exit(EXIT_FAILURE);
+	}
+}
+
+/*
+ * Add a single built-in module (possibly composed of many files) to the
+ * modules list.  Take the offset of the current module and return the
+ * offset of the next one (purely for simplicity's sake in the caller).
+ */
+static size_t add_builtin_module(const char *module_name, char **module_paths,
+				 size_t offset)
+{
+	/* map the module's object paths to the module's index */
+	while (*module_paths) {
+		obj2mod_put(*module_paths, builtin_module_len);
+		module_paths++;
+	}
+
+	/* add the module name */
+	if (builtin_module_size <= builtin_module_len)
+		expand_builtin_modules();
+	builtin_modules[builtin_module_len] = strdup(module_name);
+	builtin_module_offsets[builtin_module_len] = offset;
+	builtin_module_len++;
+
+	return (offset + strlen(module_name) + 1);
+}
+
+/*
+ * Read the linker map.
+ */
+static void read_linker_map(void)
+{
+	unsigned long long addr, size;
+	char obj[PATH_MAX+1];
+	FILE *f = fopen(".tmp_vmlinux.ranges", "r");
+
+	if (!f) {
+		fprintf(stderr, "Cannot open '.tmp_vmlinux.ranges'.\n");
+		exit(1);
+	}
+
+	addrmap_num = 0;
+	addrmap_alloced = 4096;
+	addrmap = malloc(sizeof(*addrmap) * addrmap_alloced);
+	if (!addrmap)
+		goto oom;
+
+	/*
+	 * For each address range (addr,size) and object, add to addrmap
+	 * the range and the built-in module to which the object maps.
+	 */
+	while (fscanf(f, "%llx %llx %s\n", &addr, &size, obj) == 3) {
+		int m = obj2mod_get(obj);
+
+		if (addr == 0 || size == 0 || m == 0)
+			continue;
+
+		if (addrmap_num >= addrmap_alloced) {
+			addrmap_alloced *= 2;
+			addrmap = realloc(addrmap,
+			    sizeof(*addrmap) * addrmap_alloced);
+			if (!addrmap)
+				goto oom;
+		}
+
+		addrmap[addrmap_num].addr = addr;
+		addrmap[addrmap_num].size = size;
+		addrmap[addrmap_num].module = m;
+		addrmap_num++;
+	}
+	fclose(f);
+	return;
+
+oom:
+	fprintf(stderr, "kallsyms: out of memory\n");
+	exit(1);
+}
+
+/*
+ * Read "modules_thick.builtin" (the list of built-in modules).  Construct:
+ *   - builtin_modules: array of built-in-module names
+ *   - builtin_module_offsets: array of offsets that will later be
+ *       used to access a concatenated list of built-in-module names
+ *   - obj2mod: a temporary, many-to-one, hash mapping
+ *       from object-file paths to built-in-module names
+ * Read ".tmp_vmlinux.ranges" (the linker map).
+ *   - addrmap[] maps address ranges to built-in module names (using obj2mod)
+ */
+static void read_modules(const char *modules_builtin)
+{
+	struct modules_thick_iter *i;
+	size_t offset = 0;
+	char *module_name = NULL;
+	char **module_paths;
+
+	obj2mod_init();
+
+	/*
+	 * builtin_modules[0] is a null entry signifying a symbol that cannot be
+	 * modular.
+	 */
+	builtin_module_size = 50;
+	builtin_modules = malloc(sizeof(*builtin_modules) *
+				 builtin_module_size);
+	builtin_module_offsets = malloc(sizeof(*builtin_module_offsets) *
+				 builtin_module_size);
+	if (!builtin_modules || !builtin_module_offsets) {
+		fprintf(stderr, "kallsyms: out of memory\n");
+		exit(1);
+	}
+	builtin_modules[0] = strdup("");
+	builtin_module_offsets[0] = 0;
+	builtin_module_len = 1;
+	offset++;
+
+	/*
+	 * Iterate over all modules in modules_thick.builtin and add each.
+	 */
+	i = modules_thick_iter_new(modules_builtin);
+	if (i == NULL) {
+		fprintf(stderr, "Cannot iterate over builtin modules.\n");
+		exit(1);
+	}
+
+	while ((module_paths = modules_thick_iter_next(i, &module_name))) {
+		offset = add_builtin_module(module_name, module_paths, offset);
+		free(module_paths);
+		module_paths = NULL;
+	}
+
+	free(module_name);
+	modules_thick_iter_free(i);
+
+	/*
+	 * Read linker map.
+	 */
+	read_linker_map();
+
+	obj2mod_free();
+}
+#else
+static void read_modules(const char *unused) {}
+#endif /* CONFIG_KALLMODSYMS */
+
 int main(int argc, char **argv)
 {
+	const char *modules_builtin = "modules_thick.builtin";
+
 	if (argc >= 2) {
 		int i;
 		for (i = 1; i < argc; i++) {
-			if(strcmp(argv[i], "--all-symbols") == 0)
+			if (strcmp(argv[i], "--all-symbols") == 0)
 				all_symbols = 1;
 			else if (strcmp(argv[i], "--absolute-percpu") == 0)
 				absolute_percpu = 1;
 			else if (strcmp(argv[i], "--base-relative") == 0)
 				base_relative = 1;
+			else if (strncmp(argv[i], "--builtin=", 10) == 0)
+				modules_builtin = &argv[i][10];
 			else
 				usage();
 		}
 	} else if (argc != 1)
 		usage();
 
+	read_modules(modules_builtin);
 	read_map(stdin);
 	if (absolute_percpu)
 		make_percpus_absolute();
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 06495379fcd8..c07eae553c08 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -76,6 +76,7 @@ vmlinux_link()
 			--start-group				\
 			${KBUILD_VMLINUX_LIBS}			\
 			--end-group				\
+			-Map=.tmp_vmlinux.map			\
 			${@}"
 
 		${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux}	\
@@ -88,6 +89,7 @@ vmlinux_link()
 			-Wl,--start-group			\
 			${KBUILD_VMLINUX_LIBS}			\
 			-Wl,--end-group				\
+			-Wl,-Map=.tmp_vmlinux.map		\
 			${@}"
 
 		${CC} ${CFLAGS_vmlinux}				\
@@ -138,6 +140,19 @@ kallsyms()
 	info KSYM ${2}
 	local kallsymopt;
 
+	# read the linker map to identify ranges of addresses:
+	#   - for each *.o file, report address, size, pathname
+	#       - most such lines will have four fields
+	#       - but sometimes there is a line break after the first field
+	#   - start reading at "Linker script and memory map"
+	#   - stop reading at ".brk"
+	${AWK} '
+	    /\.o$/ && start==1 { print $(NF-2), $(NF-1), $NF }
+	    /^Linker script and memory map/ { start = 1 }
+	    /^\.brk/ { exit(0) }
+	' .tmp_vmlinux.map | sort > .tmp_vmlinux.ranges
+
+	# get kallsyms options
 	if [ -n "${CONFIG_KALLSYMS_ALL}" ]; then
 		kallsymopt="${kallsymopt} --all-symbols"
 	fi
@@ -150,12 +165,18 @@ kallsyms()
 		kallsymopt="${kallsymopt} --base-relative"
 	fi
 
+	# set up compilation
 	local aflags="${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL}               \
 		      ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS}"
 
 	local afile="`basename ${2} .o`.S"
 
-	${NM} -n ${1} | scripts/kallsyms ${kallsymopt} > ${afile}
+	# "nm -S" does not print symbol size when size is 0
+	# Therefore use awk to regularize the data:
+	#   - when there are only three fields, add an explicit "0"
+	#   - when there are already four fields, pass through as is
+	${NM} -n -S ${1} | ${AWK} 'NF==3 {print $1, 0, $2, $3}; NF==4' | \
+	    scripts/kallsyms ${kallsymopt} > ${afile}
 	${CC} ${aflags} -c -o ${2} ${afile}
 }
 
diff --git a/scripts/namespace.pl b/scripts/namespace.pl
index 1da7bca201a4..40f82b4c3a50 100755
--- a/scripts/namespace.pl
+++ b/scripts/namespace.pl
@@ -120,6 +120,12 @@ my %nameexception = (
     'kallsyms_addresses'=> 1,
     'kallsyms_offsets'	=> 1,
     'kallsyms_relative_base'=> 1,
+    'kallsyms_sizes'	=> 1,
+    'kallsyms_token_table'=> 1,
+    'kallsyms_token_index'=> 1,
+    'kallsyms_markers'	=> 1,
+    'kallsyms_modules'	=> 1,
+    'kallsyms_symbol_modules'=> 1,
     '__this_module'	=> 1,
     '_etext'		=> 1,
     '_edata'		=> 1,
-- 
2.18.1

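The "module: objects" line format that modules_thick_iter_next() handles
in the patch above can be illustrated in isolation.  What follows is a
minimal sketch and not code from the patch: module_name_from_path() and
split_objects() are hypothetical helper names, and the sketch assumes the
line has already been cut in two at the colon.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: derive a module name from
 * the path found before the colon.  Drops the directory and the suffix
 * and maps '-' to '_'.  Returns a malloc()ed string, or NULL on OOM.
 */
static char *module_name_from_path(const char *path)
{
	const char *base;
	char *name, *p;
	size_t len;

	assert(path != NULL);
	base = strrchr(path, '/');
	base = base ? base + 1 : path;

	len = strlen(base);
	name = malloc(len + 1);
	if (!name)
		return NULL;
	memcpy(name, base, len + 1);

	/* '-' becomes '_', as is done for names of loadable modules. */
	for (p = name; *p; p++)
		if (*p == '-')
			*p = '_';

	/* Cut off the suffix (".o") at the last dot, if any. */
	p = strrchr(name, '.');
	if (p)
		*p = '\0';
	return name;
}

/*
 * Hypothetical helper: split a space-separated object list in place.
 * Stores up to max pointers in out[] and returns how many were stored.
 * Runs of spaces and a trailing newline are skipped.
 */
static size_t split_objects(char *list, char **out, size_t max)
{
	size_t n = 0;

	while (*list && n < max) {
		list += strspn(list, " \n");
		if (!*list)
			break;
		out[n++] = list;
		list += strcspn(list, " \n");
		if (*list)
			*list++ = '\0';
	}
	return n;
}
```

For a made-up line such as
"drivers/net/foo-bar.o: drivers/net/foo.o drivers/net/bar.o", the left
side gives the module name "foo_bar" and the right side gives the two
object paths.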

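For tools that would consume the /proc/kallmodsyms format described in
the commit message (address, size, type, name, optional [module]), a
parser could look roughly like the sketch below.  This is an illustration
only: struct kallmodsym, parse_kallmodsyms_line() and addr_in_symbol()
are hypothetical names, and the buffer sizes are assumptions rather than
the kernel's KSYM_NAME_LEN or MODULE_NAME_LEN.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one line; buffer sizes are assumptions. */
struct kallmodsym {
	unsigned long long addr;
	unsigned long long size;
	char type;
	char name[256];
	char module[64];	/* empty string when no [module] is listed */
};

/*
 * Parse one line of the form "addr size type name" with an optional
 * trailing "[module]".  Returns 1 on success, 0 if the line does not
 * match.
 */
static int parse_kallmodsyms_line(const char *line, struct kallmodsym *out)
{
	int n;

	assert(line != NULL && out != NULL);
	out->module[0] = '\0';
	n = sscanf(line, "%llx %llx %c %255s [%63[^]]]",
		   &out->addr, &out->size, &out->type,
		   out->name, out->module);
	return n >= 4;
}

/*
 * With an explicit size, an address belongs to a symbol only if it
 * falls inside [addr, addr + size); gaps between symbols match nothing.
 */
static int addr_in_symbol(const struct kallmodsym *s, unsigned long long a)
{
	return a >= s->addr && a - s->addr < s->size;
}
```

A filter on built-in module names then reduces to comparing the module
field of each parsed line against the wanted name.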
* Re: [PATCH] kallsyms: new /proc/kallmodsyms with builtin modules and symbol sizes
  2019-11-14 22:30 [PATCH] kallsyms: new /proc/kallmodsyms with builtin modules and symbol sizes eugene.loh
@ 2019-11-15 16:47 ` Steven Rostedt
  2019-11-15 17:26   ` Linus Torvalds
  0 siblings, 1 reply; 21+ messages in thread
From: Steven Rostedt @ 2019-11-15 16:47 UTC (permalink / raw)
  To: eugene.loh
  Cc: corbet, yamada.masahiro, michal.lkml, jeyu, linux-kbuild, maz,
	songliubraving, tglx, jacob.e.keller, Linus Torvalds,
	Andrew Morton, Greg Kroah-Hartman


[ Adding Linus, Andrew and Greg as this is something that needs a higher
  level of approval for acceptance ]

On Thu, 14 Nov 2019 14:30:36 -0800
eugene.loh@oracle.com wrote:

> From: Kris Van Hees <kris.van.hees@oracle.com>
> 
> /proc/kallsyms is very useful for tracers and other tools that need to
> map kernel symbols to addresses.
> 
> It would be useful if:
> 
> - there were a mapping between kernel symbol and module name
>   that only changed when the kernel source code is changed.
>   This mapping should not change simply because a module
>   becomes built into the kernel.
> 
> - there were symbol size information to determine whether an
>   address is within a symbol or outside it, especially given
>   that there could be huge gaps between symbols.
> 
> Therefore:
> 
> - Introduce a new config parameter CONFIG_KALLMODSYMS.
> 
> - Generate a file "modules_thick.builtin" that maps from
>   the thin archives that make up built-in modules to their
>   constituent object files.
> 
> - Generate a linker map ".tmp_vmlinux.map", converting it
>   into ".tmp_vmlinux.ranges", mapping address ranges to
>   object files.
> 
> - Change scripts/kallsyms.c stdin from "nm" to "nm -S" so that
>   symbol sizes are available.  Have sort_symbols() incorporate
>   size info.  Emit size info in the *.s output file.  Skip the
>   .init.scratch section.
> 
> - If CONFIG_KALLMODSYMS, have scripts/kallsyms also read
>   "modules_thick.builtin" and ".tmp_vmlinux.ranges" to map
>   symbol addresses to built-in-module names and then write
>   those module names and per-symbol module information to
>   the *.s output file.
> 
> - Change module_get_kallsym() to return symbol size as well.
> 
> - In kernel/kallsyms:
>   - Use new, accurate symbol size information in get_symbol_pos(),
>     both to identify the correct symbol and to return correct size
>     information.
>   - Introduce a field builtin_module to say if the symbol is in a
>     built-in module.
>   - If CONFIG_KALLMODSYMS, produce a new /proc/kallmodsyms file,
>     akin to /proc/kallsyms but with built-in-module names and symbol
>     sizes.
> 
> The resulting /proc/kallmodsyms file looks like this:
>     ffffffff8b013d20 409 t pt_buffer_setup_aux
>     ffffffff8b014130 11f T intel_pt_interrupt
>     ffffffff8b014250 2d T cpu_emergency_stop_pt
>     ffffffff8b014280 13a t rapl_pmu_event_init      [intel_rapl_perf]

I personally can use this as it will help the function tracer be able
to filter on functions of modules that are builtin, whereas we cannot
do that today.

-- Steve


>     ffffffff8b0143c0 bb t rapl_event_update [intel_rapl_perf]
>     ffffffff8b014480 10 t rapl_pmu_event_read       [intel_rapl_perf]
>     ffffffff8b014490 a3 t rapl_cpu_offline  [intel_rapl_perf]
>     ffffffff8b014540 24 t __rapl_event_show [intel_rapl_perf]
>     ffffffff8b014570 f2 t rapl_pmu_event_stop       [intel_rapl_perf]
> This is emitted even if intel_rapl_perf is built into the kernel.
> 
> As with /proc/kallsyms, non-root usage produces addresses that are
> all zero;  symbol sizes are treated similarly.
> 
> Programs that consume /proc/kallmodsyms should note that unlike
> /proc/kallsyms, kernel symbols for built-in modules may appear
> interspersed with other symbols that are part of different modules or
> of the kernel.
> 
> Orabug: 29891866
> Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
> Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
> Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
> Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
> Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
> ---
>  .gitignore                  |   1 +
>  Documentation/dontdiff      |   1 +
>  Makefile                    |  41 ++-
>  include/linux/module.h      |   7 +-
>  init/Kconfig                |   8 +
>  kernel/kallsyms.c           | 145 +++++++---
>  kernel/module.c             |   4 +-
>  scripts/Makefile.modbuiltin |  20 +-
>  scripts/kallsyms.c          | 559 +++++++++++++++++++++++++++++++++++-
>  scripts/link-vmlinux.sh     |  23 +-
>  scripts/namespace.pl        |   6 +
>  11 files changed, 748 insertions(+), 67 deletions(-)
> 
> diff --git a/.gitignore b/.gitignore
> index 70580bdd352c..474491775a1a 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -47,6 +47,7 @@
>  Module.symvers
>  modules.builtin
>  modules.order
> +modules_thick.builtin
>  
>  #
>  # Top-level generic files
> diff --git a/Documentation/dontdiff b/Documentation/dontdiff
> index 9f4392876099..32ee05f91410 100644
> --- a/Documentation/dontdiff
> +++ b/Documentation/dontdiff
> @@ -180,6 +180,7 @@ modpost
>  modules.builtin
>  modules.builtin.modinfo
>  modules.order
> +modules_thick.builtin
>  modversions.h*
>  nconf
>  nconf-cfg
> diff --git a/Makefile b/Makefile
> index 49363caa7079..15b4e897cd3e 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1077,7 +1077,7 @@ cmd_link-vmlinux =                                                 \
>  	$(CONFIG_SHELL) $< $(LD) $(KBUILD_LDFLAGS) $(LDFLAGS_vmlinux) ;    \
>  	$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
>  
> -vmlinux: scripts/link-vmlinux.sh autoksyms_recursive $(vmlinux-deps) FORCE
> +vmlinux: scripts/link-vmlinux.sh autoksyms_recursive $(vmlinux-deps) modules_thick.builtin FORCE
>  	+$(call if_changed,link-vmlinux)
>  
>  targets := vmlinux
> @@ -1292,17 +1292,6 @@ modules: $(if $(KBUILD_BUILTIN),vmlinux) modules.order modules.builtin
>  modules.order: descend
>  	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
>  
> -modbuiltin-dirs := $(addprefix _modbuiltin_, $(build-dirs))
> -
> -modules.builtin: $(modbuiltin-dirs)
> -	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
> -
> -PHONY += $(modbuiltin-dirs)
> -# tristate.conf is not included from this Makefile. Add it as a prerequisite
> -# here to make it self-healing in case somebody accidentally removes it.
> -$(modbuiltin-dirs): include/config/tristate.conf
> -	$(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_%,%,$@)
> -
>  # Target to prepare building external modules
>  PHONY += modules_prepare
>  modules_prepare: prepare
> @@ -1355,6 +1344,33 @@ modules modules_install:
>  
>  endif # CONFIG_MODULES
>  
> +# modules.builtin has a 'thick' form which maps from kernel modules (or rather
> +# the object file names they would have had had they not been built in) to their
> +# constituent object files: kallsyms uses this to determine which modules any
> +# given object file is part of.  (We cannot eliminate the slight redundancy
> +# here without double-expansion.)
> +
> +modbuiltin-dirs := $(addprefix _modbuiltin_, $(build-dirs))
> +
> +modbuiltin-thick-dirs := $(addprefix _modbuiltin_thick_, $(build-dirs))
> +
> +modules.builtin: $(modbuiltin-dirs)
> +	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
> +
> +modules_thick.builtin: $(modbuiltin-thick-dirs)
> +	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
> +
> +PHONY += $(modbuiltin-dirs) $(modbuiltin-thick-dirs)
> +# tristate.conf is not included from this Makefile. Add it as a prerequisite
> +# here to make it self-healing in case somebody accidentally removes it.
> +$(modbuiltin-dirs): include/config/tristate.conf
> +	$(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_%,%,$@) \
> +			builtin-file=modules.builtin
> +
> +$(modbuiltin-thick-dirs): include/config/tristate.conf
> +	$(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_thick_%,%,$@) \
> +			builtin-file=modules_thick.builtin
> +
>  ###
>  # Cleaning is done on three levels.
>  # make clean     Delete most generated files
> @@ -1674,6 +1690,7 @@ clean: $(clean-dirs)
>  		-o -name '*.asn1.[ch]' \
>  		-o -name '*.symtypes' -o -name 'modules.order' \
>  		-o -name modules.builtin -o -name '.tmp_*.o.*' \
> +		-o -name modules_thick.builtin \
>  		-o -name '*.c.[012]*.*' \
>  		-o -name '*.ll' \
>  		-o -name '*.gcno' \) -type f -print | xargs rm -f
> diff --git a/include/linux/module.h b/include/linux/module.h
> index 6d20895e7739..b4f3f680a77d 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -592,7 +592,8 @@ bool each_symbol_section(bool (*fn)(const struct symsearch *arr,
>  /* Returns 0 and fills in value, defined and namebuf, or -ERANGE if
>     symnum out of range. */
>  int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
> -			char *name, char *module_name, int *exported);
> +		       char *name, char *module_name, unsigned long *size,
> +		       int *exported);
>  
>  /* Look for this name: can be of form module:name. */
>  unsigned long module_kallsyms_lookup_name(const char *name);
> @@ -774,8 +775,8 @@ static inline int lookup_module_symbol_attrs(unsigned long addr, unsigned long *
>  }
>  
>  static inline int module_get_kallsym(unsigned int symnum, unsigned long *value,
> -					char *type, char *name,
> -					char *module_name, int *exported)
> +				     char *type, char *name, char *module_name,
> +				     unsigned long *size, int *exported)
>  {
>  	return -ERANGE;
>  }
> diff --git a/init/Kconfig b/init/Kconfig
> index ff6108deaad2..15294af2a1e9 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1429,6 +1429,14 @@ config POSIX_TIMERS
>  
>  	  If unsure say y.
>  
> +config KALLMODSYMS
> +	default y
> +	bool "Enable support for /proc/kallmodsyms" if EXPERT
> +	depends on KALLSYMS
> +	help
> +	  This option enables the /proc/kallmodsyms file, which maps symbols
> +	  to addresses and their associated modules.
> +
>  config PRINTK
>  	default y
>  	bool "Enable support for printk" if EXPERT
> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> index 136ce049c4ad..a51fdf73f9b3 100644
> --- a/kernel/kallsyms.c
> +++ b/kernel/kallsyms.c
> @@ -32,6 +32,7 @@
>   */
>  extern const unsigned long kallsyms_addresses[] __weak;
>  extern const int kallsyms_offsets[] __weak;
> +extern const unsigned long kallsyms_sizes[] __weak;
>  extern const u8 kallsyms_names[] __weak;
>  
>  /*
> @@ -46,6 +47,8 @@ __attribute__((weak, section(".rodata")));
>  
>  extern const u8 kallsyms_token_table[] __weak;
>  extern const u16 kallsyms_token_index[] __weak;
> +extern const char kallsyms_modules[] __weak;
> +extern const u32 kallsyms_symbol_modules[] __weak;
>  
>  extern const unsigned int kallsyms_markers[] __weak;
>  
> @@ -195,12 +198,24 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
>  }
>  EXPORT_SYMBOL_GPL(kallsyms_on_each_symbol);
>  
> +/*
> + * The caller passes in an address, and we return an index to the symbol --
> + * potentially also size and offset information.
> + * But an address might map to multiple symbols because:
> + *   - some symbols might have zero size
> + *   - some symbols might be aliases of one another
> + *   - some symbols might span (encompass) others
> + * The symbols should already be ordered so that, for a particular address,
> + * we first have the zero-size ones, then the biggest, then the smallest.
> + * So we find the index by:
> + *   - finding the last symbol with the target address
> + *   - backing the index up so long as both the address and size are unchanged
> + */
>  static unsigned long get_symbol_pos(unsigned long addr,
>  				    unsigned long *symbolsize,
>  				    unsigned long *offset)
>  {
> -	unsigned long symbol_start = 0, symbol_end = 0;
> -	unsigned long i, low, high, mid;
> +	unsigned long low, high, mid;
>  
>  	/* This kernel should never had been booted. */
>  	if (!IS_ENABLED(CONFIG_KALLSYMS_BASE_RELATIVE))
> @@ -221,36 +236,17 @@ static unsigned long get_symbol_pos(unsigned long addr,
>  	}
>  
>  	/*
> -	 * Search for the first aliased symbol. Aliased
> -	 * symbols are symbols with the same address.
> +	 * Search for the first aliased symbol.
>  	 */
> -	while (low && kallsyms_sym_address(low-1) == kallsyms_sym_address(low))
> +	while (low
> +	    && kallsyms_sym_address(low-1) == kallsyms_sym_address(low)
> +	    && kallsyms_sizes[low-1] == kallsyms_sizes[low])
>  		--low;
>  
> -	symbol_start = kallsyms_sym_address(low);
> -
> -	/* Search for next non-aliased symbol. */
> -	for (i = low + 1; i < kallsyms_num_syms; i++) {
> -		if (kallsyms_sym_address(i) > symbol_start) {
> -			symbol_end = kallsyms_sym_address(i);
> -			break;
> -		}
> -	}
> -
> -	/* If we found no next symbol, we use the end of the section. */
> -	if (!symbol_end) {
> -		if (is_kernel_inittext(addr))
> -			symbol_end = (unsigned long)_einittext;
> -		else if (IS_ENABLED(CONFIG_KALLSYMS_ALL))
> -			symbol_end = (unsigned long)_end;
> -		else
> -			symbol_end = (unsigned long)_etext;
> -	}
> -
>  	if (symbolsize)
> -		*symbolsize = symbol_end - symbol_start;
> +		*symbolsize = kallsyms_sizes[low];
>  	if (offset)
> -		*offset = addr - symbol_start;
> +		*offset = addr - kallsyms_sym_address(low);
>  
>  	return low;
>  }
> @@ -270,6 +266,7 @@ int kallsyms_lookup_size_offset(unsigned long addr, unsigned long *symbolsize,
>  	return !!module_address_lookup(addr, symbolsize, offset, NULL, namebuf) ||
>  	       !!__bpf_address_lookup(addr, symbolsize, offset, namebuf);
>  }
> +EXPORT_SYMBOL_GPL(kallsyms_lookup_size_offset);
>  
>  /*
>   * Lookup an address
> @@ -440,9 +437,11 @@ struct kallsym_iter {
>  	loff_t pos_ftrace_mod_end;
>  	unsigned long value;
>  	unsigned int nameoff; /* If iterating in core kernel symbols. */
> +	unsigned long size;
>  	char type;
>  	char name[KSYM_NAME_LEN];
>  	char module_name[MODULE_NAME_LEN];
> +	int builtin_module;
>  	int exported;
>  	int show_value;
>  };
> @@ -472,7 +471,9 @@ static int get_ksymbol_mod(struct kallsym_iter *iter)
>  	int ret = module_get_kallsym(iter->pos - iter->pos_arch_end,
>  				     &iter->value, &iter->type,
>  				     iter->name, iter->module_name,
> -				     &iter->exported);
> +				     &iter->size, &iter->exported);
> +	iter->builtin_module = 0;
> +
>  	if (ret < 0) {
>  		iter->pos_mod_end = iter->pos;
>  		return 0;
> @@ -508,10 +509,22 @@ static int get_ksymbol_bpf(struct kallsym_iter *iter)
>  static unsigned long get_ksymbol_core(struct kallsym_iter *iter)
>  {
>  	unsigned off = iter->nameoff;
> +	u32 mod_index = 0;
> +
> +	if (kallsyms_symbol_modules)
> +		mod_index = kallsyms_symbol_modules[iter->pos];
>  
> -	iter->module_name[0] = '\0';
> +	if (mod_index == 0 || kallsyms_modules == NULL) {
> +		iter->module_name[0] = '\0';
> +		iter->builtin_module = 0;
> +	} else {
> +		strcpy(iter->module_name, &kallsyms_modules[mod_index]);
> +		iter->builtin_module = 1;
> +	}
> +	iter->exported = 0;
>  	iter->value = kallsyms_sym_address(iter->pos);
>  
> +	iter->size = kallsyms_sizes[iter->pos];
>  	iter->type = kallsyms_get_symbol_type(off);
>  
>  	off = kallsyms_expand_symbol(off, iter->name, ARRAY_SIZE(iter->name));
> @@ -556,7 +569,7 @@ static int update_iter_mod(struct kallsym_iter *iter, loff_t pos)
>  }
>  
>  /* Returns false if pos at or past end of file. */
> -static int update_iter(struct kallsym_iter *iter, loff_t pos)
> +int update_iter(struct kallsym_iter *iter, loff_t pos)
>  {
>  	/* Module symbols can be accessed randomly. */
>  	if (pos >= kallsyms_num_syms)
> @@ -592,18 +605,22 @@ static void s_stop(struct seq_file *m, void *p)
>  {
>  }
>  
> -static int s_show(struct seq_file *m, void *p)
> +static int s_show_internal(struct seq_file *m, void *p, int builtin_modules)
>  {
>  	void *value;
>  	struct kallsym_iter *iter = m->private;
> +	unsigned long size;
>  
>  	/* Some debugging symbols have no name.  Ignore them. */
>  	if (!iter->name[0])
>  		return 0;
>  
>  	value = iter->show_value ? (void *)iter->value : NULL;
> +	size = iter->show_value ? iter->size : 0;
>  
> -	if (iter->module_name[0]) {
> +	if ((iter->builtin_module == 0 && iter->module_name[0]) ||
> +	    (iter->builtin_module != 0 && iter->module_name[0] &&
> +	     builtin_modules != 0)) {
>  		char type;
>  
>  		/*
> @@ -612,14 +629,34 @@ static int s_show(struct seq_file *m, void *p)
>  		 */
>  		type = iter->exported ? toupper(iter->type) :
>  					tolower(iter->type);
> -		seq_printf(m, "%px %c %s\t[%s]\n", value,
> -			   type, iter->name, iter->module_name);
> -	} else
> +		if (builtin_modules)
> +			seq_printf(m, "%px %lx %c %s\t[%s]\n", value,
> +				   size, type, iter->name,
> +				   iter->module_name);
> +		else
> +			seq_printf(m, "%px %c %s\t[%s]\n", value,
> +				   type, iter->name, iter->module_name);
> +	} else if (builtin_modules)
> +		seq_printf(m, "%px %lx %c %s\n", value, size,
> +			   iter->type, iter->name);
> +	else
>  		seq_printf(m, "%px %c %s\n", value,
>  			   iter->type, iter->name);
>  	return 0;
>  }
>  
> +static int s_show(struct seq_file *m, void *p)
> +{
> +	return s_show_internal(m, p, 0);
> +}
> +
> +#ifdef CONFIG_KALLMODSYMS
> +static int s_mod_show(struct seq_file *m, void *p)
> +{
> +	return s_show_internal(m, p, 1);
> +}
> +#endif
> +
>  static const struct seq_operations kallsyms_op = {
>  	.start = s_start,
>  	.next = s_next,
> @@ -627,6 +664,15 @@ static const struct seq_operations kallsyms_op = {
>  	.show = s_show
>  };
>  
> +#ifdef CONFIG_KALLMODSYMS
> +static const struct seq_operations kallmodsyms_op = {
> +	.start = s_start,
> +	.next = s_next,
> +	.stop = s_stop,
> +	.show = s_mod_show
> +};
> +#endif
> +
>  static inline int kallsyms_for_perf(void)
>  {
>  #ifdef CONFIG_PERF_EVENTS
> @@ -661,7 +707,8 @@ int kallsyms_show_value(void)
>  	}
>  }
>  
> -static int kallsyms_open(struct inode *inode, struct file *file)
> +static int kallsyms_open_internal(struct inode *inode, struct file *file,
> +	const struct seq_operations *ops)
>  {
>  	/*
>  	 * We keep iterator in m->private, since normal case is to
> @@ -669,7 +716,7 @@ static int kallsyms_open(struct inode *inode, struct file *file)
>  	 * using get_symbol_offset for every symbol.
>  	 */
>  	struct kallsym_iter *iter;
> -	iter = __seq_open_private(file, &kallsyms_op, sizeof(*iter));
> +	iter = __seq_open_private(file, ops, sizeof(*iter));
>  	if (!iter)
>  		return -ENOMEM;
>  	reset_iter(iter, 0);
> @@ -678,6 +725,18 @@ static int kallsyms_open(struct inode *inode, struct file *file)
>  	return 0;
>  }
>  
> +static int kallsyms_open(struct inode *inode, struct file *file)
> +{
> +	return kallsyms_open_internal(inode, file, &kallsyms_op);
> +}
> +
> +#ifdef CONFIG_KALLMODSYMS
> +static int kallmodsyms_open(struct inode *inode, struct file *file)
> +{
> +	return kallsyms_open_internal(inode, file, &kallmodsyms_op);
> +}
> +#endif
> +
>  #ifdef	CONFIG_KGDB_KDB
>  const char *kdb_walk_kallsyms(loff_t *pos)
>  {
> @@ -705,9 +764,21 @@ static const struct file_operations kallsyms_operations = {
>  	.release = seq_release_private,
>  };
>  
> +#ifdef CONFIG_KALLMODSYMS
> +static const struct file_operations kallmodsyms_operations = {
> +	.open = kallmodsyms_open,
> +	.read = seq_read,
> +	.llseek = seq_lseek,
> +	.release = seq_release_private,
> +};
> +#endif
> +
>  static int __init kallsyms_init(void)
>  {
>  	proc_create("kallsyms", 0444, NULL, &kallsyms_operations);
> +#ifdef CONFIG_KALLMODSYMS
> +	proc_create("kallmodsyms", 0444, NULL, &kallmodsyms_operations);
> +#endif
>  	return 0;
>  }
>  device_initcall(kallsyms_init);
> diff --git a/kernel/module.c b/kernel/module.c
> index ff2d7359a418..4a31e1e94acd 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -4184,7 +4184,8 @@ int lookup_module_symbol_attrs(unsigned long addr, unsigned long *size,
>  }
>  
>  int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
> -			char *name, char *module_name, int *exported)
> +		       char *name, char *module_name, unsigned long *size,
> +		       int *exported)
>  {
>  	struct module *mod;
>  
> @@ -4203,6 +4204,7 @@ int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
>  			strlcpy(name, kallsyms_symbol_name(kallsyms, symnum), KSYM_NAME_LEN);
>  			strlcpy(module_name, mod->name, MODULE_NAME_LEN);
>  			*exported = is_exported(name, *value, mod);
> +			*size = kallsyms->symtab[symnum].st_size;
>  			preempt_enable();
>  			return 0;
>  		}
> diff --git a/scripts/Makefile.modbuiltin b/scripts/Makefile.modbuiltin
> index 7d4711b88656..06f31e58111e 100644
> --- a/scripts/Makefile.modbuiltin
> +++ b/scripts/Makefile.modbuiltin
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
>  # ==========================================================================
> -# Generating modules.builtin
> +# Generating modules.builtin and modules_thick.builtin
>  # ==========================================================================
>  
>  src := $(obj)
> @@ -30,19 +30,29 @@ __subdir-Y     := $(patsubst %/,%,$(filter %/, $(obj-Y)))
>  subdir-Y       += $(__subdir-Y)
>  subdir-ym      := $(sort $(subdir-y) $(subdir-Y) $(subdir-m))
>  subdir-ym      := $(addprefix $(obj)/,$(subdir-ym))
> -obj-Y          := $(addprefix $(obj)/,$(obj-Y))
> +pathobj-Y      := $(addprefix $(obj)/,$(obj-Y))
>  
>  modbuiltin-subdirs := $(patsubst %,%/modules.builtin, $(subdir-ym))
> -modbuiltin-mods    := $(filter %.ko, $(obj-Y:.o=.ko))
> +modbuiltin-mods    := $(filter %.ko, $(pathobj-Y:.o=.ko))
>  modbuiltin-target  := $(obj)/modules.builtin
> +modthickbuiltin-subdirs := $(patsubst %,%/modules_thick.builtin, $(subdir-ym))
> +modthickbuiltin-target  := $(obj)/modules_thick.builtin
>  
> -__modbuiltin: $(modbuiltin-target) $(subdir-ym)
> +__modbuiltin: $(obj)/$(builtin-file) $(subdir-ym)
>  	@:
>  
>  $(modbuiltin-target): $(subdir-ym) FORCE
>  	$(Q)(for m in $(modbuiltin-mods); do echo $$m; done;	\
>  	cat /dev/null $(modbuiltin-subdirs)) > $@
>  
> +$(modthickbuiltin-target): $(subdir-ym) FORCE
> +	$(Q) $(foreach mod-o, $(filter %.o,$(obj-Y)),\
> +		printf "%s:" $(addprefix $(obj)/,$(mod-o)) >> $@; \
> +		printf " %s" $(sort $(strip $(addprefix $(obj)/,$($(mod-o:.o=-objs)) \
> +			$($(mod-o:.o=-y)) $($(mod-o:.o=-Y))))) >> $@; \
> +		printf "\n" >> $@; ) \
> +	cat /dev/null $(modthickbuiltin-subdirs) >> $@;
> +
>  PHONY += FORCE
>  
>  FORCE:
> @@ -52,6 +62,6 @@ FORCE:
>  
>  PHONY += $(subdir-ym)
>  $(subdir-ym):
> -	$(Q)$(MAKE) $(modbuiltin)=$@
> +	$(Q)$(MAKE) $(modbuiltin)=$@ builtin-file=$(builtin-file)
>  
>  .PHONY: $(PHONY)
> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
> index ae6504d07fd6..e88653b00f36 100644
> --- a/scripts/kallsyms.c
> +++ b/scripts/kallsyms.c
> @@ -5,7 +5,10 @@
>   * This software may be used and distributed according to the terms
>   * of the GNU General Public License, incorporated herein by reference.
>   *
> - * Usage: nm -n vmlinux | scripts/kallsyms [--all-symbols] > symbols.S
> + * Usage: nm -n -S vmlinux
> + *        | scripts/kallsyms [--all-symbols] [--absolute-percpu]
> + *             [--base-relative] [--builtin=modules_thick.builtin]
> + *        > symbols.S
>   *
>   *      Table compression uses all the unused char codes on the symbols and
>   *  maps these to the most used substrings (tokens). For instance, it might
> @@ -18,12 +21,19 @@
>   *
>   */
>  
> +#define _GNU_SOURCE 1
>  #include <stdio.h>
>  #include <stdlib.h>
>  #include <string.h>
>  #include <ctype.h>
>  #include <limits.h>
>  
> +#include "../include/generated/autoconf.h"
> +
> +#ifdef CONFIG_KALLMODSYMS
> +#include <errno.h>
> +#endif
> +
>  #ifndef ARRAY_SIZE
>  #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0]))
>  #endif
> @@ -32,10 +42,14 @@
>  
>  struct sym_entry {
>  	unsigned long long addr;
> +	unsigned long long size;
>  	unsigned int len;
>  	unsigned int start_pos;
>  	unsigned char *sym;
>  	unsigned int percpu_absolute;
> +#ifdef CONFIG_KALLMODSYMS
> +	unsigned int module;
> +#endif
>  };
>  
>  struct addr_range {
> @@ -68,11 +82,118 @@ static int token_profit[0x10000];
>  static unsigned char best_table[256][2];
>  static unsigned char best_table_len[256];
>  
> +#ifdef CONFIG_KALLMODSYMS
> +static unsigned int strhash(const char *s)
> +{
> +	/* fnv32 hash */
> +	unsigned int hash = 2166136261U;
> +
> +	for (; *s; s++)
> +		hash = (hash ^ *s) * 0x01000193;
> +	return hash;
> +}
> +
> +#define OBJ2MOD_BITS 10
> +#define OBJ2MOD_N (1 << OBJ2MOD_BITS)
> +#define OBJ2MOD_MASK (OBJ2MOD_N - 1)
> +struct obj2mod_elem {
> +	char *obj;
> +	int mod;
> +	struct obj2mod_elem *next;
> +};
> +
> +static struct obj2mod_elem *obj2mod[OBJ2MOD_N];
> +
> +static void obj2mod_init(void)
> +{
> +	memset(obj2mod, 0, sizeof(obj2mod));
> +}
> +
> +static void obj2mod_put(char *obj, int mod)
> +{
> +	int i = strhash(obj) & OBJ2MOD_MASK;
> +	struct obj2mod_elem *elem = malloc(sizeof(struct obj2mod_elem));
> +
> +	if (!elem) {
> +		fprintf(stderr, "kallsyms: out of memory\n");
> +		exit(1);
> +	}
> +
> +	elem->obj = strdup(obj);
> +	if (!elem->obj) {
> +		fprintf(stderr, "kallsyms: out of memory\n");
> +		free(elem);
> +		exit(1);
> +	}
> +
> +	elem->mod = mod;
> +	elem->next = obj2mod[i];
> +	obj2mod[i] = elem;
> +}
> +
> +static int obj2mod_get(char *obj)
> +{
> +	int i = strhash(obj) & OBJ2MOD_MASK;
> +	struct obj2mod_elem *elem;
> +
> +	for (elem = obj2mod[i]; elem; elem = elem->next)
> +		if (strcmp(elem->obj, obj) == 0)
> +			return elem->mod;
> +	return 0;
> +}
> +
> +static void obj2mod_free(void)
> +{
> +	int i;
> +
> +	for (i = 0; i < OBJ2MOD_N; i++) {
> +		struct obj2mod_elem *elem = obj2mod[i];
> +		struct obj2mod_elem *next;
> +
> +		while (elem) {
> +			next = elem->next;
> +			free(elem->obj);
> +			free(elem);
> +			elem = next;
> +		}
> +	}
> +}
> +
> +/*
> + * The builtin module names.  The "offset" points to the name as if
> + * all builtin module names were concatenated to a single string.
> + */
> +static unsigned int builtin_module_size;	/* number allocated */
> +static unsigned int builtin_module_len;		/* number assigned */
> +static char **builtin_modules;			/* array of module names */
> +static unsigned int *builtin_module_offsets;	/* offset */
> +
> +/*
> + * modules_thick.builtin iteration state.
> + */
> +struct modules_thick_iter {
> +	FILE *f;
> +	char *line;
> +	size_t line_size;
> +};
> +
> +/*
> + * An ordered list of address ranges and how they map to built-in modules.
> + */
> +struct addrmap_entry {
> +	unsigned long long addr;
> +	unsigned long long size;
> +	unsigned int module;
> +};
> +static struct addrmap_entry *addrmap;
> +static int addrmap_num, addrmap_alloced;
> +#endif
>  
>  static void usage(void)
>  {
> -	fprintf(stderr, "Usage: kallsyms [--all-symbols] "
> -			"[--base-relative] < in.map > out.S\n");
> +	fprintf(stderr, "Usage: kallsyms [--all-symbols] [--absolute-percpu] "
> +			"[--base-relative] [--builtin=modules_thick.builtin] "
> +			"< nm_vmlinux.out > symbols.S\n");
>  	exit(1);
>  }
>  
> @@ -107,13 +228,32 @@ static int check_symbol_range(const char *sym, unsigned long long addr,
>  	return 1;
>  }
>  
> +#ifdef CONFIG_KALLMODSYMS
> +static int addrmap_compare(const void *keyp, const void *rangep)
> +{
> +	unsigned long long addr = *((const unsigned long long *)keyp);
> +	const struct addrmap_entry *range = (const struct addrmap_entry *)rangep;
> +
> +	if (addr < range->addr)
> +		return -1;
> +	if (addr < range->addr + range->size)
> +		return 0;
> +	return 1;
> +}
> +#endif
> +
>  static int read_symbol(FILE *in, struct sym_entry *s)
>  {
>  	char sym[500], stype;
> -	int rc;
> +	int rc, init_scratch = 0;
> +#ifdef CONFIG_KALLMODSYMS
> +	struct addrmap_entry *range;
> +#endif
>  
> -	rc = fscanf(in, "%llx %c %499s\n", &s->addr, &stype, sym);
> -	if (rc != 3) {
> +read_another:
> +	rc = fscanf(in, "%llx %llx %c %499s\n",
> +		    &s->addr, &s->size, &stype, sym);
> +	if (rc != 4) {
>  		if (rc != EOF && fgets(sym, 500, in) == NULL)
>  			fprintf(stderr, "Read error or end of file.\n");
>  		return -1;
> @@ -125,6 +265,16 @@ static int read_symbol(FILE *in, struct sym_entry *s)
>  		return -1;
>  	}
>  
> +	/* skip the .init.scratch section */
> +	if (strcmp(sym, "__init_scratch_end") == 0) {
> +		init_scratch = 0;
> +		goto read_another;
> +	}
> +	if (strcmp(sym, "__init_scratch_begin") == 0)
> +		init_scratch = 1;
> +	if (init_scratch)
> +		goto read_another;
> +
>  	/* Ignore most absolute/undefined (?) symbols. */
>  	if (strcmp(sym, "_text") == 0)
>  		_text = s->addr;
> @@ -154,6 +304,16 @@ static int read_symbol(FILE *in, struct sym_entry *s)
>  	else if (!strncmp(sym, ".LASANPC", 8))
>  		return -1;
>  
> +#ifdef CONFIG_KALLMODSYMS
> +	/* look up the builtin module this is part of (if any) */
> +	range = (struct addrmap_entry *) bsearch(&s->addr,
> +	    addrmap, addrmap_num, sizeof(*addrmap), &addrmap_compare);
> +	if (range)
> +		s->module = builtin_module_offsets[range->module];
> +	else
> +		s->module = 0;
> +#endif
> +
>  	/* include the type field in the symbol name, so that it gets
>  	 * compressed together */
>  	s->len = strlen(sym) + 1;
> @@ -201,11 +361,14 @@ static int symbol_valid(struct sym_entry *s)
>  		"kallsyms_addresses",
>  		"kallsyms_offsets",
>  		"kallsyms_relative_base",
> +		"kallsyms_sizes",
>  		"kallsyms_num_syms",
>  		"kallsyms_names",
>  		"kallsyms_markers",
>  		"kallsyms_token_table",
>  		"kallsyms_token_index",
> +		"kallsyms_symbol_modules",
> +		"kallsyms_modules",
>  
>  	/* Exclude linker generated symbols which vary between passes */
>  		"_SDA_BASE_",		/* ppc */
> @@ -405,6 +568,11 @@ static void write_src(void)
>  		printf("\n");
>  	}
>  
> +	output_label("kallsyms_sizes");
> +	for (i = 0; i < table_cnt; i++)
> +		printf("\tPTR\t%#llx\n", table[i].size);
> +	printf("\n");
> +
>  	output_label("kallsyms_num_syms");
>  	printf("\t.long\t%u\n", table_cnt);
>  	printf("\n");
> @@ -454,8 +622,22 @@ static void write_src(void)
>  	for (i = 0; i < 256; i++)
>  		printf("\t.short\t%d\n", best_idx[i]);
>  	printf("\n");
> -}
>  
> +#ifdef CONFIG_KALLMODSYMS
> +	output_label("kallsyms_modules");
> +	for (i = 0; i < builtin_module_len; i++)
> +		printf("\t.asciz\t\"%s\"\n", builtin_modules[i]);
> +	printf("\n");
> +
> +	for (i = 0; i < builtin_module_len; i++)
> +		free(builtin_modules[i]);
> +
> +	output_label("kallsyms_symbol_modules");
> +	for (i = 0; i < table_cnt; i++)
> +		printf("\t.int\t%d\n", table[i].module);
> +	printf("\n");
> +#endif
> +}
>  
>  /* table lookup compression functions */
>  
> @@ -683,6 +865,18 @@ static int compare_symbols(const void *a, const void *b)
>  	if (sa->addr < sb->addr)
>  		return -1;
>  
> +	/* zero-size markers before nonzero-size symbols */
> +	if (sa->size > 0 && sb->size == 0)
> +		return 1;
> +	if (sa->size == 0 && sb->size > 0)
> +		return -1;
> +
> +	/* sort by size (large size preceding symbols it encompasses) */
> +	if (sa->size < sb->size)
> +		return 1;
> +	if (sa->size > sb->size)
> +		return -1;
> +
>  	/* sort by "weakness" type */
>  	wa = (sa->sym[0] == 'w') || (sa->sym[0] == 'W');
>  	wb = (sb->sym[0] == 'w') || (sb->sym[0] == 'W');
> @@ -738,23 +932,372 @@ static void record_relative_base(void)
>  			relative_base = table[i].addr;
>  }
>  
> +#ifdef CONFIG_KALLMODSYMS
> +/*
> + * Read a modules_thick.builtin file.
> + */
> +
> +/*
> + * Construct a modules_thick.builtin iterator.
> + */
> +static struct modules_thick_iter *
> +modules_thick_iter_new(const char *modules_thick_file)
> +{
> +	struct modules_thick_iter *i;
> +
> +	i = calloc(1, sizeof(struct modules_thick_iter));
> +	if (i == NULL)
> +		return NULL;
> +
> +	i->f = fopen(modules_thick_file, "r");
> +
> +	if (i->f == NULL) {
> +		fprintf(stderr, "Cannot open builtin module file %s: %s\n",
> +			modules_thick_file, strerror(errno));
> +		return NULL;
> +	}
> +
> +	return i;
> +}
> +
> +/*
> + * Iterate, returning a new null-terminated array of object file names, and a
> + * new dynamically-allocated module name.  (The module name passed in is freed.)
> + *
> + * The array of object file names should be freed by the caller: the strings it
> + * points to are owned by the iterator, and should not be freed.
> + */
> +static char ** __attribute__((__nonnull__))
> +modules_thick_iter_next(struct modules_thick_iter *i, char **module_name)
> +{
> +	size_t npaths = 1;
> +	char **module_paths;
> +	char *last_slash;
> +	char *last_dot;
> +	char *trailing_linefeed;
> +	char *object_name = i->line;
> +	char *dash;
> +	int composite = 0;
> +
> +	/*
> +	 * Read in all module entries, computing the suffixless, pathless name
> +	 * of the module and building the next arrayful of object file names for
> +	 * return.
> +	 *
> +	 * Modules can consist of multiple files: in this case, the portion
> +	 * before the colon is the path to the module (as before): the portion
> +	 * after the colon is a space-separated list of files that should be
> +	 * considered part of this module.  In this case, the portion before the
> +	 * colon is an "object file" that does not actually exist: it is merged
> +	 * into built-in.a without ever being written out.
> +	 *
> +	 * All module names have - translated to _, to match what is done to the
> +	 * names of the same things when built as modules.
> +	 */
> +
> +	/*
> +	 * Reinvocation of exhausted iterator. Return NULL, once.
> +	 */
> +retry:
> +	if (getline(&i->line, &i->line_size, i->f) < 0) {
> +		if (ferror(i->f)) {
> +			fprintf(stderr,
> +				"Error reading from modules_thick file: %s\n",
> +				strerror(errno));
> +			exit(1);
> +		}
> +		rewind(i->f);
> +		return NULL;
> +	}
> +
> +	if (i->line[0] == '\0')
> +		goto retry;
> +
> +	/*
> +	 * Slice the line in two at the colon, if any.  If there is anything
> +	 * past the ': ', this is a composite module.  (We allow for no colon
> +	 * for robustness, even though one should always be present.)
> +	 */
> +	if (strchr(i->line, ':') != NULL) {
> +		char *name_start;
> +
> +		object_name = strchr(i->line, ':');
> +		*object_name = '\0';
> +		object_name++;
> +		name_start = object_name + strspn(object_name, " \n");
> +		if (*name_start != '\0') {
> +			composite = 1;
> +			object_name = name_start;
> +		}
> +	}
> +
> +	/*
> +	 * Figure out the module name.
> +	 */
> +	last_slash = strrchr(i->line, '/');
> +	last_slash = (!last_slash) ? i->line :
> +		last_slash + 1;
> +	free(*module_name);
> +	*module_name = strdup(last_slash);
> +	dash = *module_name;
> +
> +	while (dash != NULL) {
> +		dash = strchr(dash, '-');
> +		if (dash != NULL)
> +			*dash = '_';
> +	}
> +
> +	last_dot = strrchr(*module_name, '.');
> +	if (last_dot != NULL)
> +		*last_dot = '\0';
> +
> +	trailing_linefeed = strchr(object_name, '\n');
> +	if (trailing_linefeed != NULL)
> +		*trailing_linefeed = '\0';
> +
> +	/*
> +	 * Multifile separator? Object file names explicitly stated:
> +	 * slice them up and shuffle them in.
> +	 *
> +	 * The array size may be an overestimate if any object file
> +	 * names start or end with spaces (very unlikely) but cannot be
> +	 * an underestimate.  (Check for it anyway.)
> +	 */
> +	if (composite) {
> +		char *one_object;
> +
> +		for (npaths = 0, one_object = object_name;
> +		     one_object != NULL;
> +		     npaths++, one_object = strchr(one_object + 1, ' '))
> +			;
> +	}
> +
> +	module_paths = malloc((npaths + 1) * sizeof(char *));
> +	if (!module_paths) {
> +		fprintf(stderr, "%s: out of memory on module %s\n", __func__,
> +			*module_name);
> +		exit(1);
> +	}
> +
> +	if (composite) {
> +		char *one_object;
> +		size_t i = 0;
> +
> +		while ((one_object = strsep(&object_name, " ")) != NULL) {
> +			if (i >= npaths) {
> +				fprintf(stderr, "%s: npaths overflow on module "
> +					"%s: this is a bug.\n", __func__,
> +					*module_name);
> +				exit(1);
> +			}
> +
> +			module_paths[i++] = one_object;
> +		}
> +	} else
> +		module_paths[0] = i->line;	/* untransformed module name */
> +
> +	module_paths[npaths] = NULL;
> +
> +	return module_paths;
> +}
> +
> +/*
> + * Free an iterator. Can be called while iteration is underway, so even
> + * state that is freed at the end of iteration must be freed here too.
> + */
> +static void
> +modules_thick_iter_free(struct modules_thick_iter *i)
> +{
> +	if (i == NULL)
> +		return;
> +	fclose(i->f);
> +	free(i->line);
> +	free(i);
> +}
> +
> +/*
> + * Expand the builtin modules list.
> + */
> +static void expand_builtin_modules(void)
> +{
> +	builtin_module_size += 50;
> +
> +	builtin_modules = realloc(builtin_modules,
> +				  sizeof(*builtin_modules) *
> +				  builtin_module_size);
> +	builtin_module_offsets = realloc(builtin_module_offsets,
> +					 sizeof(*builtin_module_offsets) *
> +					 builtin_module_size);
> +
> +	if (!builtin_modules || !builtin_module_offsets) {
> +		fprintf(stderr, "kallsyms failure: out of memory.\n");
> +		exit(EXIT_FAILURE);
> +	}
> +}
> +
> +/*
> + * Add a single built-in module (possibly composed of many files) to the
> + * modules list.  Take the offset of the current module and return it
> + * (purely for simplicity's sake in the caller).
> + */
> +static size_t add_builtin_module(const char *module_name, char **module_paths,
> +				 size_t offset)
> +{
> +	/* map the module's object paths to the module offset */
> +	while (*module_paths) {
> +		obj2mod_put(*module_paths, builtin_module_len);
> +		module_paths++;
> +	}
> +
> +	/* add the module name */
> +	if (builtin_module_size <= builtin_module_len)
> +		expand_builtin_modules();
> +	builtin_modules[builtin_module_len] = strdup(module_name);
> +	builtin_module_offsets[builtin_module_len] = offset;
> +	builtin_module_len++;
> +
> +	return (offset + strlen(module_name) + 1);
> +}
> +
> +/*
> + * Read the linker map.
> + */
> +static void read_linker_map(void)
> +{
> +	unsigned long long addr, size;
> +	char obj[PATH_MAX+1];
> +	FILE *f = fopen(".tmp_vmlinux.ranges", "r");
> +
> +	if (!f) {
> +		fprintf(stderr, "Cannot open '.tmp_vmlinux.ranges'.\n");
> +		exit(1);
> +	}
> +
> +	addrmap_num = 0;
> +	addrmap_alloced = 4096;
> +	addrmap = malloc(sizeof(*addrmap) * addrmap_alloced);
> +	if (!addrmap)
> +		goto oom;
> +
> +	/*
> +	 * For each address range (addr,size) and object, add to addrmap
> +	 * the range and the built-in module to which the object maps.
> +	 */
> +	while (fscanf(f, "%llx %llx %s\n", &addr, &size, obj) == 3) {
> +		int m = obj2mod_get(obj);
> +
> +		if (addr == 0 || size == 0 || m == 0)
> +			continue;
> +
> +		if (addrmap_num >= addrmap_alloced) {
> +			addrmap_alloced *= 2;
> +			addrmap = realloc(addrmap,
> +			    sizeof(*addrmap) * addrmap_alloced);
> +			if (!addrmap)
> +				goto oom;
> +		}
> +
> +		addrmap[addrmap_num].addr = addr;
> +		addrmap[addrmap_num].size = size;
> +		addrmap[addrmap_num].module = m;
> +		addrmap_num++;
> +	}
> +	fclose(f);
> +	return;
> +
> +oom:
> +	fprintf(stderr, "kallsyms: out of memory\n");
> +	exit(1);
> +}
> +
> +/*
> + * Read "modules_thick.builtin" (the list of built-in modules).  Construct:
> + *   - builtin_modules: array of built-in-module names
> + *   - builtin_module_offsets: array of offsets that will later be
> + *       used to access a concatenated list of built-in-module names
> + *   - obj2mod: a temporary, many-to-one, hash mapping
> + *       from object-file paths to built-in-module names
> + * Read ".tmp_vmlinux.ranges" (the linker map).
> + *   - addrmap[] maps address ranges to built-in module names (using obj2mod)
> + */
> +static void read_modules(const char *modules_builtin)
> +{
> +	struct modules_thick_iter *i;
> +	size_t offset = 0;
> +	char *module_name = NULL;
> +	char **module_paths;
> +
> +	obj2mod_init();
> +
> +	/*
> +	 * builtin_modules[0] is a null entry signifying a symbol that cannot be
> +	 * modular.
> +	 */
> +	builtin_module_size = 50;
> +	builtin_modules = malloc(sizeof(*builtin_modules) *
> +				 builtin_module_size);
> +	builtin_module_offsets = malloc(sizeof(*builtin_module_offsets) *
> +				 builtin_module_size);
> +	if (!builtin_modules || !builtin_module_offsets) {
> +		fprintf(stderr, "kallsyms: out of memory\n");
> +		exit(1);
> +	}
> +	builtin_modules[0] = strdup("");
> +	builtin_module_offsets[0] = 0;
> +	builtin_module_len = 1;
> +	offset++;
> +
> +	/*
> +	 * Iterate over all modules in modules_thick.builtin and add each.
> +	 */
> +	i = modules_thick_iter_new(modules_builtin);
> +	if (i == NULL) {
> +		fprintf(stderr, "Cannot iterate over builtin modules.\n");
> +		exit(1);
> +	}
> +
> +	while ((module_paths = modules_thick_iter_next(i, &module_name))) {
> +		offset = add_builtin_module(module_name, module_paths, offset);
> +		free(module_paths);
> +		module_paths = NULL;
> +	}
> +
> +	free(module_name);
> +	modules_thick_iter_free(i);
> +
> +	/*
> +	 * Read linker map.
> +	 */
> +	read_linker_map();
> +
> +	obj2mod_free();
> +}
> +#else
> +static void read_modules(const char *unused) {}
> +#endif /* CONFIG_KALLMODSYMS */
> +
>  int main(int argc, char **argv)
>  {
> +	const char *modules_builtin = "modules_thick.builtin";
> +
>  	if (argc >= 2) {
>  		int i;
>  		for (i = 1; i < argc; i++) {
> -			if(strcmp(argv[i], "--all-symbols") == 0)
> +			if (strcmp(argv[i], "--all-symbols") == 0)
>  				all_symbols = 1;
>  			else if (strcmp(argv[i], "--absolute-percpu") == 0)
>  				absolute_percpu = 1;
>  			else if (strcmp(argv[i], "--base-relative") == 0)
>  				base_relative = 1;
> +			else if (strncmp(argv[i], "--builtin=", 10) == 0)
> +				modules_builtin = &argv[i][10];
>  			else
>  				usage();
>  		}
>  	} else if (argc != 1)
>  		usage();
>  
> +	read_modules(modules_builtin);
>  	read_map(stdin);
>  	if (absolute_percpu)
>  		make_percpus_absolute();
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index 06495379fcd8..c07eae553c08 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -76,6 +76,7 @@ vmlinux_link()
>  			--start-group				\
>  			${KBUILD_VMLINUX_LIBS}			\
>  			--end-group				\
> +			-Map=.tmp_vmlinux.map			\
>  			${@}"
>  
>  		${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux}	\
> @@ -88,6 +89,7 @@ vmlinux_link()
>  			-Wl,--start-group			\
>  			${KBUILD_VMLINUX_LIBS}			\
>  			-Wl,--end-group				\
> +			-Wl,-Map=.tmp_vmlinux.map		\
>  			${@}"
>  
>  		${CC} ${CFLAGS_vmlinux}				\
> @@ -138,6 +140,19 @@ kallsyms()
>  	info KSYM ${2}
>  	local kallsymopt;
>  
> +	# read the linker map to identify ranges of addresses:
> +	#   - for each *.o file, report address, size, pathname
> +	#       - most such lines will have four fields
> +	#       - but sometimes there is a line break after the first field
> +	#   - start reading at "Linker script and memory map"
> +	#   - stop reading at ".brk"
> +	${AWK} '
> +	    /\.o$/ && start==1 { print $(NF-2), $(NF-1), $NF }
> +	    /^Linker script and memory map/ { start = 1 }
> +	    /^\.brk/ { exit(0) }
> +	' .tmp_vmlinux.map | sort > .tmp_vmlinux.ranges
> +
> +	# get kallsyms options
>  	if [ -n "${CONFIG_KALLSYMS_ALL}" ]; then
>  		kallsymopt="${kallsymopt} --all-symbols"
>  	fi
> @@ -150,12 +165,18 @@ kallsyms()
>  		kallsymopt="${kallsymopt} --base-relative"
>  	fi
>  
> +	# set up compilation
>  	local aflags="${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL}               \
>  		      ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS}"
>  
>  	local afile="`basename ${2} .o`.S"
>  
> -	${NM} -n ${1} | scripts/kallsyms ${kallsymopt} > ${afile}
> +	# "nm -S" does not print symbol size when size is 0
> +	# Therefore use awk to regularize the data:
> +	#   - when there are only three fields, add an explicit "0"
> +	#   - when there are already four fields, pass through as is
> +	${NM} -n -S ${1} | ${AWK} 'NF==3 {print $1, 0, $2, $3}; NF==4' | \
> +	    scripts/kallsyms ${kallsymopt} > ${afile}
>  	${CC} ${aflags} -c -o ${2} ${afile}
>  }
>  
> diff --git a/scripts/namespace.pl b/scripts/namespace.pl
> index 1da7bca201a4..40f82b4c3a50 100755
> --- a/scripts/namespace.pl
> +++ b/scripts/namespace.pl
> @@ -120,6 +120,12 @@ my %nameexception = (
>      'kallsyms_addresses'=> 1,  
>      'kallsyms_offsets'	=> 1,
>      'kallsyms_relative_base'=> 1,  
> +    'kallsyms_sizes'	=> 1,
> +    'kallsyms_token_table'=> 1,
> +    'kallsyms_token_index'=> 1,
> +    'kallsyms_markers'	=> 1,
> +    'kallsyms_modules'	=> 1,
> +    'kallsyms_symbol_modules'=> 1,
>      '__this_module'	=> 1,
>      '_etext'		=> 1,
>      '_edata'		=> 1,


* Re: [PATCH] kallsyms: new /proc/kallmodsyms with builtin modules and symbol sizes
  2019-11-15 16:47 ` Steven Rostedt
@ 2019-11-15 17:26   ` Linus Torvalds
  2019-11-16 17:58     ` Eugene Loh
  0 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2019-11-15 17:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: eugene.loh, Jonathan Corbet, Masahiro Yamada, Michal Marek,
	Jessica Yu, Linux Kbuild mailing list, Marc Zyngier, Song Liu,
	Thomas Gleixner, jacob.e.keller, Andrew Morton,
	Greg Kroah-Hartman

On Fri, Nov 15, 2019 at 8:47 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> [ Adding Linus, Andrew and Greg as this is something that needs higher
>   level of approval for acceptance ]

Is a new config option even needed?

Honestly, I think the "add the module name even when built-in" could
be done unconditionally with no backwards compatibility issues.  It's
not a new syntax, and shouldn't break anything, and looks like a
useful extension of the existing format - and one that existing tools
already have to be aware of.

The size thing is obviously different, but I find that much more
questionable. What's the use-case? If it's just about the occasional
big jumps, then adding a dummy entry for those (rare) cases sounds
like a much better option, and wouldn't break any existing code.

I don't see any upside at all in showing the "exact" function size
instead of a size rounded up to the usual 16 bytes or whatever.
Padding is real, and doesn't change anything.

              Linus


* Re: [PATCH] kallsyms: new /proc/kallmodsyms with builtin modules and symbol sizes
  2019-11-15 17:26   ` Linus Torvalds
@ 2019-11-16 17:58     ` Eugene Loh
  2019-11-17  0:32       ` Linus Torvalds
  0 siblings, 1 reply; 21+ messages in thread
From: Eugene Loh @ 2019-11-16 17:58 UTC (permalink / raw)
  To: Linus Torvalds, Steven Rostedt
  Cc: Jonathan Corbet, Masahiro Yamada, Michal Marek, Jessica Yu,
	Linux Kbuild mailing list, Marc Zyngier, Song Liu,
	Thomas Gleixner, jacob.e.keller, Andrew Morton,
	Greg Kroah-Hartman

On 11/15/2019 09:26 AM, Linus Torvalds wrote:

> Is a new config option even needed?

I suppose not.  The original motivations for this option no longer 
matter.  I can amend the patch if that makes sense.

> Honestly, I think the "add the module name even when built-in" could
> be done unconditionally with no backwards compatibility issues.  It's
> not a new syntax, and shouldn't break anything, and looks like a
> useful extension of the existing format - and one that existing tools
> already have to be aware of.
>
> The size thing is obviously different, but I find that much more
> questionable. What's the use-case? If it's just about the occasional
> big jumps, then adding a dummy entry for those (rare) cases sounds
> like a much better option, and wouldn't break any existing code.
>
> I don't see any upside at all in showing the "exact" function size
> instead of a size rounded up to the usual 16 bytes or whatever.
> Padding is real, and doesn't change anything.

Since there are very many gaps, adding dummy entries makes sense only 
for "big" jumps.  I don't know where one would want to draw the line for 
"big."  In any case, to identify such gaps, one would still need the "nm 
-S" information provided by this patch.

Meanwhile, there are some symbols that encompass others.  E.g.,
     ffffffff8a001000 1000 T hypercall_page
     ffffffff8a001000 20 t xen_hypercall_set_trap_table
     ffffffff8a001020 20 t xen_hypercall_mmu_update
     [...]
     ffffffff8a0016c0 20 t xen_hypercall_arch_6
     ffffffff8a0016e0 20 t xen_hypercall_arch_7
     ffffffff8a002000 8 T __startup_secondary_64
Something between ffffffff8a001700 and ffffffff8a002000 maps to 
hypercall_page, but you'd never know that from the "up until the next 
address" approach.
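
To make the difference concrete, here is a minimal C sketch (not code from
the patch; the struct and function names are made up for illustration) that
runs both lookups over the hypercall_page excerpt above.  The size-aware
version assumes that the innermost containing symbol is the one wanted,
which is an assumption of this sketch rather than something the patch
specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One "nm -S" style record; names here are illustrative only. */
struct sym {
	unsigned long long addr;
	unsigned long long size;
	const char *name;
};

/* Excerpt of the listing above, sorted by address. */
static const struct sym syms[] = {
	{ 0xffffffff8a001000ULL, 0x1000, "hypercall_page" },
	{ 0xffffffff8a001000ULL, 0x20,   "xen_hypercall_set_trap_table" },
	{ 0xffffffff8a001020ULL, 0x20,   "xen_hypercall_mmu_update" },
	{ 0xffffffff8a0016c0ULL, 0x20,   "xen_hypercall_arch_6" },
	{ 0xffffffff8a0016e0ULL, 0x20,   "xen_hypercall_arch_7" },
	{ 0xffffffff8a002000ULL, 0x8,    "__startup_secondary_64" },
};
static const size_t nsyms = sizeof(syms) / sizeof(syms[0]);

/* "Up until the next address": last symbol starting at or below addr. */
static const char *lookup_next_addr(unsigned long long addr)
{
	const char *best = NULL;
	size_t i;

	for (i = 0; i < nsyms && syms[i].addr <= addr; i++)
		best = syms[i].name;
	return best;
}

/* Size-aware: smallest symbol whose [addr, addr + size) contains addr. */
static const char *lookup_sized(unsigned long long addr)
{
	const struct sym *best = NULL;
	size_t i;

	for (i = 0; i < nsyms; i++) {
		if (addr < syms[i].addr || addr - syms[i].addr >= syms[i].size)
			continue;
		if (!best || syms[i].size < best->size)
			best = &syms[i];
	}
	return best ? best->name : NULL;
}

/* An address in the unnamed tail of hypercall_page. */
void run_lookup_example(void)
{
	unsigned long long addr = 0xffffffff8a001800ULL;

	assert(strcmp(lookup_next_addr(addr), "xen_hypercall_arch_7") == 0);
	assert(strcmp(lookup_sized(addr), "hypercall_page") == 0);
}
```

With only start addresses, ffffffff8a001800 is attributed to
xen_hypercall_arch_7; with sizes, it is attributed to hypercall_page.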

Symbols of zero size pose interesting questions.  E.g., do we want these 
zero-size symbols to have nonzero size?
     ffffffff8ac00e00 19 T __x86_indirect_thunk_r15
     ffffffff8ac00e19 0 T _etext
     ffffffff8ac00e19 0 T __indirect_thunk_end
     ffffffff8ac00e1c 0 R __start_notes
     ffffffff8ac00fd8 18 r _note_55
     ffffffff8ac01008 0 R __stop_notes

Size information also helps us distinguish symbol ranges when a 
zero-size symbol has the same address as another symbol.  E.g.,
     ffffffff8a9593e0 0 T __sched_text_start
     ffffffff8a9593e0 6b9 t __schedule
     ffffffff8a959aa0 9a T schedule
If we have an address above ffffffff8a9593e0, we see it belongs to 
__schedule.   Without size information, we might have assigned such an 
address to __sched_text_start.
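
A small C sketch of this case follows.  The comparator mirrors the ordering
the patch adds to compare_symbols() (zero-size markers first, then larger
symbols before the ones they encompass); the struct, the helper
last_at_or_below(), and the "take the last entry at or below the address"
lookup are simplifications assumed for illustration, not the kernel's
actual lookup code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sym {
	unsigned long long addr;
	unsigned long long size;
	const char *name;
};

/*
 * Order by address; at equal addresses put zero-size markers first,
 * then larger symbols before the smaller ones they encompass.
 */
static int compare_syms(const void *a, const void *b)
{
	const struct sym *sa = a, *sb = b;

	if (sa->addr != sb->addr)
		return sa->addr < sb->addr ? -1 : 1;
	if ((sa->size == 0) != (sb->size == 0))
		return sa->size == 0 ? -1 : 1;
	if (sa->size != sb->size)
		return sa->size > sb->size ? -1 : 1;
	return 0;
}

/* Last entry, in sorted order, that starts at or below addr. */
static const struct sym *last_at_or_below(const struct sym *t, size_t n,
					  unsigned long long addr)
{
	const struct sym *best = NULL;
	size_t i;

	for (i = 0; i < n && t[i].addr <= addr; i++)
		best = &t[i];
	return best;
}

void run_sort_example(void)
{
	struct sym t[] = {
		{ 0xffffffff8a959aa0ULL, 0x9a,  "schedule" },
		{ 0xffffffff8a9593e0ULL, 0x6b9, "__schedule" },
		{ 0xffffffff8a9593e0ULL, 0,     "__sched_text_start" },
	};
	const struct sym *hit;

	qsort(t, 3, sizeof(t[0]), compare_syms);

	/* The marker sorts ahead of the function sharing its address. */
	assert(strcmp(t[0].name, "__sched_text_start") == 0);
	assert(strcmp(t[1].name, "__schedule") == 0);

	/* So an address inside the function resolves to the function. */
	hit = last_at_or_below(t, 3, 0xffffffff8a9593f0ULL);
	assert(strcmp(hit->name, "__schedule") == 0);
}
```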

Or how would one distinguish the symbols at ffffffff8b37d840 without 
size info?
     ffffffff8b37a870 0 R __start___param
     ffffffff8b37a870 28 r __param_initcall_debug
     ffffffff8b37a898 28 r __param_action
     [...]
     ffffffff8b37d7f0 28 r __param_disable_ipv6
     ffffffff8b37d818 28 r __param_disable
     ffffffff8b37d840 0 r __stop___param
     ffffffff8b37d840 0 r __start___modver   [configfs]
     ffffffff8b37d840 8 r __modver_attr      [configfs]

There are also symbols of zero size that mark the end of a segment and 
share the same address as the first symbol in the next segment. This 
means that the "up to the next symbol" algorithm leads to some 
confusion, e.g., overlapping segments.

Anyhow, given that we know real symbol sizes, why not just use them?


* Re: [PATCH] kallsyms: new /proc/kallmodsyms with builtin modules and symbol sizes
  2019-11-16 17:58     ` Eugene Loh
@ 2019-11-17  0:32       ` Linus Torvalds
  2019-11-19 22:42         ` [PATCH v2] kallsyms: add names of built-in modules eugene.loh
  2019-11-20  0:11         ` [PATCH] kallsyms: new /proc/kallmodsyms with builtin modules and symbol sizes Eugene Loh
  0 siblings, 2 replies; 21+ messages in thread
From: Linus Torvalds @ 2019-11-17  0:32 UTC (permalink / raw)
  To: Eugene Loh
  Cc: Steven Rostedt, Jonathan Corbet, Masahiro Yamada, Michal Marek,
	Jessica Yu, Linux Kbuild mailing list, Marc Zyngier, Song Liu,
	Thomas Gleixner, jacob.e.keller, Andrew Morton,
	Greg Kroah-Hartman

On Sat, Nov 16, 2019 at 9:58 AM Eugene Loh <eugene.loh@oracle.com> wrote:
>
> Since there are very many gaps, adding dummy entries makes sense only
> for "big" jumps.  I don't know where one would want to draw the line for
> "big."  In any case, to identify such gaps, one would still need the "nm
> -S" information provided by this patch.

Sure. You can have some kind of error estimate where if the size of
the thing is much smaller than the gap, add the fake padding object.

But that "much smaller than" would likely be in the area of page
alignment, not "next function was aligned to 64-byte boundary" kind of
small fixups.
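
One way such a check could look, as a rough C sketch only: walk the
address-sorted table, track the highest end address seen so far, and flag
a gap when the next symbol starts at least a threshold beyond it.  The
4096-byte GAP_THRESHOLD, the struct, and find_big_gaps() are all
hypothetical choices for illustration; nothing here is taken from the
patch.

```c
#include <assert.h>
#include <stddef.h>

#define GAP_THRESHOLD 4096ULL	/* hypothetical: flag roughly page-sized gaps */

struct sym {
	unsigned long long addr;
	unsigned long long size;
};

/*
 * Scan an address-sorted table and record the start of every gap of at
 * least GAP_THRESHOLD bytes.  Returns the number of gaps found and
 * stores at most max of them in gap_start[].
 */
static size_t find_big_gaps(const struct sym *t, size_t n,
			    unsigned long long *gap_start, size_t max)
{
	unsigned long long end = 0;	/* highest end address seen so far */
	size_t i, found = 0;

	for (i = 0; i < n; i++) {
		if (i > 0 && t[i].addr > end &&
		    t[i].addr - end >= GAP_THRESHOLD) {
			if (found < max)
				gap_start[found] = end;
			found++;
		}
		if (t[i].addr + t[i].size > end)
			end = t[i].addr + t[i].size;
	}
	return found;
}

void run_gap_example(void)
{
	/* Made-up addresses: small alignment gap, then a large one. */
	const struct sym t[] = {
		{ 0x1000, 0x30 }, { 0x1040, 0x20 },
		{ 0x4000, 0x100 }, { 0x4100, 0x10 },
	};
	unsigned long long gaps[4];

	assert(find_big_gaps(t, 4, gaps, 4) == 1);
	assert(gaps[0] == 0x1060);	/* a padding entry would start here */
}
```

Tracking the highest end address, rather than only the previous symbol's
end, keeps a symbol that encompasses others from producing a false gap.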

Honestly, if somebody needs the real size, why aren't they just using
the original image?

> Meanwhile, there are some symbols that encompass others.

Yeah, I don't think this is at all worth worrying about. Again, if you
want that kind of information, you should use the original vmlinux
image, not think that "hey, /proc should give perfect information".

The /proc interface should be a rough and convenient baseline, but I
don't think it's at all interesting to try to make it perfect or even
all that clever.

Most of your questions boil down to "just use vmlinux" instead. If you
_really_ care about things like "one symbol can encompass many
sub-symbols", you shouldn't look at /proc/kallsyms.

So I think we could improve on /proc/kallsyms, but we should do it
with the aim being "just make it incrementally better", not some
"let's solve big problems". The big problems are already solved by
just looking at the vmlinux file.

For example, I think the whole "include which module the symbol comes
from" is a nice improved quality thing even if the module happens to
be built-in. If that is easy to do, then we should just do it, and it
allows people to see interesting information and might make it useful
to (for example) have tools like profiling be able to zoom into
particular "modules", even if the module is built-in.

And if there are big gaps that aren't just "align to next cacheline",
then that sounds like it's worth pointing out too.

But I see _zero_ reason not to say "just use vmlinux if you need
detailed information". The /proc file is not supposed to be a
replacement for the full setup, it should be seen as a convenient
shorthand and as a "if you have nothing better, at least you can get
_some_ information, and maybe you can also use it to validate that you
have the _right_ vmlinux file".

                 Linus


* [PATCH v2] kallsyms: add names of built-in modules
  2019-11-17  0:32       ` Linus Torvalds
@ 2019-11-19 22:42         ` eugene.loh
  2019-11-20  4:59           ` [PATCH v3] " eugene.loh
  2019-11-20  0:11         ` [PATCH] kallsyms: new /proc/kallmodsyms with builtin modules and symbol sizes Eugene Loh
  1 sibling, 1 reply; 21+ messages in thread
From: eugene.loh @ 2019-11-19 22:42 UTC (permalink / raw)
  To: eugene.loh
  Cc: rostedt, corbet, yamada.masahiro, michal.lkml, jeyu,
	linux-kbuild, maz, songliubraving, tglx, jacob.e.keller,
	Kris Van Hees, Nick Alcock

From: Eugene Loh <eugene.loh@oracle.com>

/proc/kallsyms is very useful for tracers and other tools that need
to map kernel symbols to addresses.

It would be useful if there were a mapping between kernel symbol and
module name that only changed when the kernel source code is changed.
This mapping should not vanish simply because a module becomes built
into the kernel.

Therefore:

- Generate a file "modules_thick.builtin" that maps from thin
  archives that make up built-in modules to their constituent
  object files.  See files .gitignore, Documentation/dontdiff,
  Makefile, and scripts/Makefile.modbuiltin.

- Generate a linker map ".tmp_vmlinux.map", converting it into
  ".tmp_vmlinux.ranges", mapping address ranges to object files.
  See file scripts/link-vmlinux.sh.

- Read "modules_thick.builtin" and ".tmp_vmlinux.ranges" to
  map symbol addresses to built-in-module names.  Write those
  module names (kallsyms_modules) and that per-symbol module
  information (kallsyms_symbol_modules) to the *.s output file.
  See file scripts/kallsyms.c.

- Use kallsyms_modules and kallsyms_symbol_modules to add
  built-in-module information to /proc/kallsyms.  See files
  scripts/namespace.pl and kernel/kallsyms.c.

Note that kernel symbols for built-in modules appear in ascending
order by address, as usual, and thus will appear interspersed with
symbols that are part of other built-in modules or of the kernel.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
---
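Reviewer note (not part of the commit): the layout of the two new tables can
be sketched as follows. This is only an illustration under stated
assumptions: the demo_* names and the sample module names are hypothetical
and do not appear in the patch. The sketch mirrors two ideas from the
description above: module names are packed back to back as NUL-terminated
strings, with offset 0 reserved for the empty name, and each symbol stores
one byte offset into that packed string.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for kallsyms_modules: NUL-terminated names packed
 * back to back.  Offset 0 holds the empty name ("not part of a module").
 */
static const char demo_modules[] = "\0" "ext4\0" "e1000e";

/*
 * Hypothetical stand-in for kallsyms_symbol_modules: one byte offset into
 * demo_modules per symbol, in symbol-table order.
 */
static const unsigned int demo_symbol_modules[] = { 0, 1, 1, 6, 0 };

/* Offset of the name that follows "name", given that "name" sits at "offset". */
static size_t demo_next_offset(size_t offset, const char *name)
{
	return offset + strlen(name) + 1;	/* +1 for the terminating NUL */
}

/* Module name for the symbol at position pos; "" for core kernel symbols. */
static const char *demo_symbol_module(size_t pos)
{
	return &demo_modules[demo_symbol_modules[pos]];
}
```

With this layout a reader needs no separate index of names: the per-symbol
value is directly usable as a pointer offset, which is how get_ksymbol_core()
in the kernel/kallsyms.c hunk below consumes it.
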
 .gitignore                  |   1 +
 Documentation/dontdiff      |   1 +
 Makefile                    |  41 ++-
 kernel/kallsyms.c           |  14 +-
 scripts/Makefile.modbuiltin |  20 +-
 scripts/kallsyms.c          | 517 +++++++++++++++++++++++++++++++++++-
 scripts/link-vmlinux.sh     |  17 ++
 scripts/namespace.pl        |   5 +
 8 files changed, 593 insertions(+), 23 deletions(-)
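
Reviewer note: the module-name derivation that modules_thick_iter_next()
performs in the scripts/kallsyms.c hunk below can be sketched in isolation.
This is a simplified illustration, not the patch's code: demo_module_name()
is a hypothetical helper that copies into a caller-supplied buffer instead
of allocating, and the sample paths used to exercise it are made up. It takes
the part of a modules_thick.builtin line before the colon, drops the
directory, turns '-' into '_', and strips the last suffix.

```c
#include <stddef.h>
#include <string.h>

/*
 * Derive a module name from one modules_thick.builtin line.
 * Returns 0 on success, -1 if "out" is too small.
 */
static int demo_module_name(const char *line, char *out, size_t outlen)
{
	size_t len = strcspn(line, ":\n");	/* path ends at ':' or end of line */
	const char *base = line;
	char *dot;
	size_t i;

	/* Find the component after the last '/' within the path part. */
	for (i = 0; i < len; i++)
		if (line[i] == '/')
			base = line + i + 1;
	len -= (size_t)(base - line);

	if (len + 1 > outlen)
		return -1;
	memcpy(out, base, len);
	out[len] = '\0';

	/* Match the naming used when the same code is built as a module. */
	for (i = 0; i < len; i++)
		if (out[i] == '-')
			out[i] = '_';

	/* Drop the ".o" (or any other final) suffix. */
	dot = strrchr(out, '.');
	if (dot)
		*dot = '\0';
	return 0;
}
```

Under these assumptions, a line such as "drivers/net/foo-bar.o: drivers/net/a.o"
would yield the name "foo_bar", and a line without a colon is handled the
same way.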

diff --git a/.gitignore b/.gitignore
index 70580bdd352c..474491775a1a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -47,6 +47,7 @@
 Module.symvers
 modules.builtin
 modules.order
+modules_thick.builtin
 
 #
 # Top-level generic files
diff --git a/Documentation/dontdiff b/Documentation/dontdiff
index 9f4392876099..32ee05f91410 100644
--- a/Documentation/dontdiff
+++ b/Documentation/dontdiff
@@ -180,6 +180,7 @@ modpost
 modules.builtin
 modules.builtin.modinfo
 modules.order
+modules_thick.builtin
 modversions.h*
 nconf
 nconf-cfg
diff --git a/Makefile b/Makefile
index 49363caa7079..15b4e897cd3e 100644
--- a/Makefile
+++ b/Makefile
@@ -1077,7 +1077,7 @@ cmd_link-vmlinux =                                                 \
 	$(CONFIG_SHELL) $< $(LD) $(KBUILD_LDFLAGS) $(LDFLAGS_vmlinux) ;    \
 	$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
 
-vmlinux: scripts/link-vmlinux.sh autoksyms_recursive $(vmlinux-deps) FORCE
+vmlinux: scripts/link-vmlinux.sh autoksyms_recursive $(vmlinux-deps) modules_thick.builtin FORCE
 	+$(call if_changed,link-vmlinux)
 
 targets := vmlinux
@@ -1292,17 +1292,6 @@ modules: $(if $(KBUILD_BUILTIN),vmlinux) modules.order modules.builtin
 modules.order: descend
 	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
 
-modbuiltin-dirs := $(addprefix _modbuiltin_, $(build-dirs))
-
-modules.builtin: $(modbuiltin-dirs)
-	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
-
-PHONY += $(modbuiltin-dirs)
-# tristate.conf is not included from this Makefile. Add it as a prerequisite
-# here to make it self-healing in case somebody accidentally removes it.
-$(modbuiltin-dirs): include/config/tristate.conf
-	$(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_%,%,$@)
-
 # Target to prepare building external modules
 PHONY += modules_prepare
 modules_prepare: prepare
@@ -1355,6 +1344,33 @@ modules modules_install:
 
 endif # CONFIG_MODULES
 
+# modules.builtin has a 'thick' form which maps from kernel modules (or rather
+# the object file names they would have had had they not been built in) to their
+# constituent object files: kallsyms uses this to determine which modules any
+# given object file is part of.  (We cannot eliminate the slight redundancy
+# here without double-expansion.)
+
+modbuiltin-dirs := $(addprefix _modbuiltin_, $(build-dirs))
+
+modbuiltin-thick-dirs := $(addprefix _modbuiltin_thick_, $(build-dirs))
+
+modules.builtin: $(modbuiltin-dirs)
+	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
+
+modules_thick.builtin: $(modbuiltin-thick-dirs)
+	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
+
+PHONY += $(modbuiltin-dirs) $(modbuiltin-thick-dirs)
+# tristate.conf is not included from this Makefile. Add it as a prerequisite
+# here to make it self-healing in case somebody accidentally removes it.
+$(modbuiltin-dirs): include/config/tristate.conf
+	$(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_%,%,$@) \
+			builtin-file=modules.builtin
+
+$(modbuiltin-thick-dirs): include/config/tristate.conf
+	$(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_thick_%,%,$@) \
+			builtin-file=modules_thick.builtin
+
 ###
 # Cleaning is done on three levels.
 # make clean     Delete most generated files
@@ -1674,6 +1690,7 @@ clean: $(clean-dirs)
 		-o -name '*.asn1.[ch]' \
 		-o -name '*.symtypes' -o -name 'modules.order' \
 		-o -name modules.builtin -o -name '.tmp_*.o.*' \
+		-o -name modules_thick.builtin \
 		-o -name '*.c.[012]*.*' \
 		-o -name '*.ll' \
 		-o -name '*.gcno' \) -type f -print | xargs rm -f
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 136ce049c4ad..ae47f4879723 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -46,6 +46,8 @@ __attribute__((weak, section(".rodata")));
 
 extern const u8 kallsyms_token_table[] __weak;
 extern const u16 kallsyms_token_index[] __weak;
+extern const char kallsyms_modules[] __weak;
+extern const u32 kallsyms_symbol_modules[] __weak;
 
 extern const unsigned int kallsyms_markers[] __weak;
 
@@ -270,6 +272,7 @@ int kallsyms_lookup_size_offset(unsigned long addr, unsigned long *symbolsize,
 	return !!module_address_lookup(addr, symbolsize, offset, NULL, namebuf) ||
 	       !!__bpf_address_lookup(addr, symbolsize, offset, namebuf);
 }
+EXPORT_SYMBOL_GPL(kallsyms_lookup_size_offset);
 
 /*
  * Lookup an address
@@ -508,8 +511,17 @@ static int get_ksymbol_bpf(struct kallsym_iter *iter)
 static unsigned long get_ksymbol_core(struct kallsym_iter *iter)
 {
 	unsigned off = iter->nameoff;
+	u32 mod_index = 0;
 
-	iter->module_name[0] = '\0';
+	if (kallsyms_symbol_modules)
+		mod_index = kallsyms_symbol_modules[iter->pos];
+
+	if (mod_index == 0 || kallsyms_modules == NULL) {
+		iter->module_name[0] = '\0';
+	} else {
+		strcpy(iter->module_name, &kallsyms_modules[mod_index]);
+	}
+	iter->exported = 0;
 	iter->value = kallsyms_sym_address(iter->pos);
 
 	iter->type = kallsyms_get_symbol_type(off);
diff --git a/scripts/Makefile.modbuiltin b/scripts/Makefile.modbuiltin
index 7d4711b88656..06f31e58111e 100644
--- a/scripts/Makefile.modbuiltin
+++ b/scripts/Makefile.modbuiltin
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 # ==========================================================================
-# Generating modules.builtin
+# Generating modules.builtin and modules_thick.builtin
 # ==========================================================================
 
 src := $(obj)
@@ -30,19 +30,29 @@ __subdir-Y     := $(patsubst %/,%,$(filter %/, $(obj-Y)))
 subdir-Y       += $(__subdir-Y)
 subdir-ym      := $(sort $(subdir-y) $(subdir-Y) $(subdir-m))
 subdir-ym      := $(addprefix $(obj)/,$(subdir-ym))
-obj-Y          := $(addprefix $(obj)/,$(obj-Y))
+pathobj-Y      := $(addprefix $(obj)/,$(obj-Y))
 
 modbuiltin-subdirs := $(patsubst %,%/modules.builtin, $(subdir-ym))
-modbuiltin-mods    := $(filter %.ko, $(obj-Y:.o=.ko))
+modbuiltin-mods    := $(filter %.ko, $(pathobj-Y:.o=.ko))
 modbuiltin-target  := $(obj)/modules.builtin
+modthickbuiltin-subdirs := $(patsubst %,%/modules_thick.builtin, $(subdir-ym))
+modthickbuiltin-target  := $(obj)/modules_thick.builtin
 
-__modbuiltin: $(modbuiltin-target) $(subdir-ym)
+__modbuiltin: $(obj)/$(builtin-file) $(subdir-ym)
 	@:
 
 $(modbuiltin-target): $(subdir-ym) FORCE
 	$(Q)(for m in $(modbuiltin-mods); do echo $$m; done;	\
 	cat /dev/null $(modbuiltin-subdirs)) > $@
 
+$(modthickbuiltin-target): $(subdir-ym) FORCE
+	$(Q) $(foreach mod-o, $(filter %.o,$(obj-Y)),\
+		printf "%s:" $(addprefix $(obj)/,$(mod-o)) >> $@; \
+		printf " %s" $(sort $(strip $(addprefix $(obj)/,$($(mod-o:.o=-objs)) \
+			$($(mod-o:.o=-y)) $($(mod-o:.o=-Y))))) >> $@; \
+		printf "\n" >> $@; ) \
+	cat /dev/null $(modthickbuiltin-subdirs) >> $@;
+
 PHONY += FORCE
 
 FORCE:
@@ -52,6 +62,6 @@ FORCE:
 
 PHONY += $(subdir-ym)
 $(subdir-ym):
-	$(Q)$(MAKE) $(modbuiltin)=$@
+	$(Q)$(MAKE) $(modbuiltin)=$@ builtin-file=$(builtin-file)
 
 .PHONY: $(PHONY)
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index ae6504d07fd6..d29d6eaec267 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -5,7 +5,10 @@
  * This software may be used and distributed according to the terms
  * of the GNU General Public License, incorporated herein by reference.
  *
- * Usage: nm -n vmlinux | scripts/kallsyms [--all-symbols] > symbols.S
+ * Usage: nm -n vmlinux
+ *        | scripts/kallsyms [--all-symbols] [--absolute-percpu]
+ *             [--base-relative] [--builtin=modules_thick.builtin]
+ *        > symbols.S
  *
  *      Table compression uses all the unused char codes on the symbols and
  *  maps these to the most used substrings (tokens). For instance, it might
@@ -18,12 +21,17 @@
  *
  */
 
+#define _GNU_SOURCE 1
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <ctype.h>
 #include <limits.h>
 
+#include "../include/generated/autoconf.h"
+
+#include <errno.h>
+
 #ifndef ARRAY_SIZE
 #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0]))
 #endif
@@ -36,6 +44,7 @@ struct sym_entry {
 	unsigned int start_pos;
 	unsigned char *sym;
 	unsigned int percpu_absolute;
+	unsigned int module;
 };
 
 struct addr_range {
@@ -69,10 +78,116 @@ static unsigned char best_table[256][2];
 static unsigned char best_table_len[256];
 
 
+static unsigned int strhash(const char *s)
+{
+	/* fnv32 hash */
+	unsigned int hash = 2166136261U;
+
+	for (; *s; s++)
+		hash = (hash ^ *s) * 0x01000193;
+	return hash;
+}
+
+#define OBJ2MOD_BITS 10
+#define OBJ2MOD_N (1 << OBJ2MOD_BITS)
+#define OBJ2MOD_MASK (OBJ2MOD_N - 1)
+struct obj2mod_elem {
+	char *obj;
+	int mod;
+	struct obj2mod_elem *next;
+};
+
+static struct obj2mod_elem *obj2mod[OBJ2MOD_N];
+
+static void obj2mod_init(void)
+{
+	memset(obj2mod, 0, sizeof(obj2mod));
+}
+
+static void obj2mod_put(char *obj, int mod)
+{
+	int i = strhash(obj) & OBJ2MOD_MASK;
+	struct obj2mod_elem *elem = malloc(sizeof(struct obj2mod_elem));
+
+	if (!elem) {
+		fprintf(stderr, "kallsyms: out of memory\n");
+		exit(1);
+	}
+
+	elem->obj = strdup(obj);
+	if (!elem->obj) {
+		fprintf(stderr, "kallsyms: out of memory\n");
+		free(elem);
+		exit(1);
+	}
+
+	elem->mod = mod;
+	elem->next = obj2mod[i];
+	obj2mod[i] = elem;
+}
+
+static int obj2mod_get(char *obj)
+{
+	int i = strhash(obj) & OBJ2MOD_MASK;
+	struct obj2mod_elem *elem;
+
+	for (elem = obj2mod[i]; elem; elem = elem->next)
+		if (strcmp(elem->obj, obj) == 0)
+			return elem->mod;
+	return 0;
+}
+
+static void obj2mod_free(void)
+{
+	int i;
+
+	for (i = 0; i < OBJ2MOD_N; i++) {
+		struct obj2mod_elem *elem = obj2mod[i];
+		struct obj2mod_elem *next;
+
+		while (elem) {
+			next = elem->next;
+			free(elem->obj);
+			free(elem);
+			elem = next;
+		}
+	}
+}
+
+/*
+ * The builtin module names.  The "offset" points to the name as if
+ * all builtin module names were concatenated to a single string.
+ */
+static unsigned int builtin_module_size;	/* number allocated */
+static unsigned int builtin_module_len;		/* number assigned */
+static char **builtin_modules;			/* array of module names */
+static unsigned int *builtin_module_offsets;	/* offset */
+
+/*
+ * modules_thick.builtin iteration state.
+ */
+struct modules_thick_iter {
+	FILE *f;
+	char *line;
+	size_t line_size;
+};
+
+/*
+ * An ordered list of address ranges and how they map to built-in modules.
+ */
+struct addrmap_entry {
+	unsigned long long addr;
+	unsigned long long size;
+	unsigned int module;
+};
+static struct addrmap_entry *addrmap;
+static int addrmap_num, addrmap_alloced;
+
 static void usage(void)
 {
-	fprintf(stderr, "Usage: kallsyms [--all-symbols] "
-			"[--base-relative] < in.map > out.S\n");
+	fprintf(stderr, "Usage: kallsyms [--all-symbols] [--absolute-percpu] "
+			"[--base-relative] [--builtin=modules_thick.builtin] "
+			"< nm_vmlinux.out > symbols.S\n");
 	exit(1);
 }
 
@@ -107,11 +222,25 @@ static int check_symbol_range(const char *sym, unsigned long long addr,
 	return 1;
 }
 
+static int addrmap_compare(const void *keyp, const void *rangep)
+{
+	unsigned long long addr = *((const unsigned long long *)keyp);
+	const struct addrmap_entry *range = (const struct addrmap_entry *)rangep;
+
+	if (addr < range->addr)
+		return -1;
+	if (addr < range->addr + range->size)
+		return 0;
+	return 1;
+}
+
 static int read_symbol(FILE *in, struct sym_entry *s)
 {
 	char sym[500], stype;
-	int rc;
+	int rc, init_scratch = 0;
+	struct addrmap_entry *range;
 
+read_another:
 	rc = fscanf(in, "%llx %c %499s\n", &s->addr, &stype, sym);
 	if (rc != 3) {
 		if (rc != EOF && fgets(sym, 500, in) == NULL)
@@ -125,6 +254,16 @@ static int read_symbol(FILE *in, struct sym_entry *s)
 		return -1;
 	}
 
+	/* skip the .init.scratch section */
+	if (strcmp(sym, "__init_scratch_end") == 0) {
+		init_scratch = 0;
+		goto read_another;
+	}
+	if (strcmp(sym, "__init_scratch_begin") == 0)
+		init_scratch = 1;
+	if (init_scratch)
+		goto read_another;
+
 	/* Ignore most absolute/undefined (?) symbols. */
 	if (strcmp(sym, "_text") == 0)
 		_text = s->addr;
@@ -154,6 +293,14 @@ static int read_symbol(FILE *in, struct sym_entry *s)
 	else if (!strncmp(sym, ".LASANPC", 8))
 		return -1;
 
+	/* look up the builtin module this is part of (if any) */
+	range = (struct addrmap_entry *) bsearch(&s->addr,
+	    addrmap, addrmap_num, sizeof(*addrmap), &addrmap_compare);
+	if (range)
+		s->module = builtin_module_offsets[range->module];
+	else
+		s->module = 0;
+
 	/* include the type field in the symbol name, so that it gets
 	 * compressed together */
 	s->len = strlen(sym) + 1;
@@ -206,6 +353,8 @@ static int symbol_valid(struct sym_entry *s)
 		"kallsyms_markers",
 		"kallsyms_token_table",
 		"kallsyms_token_index",
+		"kallsyms_symbol_modules",
+		"kallsyms_modules",
 
 	/* Exclude linker generated symbols which vary between passes */
 		"_SDA_BASE_",		/* ppc */
@@ -454,6 +603,19 @@ static void write_src(void)
 	for (i = 0; i < 256; i++)
 		printf("\t.short\t%d\n", best_idx[i]);
 	printf("\n");
+
+	output_label("kallsyms_modules");
+	for (i = 0; i < builtin_module_len; i++)
+		printf("\t.asciz\t\"%s\"\n", builtin_modules[i]);
+	printf("\n");
+
+	for (i = 0; i < builtin_module_len; i++)
+		free(builtin_modules[i]);
+
+	output_label("kallsyms_symbol_modules");
+	for (i = 0; i < table_cnt; i++)
+		printf("\t.int\t%d\n", table[i].module);
+	printf("\n");
 }
 
 
@@ -738,23 +900,368 @@ static void record_relative_base(void)
 			relative_base = table[i].addr;
 }
 
+/*
+ * Read a modules_thick.builtin file.
+ */
+
+/*
+ * Construct a modules_thick.builtin iterator.
+ */
+static struct modules_thick_iter *
+modules_thick_iter_new(const char *modules_thick_file)
+{
+	struct modules_thick_iter *i;
+
+	i = calloc(1, sizeof(struct modules_thick_iter));
+	if (i == NULL)
+		return NULL;
+
+	i->f = fopen(modules_thick_file, "r");
+
+	if (i->f == NULL) {
+		fprintf(stderr, "Cannot open builtin module file %s: %s\n",
+			modules_thick_file, strerror(errno));
+		return NULL;
+	}
+
+	return i;
+}
+
+/*
+ * Iterate, returning a new null-terminated array of object file names, and a
+ * new dynamically-allocated module name.  (The module name passed in is freed.)
+ *
+ * The array of object file names should be freed by the caller: the strings it
+ * points to are owned by the iterator, and should not be freed.
+ */
+static char ** __attribute__((__nonnull__))
+modules_thick_iter_next(struct modules_thick_iter *i, char **module_name)
+{
+	size_t npaths = 1;
+	char **module_paths;
+	char *last_slash;
+	char *last_dot;
+	char *trailing_linefeed;
+	char *object_name = i->line;
+	char *dash;
+	int composite = 0;
+
+	/*
+	 * Read in all module entries, computing the suffixless, pathless name
+	 * of the module and building the next arrayful of object file names for
+	 * return.
+	 *
+	 * Modules can consist of multiple files: in this case, the portion
+	 * before the colon is the path to the module (as before): the portion
+	 * after the colon is a space-separated list of files that should be
+	 * considered part of this module.  In this case, the portion before the
+	 * colon is an "object file" that does not actually exist: it is merged
+	 * into built-in.a without ever being written out.
+	 *
+	 * All module names have - translated to _, to match what is done to the
+	 * names of the same things when built as modules.
+	 */
+
+	/*
+	 * Reinvocation of exhausted iterator. Return NULL, once.
+	 */
+retry:
+	if (getline(&i->line, &i->line_size, i->f) < 0) {
+		if (ferror(i->f)) {
+			fprintf(stderr,
+				"Error reading from modules_thick file: %s\n",
+				strerror(errno));
+			exit(1);
+		}
+		rewind(i->f);
+		return NULL;
+	}
+
+	if (i->line[0] == '\0')
+		goto retry;
+
+	/*
+	 * Slice the line in two at the colon, if any.  If there is anything
+	 * past the ': ', this is a composite module.  (We allow for no colon
+	 * for robustness, even though one should always be present.)
+	 */
+	if (strchr(i->line, ':') != NULL) {
+		char *name_start;
+
+		object_name = strchr(i->line, ':');
+		*object_name = '\0';
+		object_name++;
+		name_start = object_name + strspn(object_name, " \n");
+		if (*name_start != '\0') {
+			composite = 1;
+			object_name = name_start;
+		}
+	}
+
+	/*
+	 * Figure out the module name.
+	 */
+	last_slash = strrchr(i->line, '/');
+	last_slash = (!last_slash) ? i->line :
+		last_slash + 1;
+	free(*module_name);
+	*module_name = strdup(last_slash);
+	dash = *module_name;
+
+	while (dash != NULL) {
+		dash = strchr(dash, '-');
+		if (dash != NULL)
+			*dash = '_';
+	}
+
+	last_dot = strrchr(*module_name, '.');
+	if (last_dot != NULL)
+		*last_dot = '\0';
+
+	trailing_linefeed = strchr(object_name, '\n');
+	if (trailing_linefeed != NULL)
+		*trailing_linefeed = '\0';
+
+	/*
+	 * Multifile separator? Object file names explicitly stated:
+	 * slice them up and shuffle them in.
+	 *
+	 * The array size may be an overestimate if any object file
+	 * names start or end with spaces (very unlikely) but cannot be
+	 * an underestimate.  (Check for it anyway.)
+	 */
+	if (composite) {
+		char *one_object;
+
+		for (npaths = 0, one_object = object_name;
+		     one_object != NULL;
+		     npaths++, one_object = strchr(one_object + 1, ' '))
+			;
+	}
+
+	module_paths = malloc((npaths + 1) * sizeof(char *));
+	if (!module_paths) {
+		fprintf(stderr, "%s: out of memory on module %s\n", __func__,
+			*module_name);
+		exit(1);
+	}
+
+	if (composite) {
+		char *one_object;
+		size_t i = 0;
+
+		while ((one_object = strsep(&object_name, " ")) != NULL) {
+			if (i >= npaths) {
+				fprintf(stderr, "%s: npaths overflow on module "
+					"%s: this is a bug.\n", __func__,
+					*module_name);
+				exit(1);
+			}
+
+			module_paths[i++] = one_object;
+		}
+	} else
+		module_paths[0] = i->line;	/* untransformed module name */
+
+	module_paths[npaths] = NULL;
+
+	return module_paths;
+}
+
+/*
+ * Free an iterator. Can be called while iteration is underway, so even
+ * state that is freed at the end of iteration must be freed here too.
+ */
+static void
+modules_thick_iter_free(struct modules_thick_iter *i)
+{
+	if (i == NULL)
+		return;
+	fclose(i->f);
+	free(i->line);
+	free(i);
+}
+
+/*
+ * Expand the builtin modules list.
+ */
+static void expand_builtin_modules(void)
+{
+	builtin_module_size += 50;
+
+	builtin_modules = realloc(builtin_modules,
+				  sizeof(*builtin_modules) *
+				  builtin_module_size);
+	builtin_module_offsets = realloc(builtin_module_offsets,
+					 sizeof(*builtin_module_offsets) *
+					 builtin_module_size);
+
+	if (!builtin_modules || !builtin_module_offsets) {
+		fprintf(stderr, "kallsyms failure: out of memory.\n");
+		exit(EXIT_FAILURE);
+	}
+}
+
+/*
+ * Add a single built-in module (possibly composed of many files) to the
+ * modules list.  Take the offset of the current module and return it
+ * (purely for simplicity's sake in the caller).
+ */
+static size_t add_builtin_module(const char *module_name, char **module_paths,
+				 size_t offset)
+{
+	/* map the module's object paths to the module offset */
+	while (*module_paths) {
+		obj2mod_put(*module_paths, builtin_module_len);
+		module_paths++;
+	}
+
+	/* add the module name */
+	if (builtin_module_size <= builtin_module_len)
+		expand_builtin_modules();
+	builtin_modules[builtin_module_len] = strdup(module_name);
+	builtin_module_offsets[builtin_module_len] = offset;
+	builtin_module_len++;
+
+	return (offset + strlen(module_name) + 1);
+}
+
+/*
+ * Read the linker map.
+ */
+static void read_linker_map(void)
+{
+	unsigned long long addr, size;
+	char obj[PATH_MAX+1];
+	FILE *f = fopen(".tmp_vmlinux.ranges", "r");
+
+	if (!f) {
+		fprintf(stderr, "Cannot open '.tmp_vmlinux.ranges'.\n");
+		exit(1);
+	}
+
+	addrmap_num = 0;
+	addrmap_alloced = 4096;
+	addrmap = malloc(sizeof(*addrmap) * addrmap_alloced);
+	if (!addrmap)
+		goto oom;
+
+	/*
+	 * For each address range (addr,size) and object, add to addrmap
+	 * the range and the built-in module to which the object maps.
+	 */
+	while (fscanf(f, "%llx %llx %s\n", &addr, &size, obj) == 3) {
+		int m = obj2mod_get(obj);
+
+		if (addr == 0 || size == 0 || m == 0)
+			continue;
+
+		if (addrmap_num >= addrmap_alloced) {
+			addrmap_alloced *= 2;
+			addrmap = realloc(addrmap,
+			    sizeof(*addrmap) * addrmap_alloced);
+			if (!addrmap)
+				goto oom;
+		}
+
+		addrmap[addrmap_num].addr = addr;
+		addrmap[addrmap_num].size = size;
+		addrmap[addrmap_num].module = m;
+		addrmap_num++;
+	}
+	fclose(f);
+	return;
+
+oom:
+	fprintf(stderr, "kallsyms: out of memory\n");
+	exit(1);
+}
+
+/*
+ * Read "modules_thick.builtin" (the list of built-in modules).  Construct:
+ *   - builtin_modules: array of built-in-module names
+ *   - builtin_module_offsets: array of offsets that will later be
+ *       used to access a concatenated list of built-in-module names
+ *   - obj2mod: a temporary, many-to-one, hash mapping
+ *       from object-file paths to built-in-module names
+ * Read ".tmp_vmlinux.ranges" (the linker map).
+ *   - addrmap[] maps address ranges to built-in module names (using obj2mod)
+ */
+static void read_modules(const char *modules_builtin)
+{
+	struct modules_thick_iter *i;
+	size_t offset = 0;
+	char *module_name = NULL;
+	char **module_paths;
+
+	obj2mod_init();
+
+	/*
+	 * builtin_modules[0] is a null entry signifying a symbol that cannot be
+	 * modular.
+	 */
+	builtin_module_size = 50;
+	builtin_modules = malloc(sizeof(*builtin_modules) *
+				 builtin_module_size);
+	builtin_module_offsets = malloc(sizeof(*builtin_module_offsets) *
+				 builtin_module_size);
+	if (!builtin_modules || !builtin_module_offsets) {
+		fprintf(stderr, "kallsyms: out of memory\n");
+		exit(1);
+	}
+	builtin_modules[0] = strdup("");
+	builtin_module_offsets[0] = 0;
+	builtin_module_len = 1;
+	offset++;
+
+	/*
+	 * Iterate over all modules in modules_thick.builtin and add each.
+	 */
+	i = modules_thick_iter_new(modules_builtin);
+	if (i == NULL) {
+		fprintf(stderr, "Cannot iterate over builtin modules.\n");
+		exit(1);
+	}
+
+	while ((module_paths = modules_thick_iter_next(i, &module_name))) {
+		offset = add_builtin_module(module_name, module_paths, offset);
+		free(module_paths);
+		module_paths = NULL;
+	}
+
+	free(module_name);
+	modules_thick_iter_free(i);
+
+	/*
+	 * Read linker map.
+	 */
+	read_linker_map();
+
+	obj2mod_free();
+}
+
 int main(int argc, char **argv)
 {
+	const char *modules_builtin = "modules_thick.builtin";
+
 	if (argc >= 2) {
 		int i;
 		for (i = 1; i < argc; i++) {
-			if(strcmp(argv[i], "--all-symbols") == 0)
+			if (strcmp(argv[i], "--all-symbols") == 0)
 				all_symbols = 1;
 			else if (strcmp(argv[i], "--absolute-percpu") == 0)
 				absolute_percpu = 1;
 			else if (strcmp(argv[i], "--base-relative") == 0)
 				base_relative = 1;
+			else if (strncmp(argv[i], "--builtin=", 10) == 0)
+				modules_builtin = &argv[i][10];
 			else
 				usage();
 		}
 	} else if (argc != 1)
 		usage();
 
+	read_modules(modules_builtin);
 	read_map(stdin);
 	if (absolute_percpu)
 		make_percpus_absolute();
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 06495379fcd8..e4d5a98133e7 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -76,6 +76,7 @@ vmlinux_link()
 			--start-group				\
 			${KBUILD_VMLINUX_LIBS}			\
 			--end-group				\
+			-Map=.tmp_vmlinux.map			\
 			${@}"
 
 		${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux}	\
@@ -88,6 +89,7 @@ vmlinux_link()
 			-Wl,--start-group			\
 			${KBUILD_VMLINUX_LIBS}			\
 			-Wl,--end-group				\
+			-Wl,-Map=.tmp_vmlinux.map		\
 			${@}"
 
 		${CC} ${CFLAGS_vmlinux}				\
@@ -138,6 +140,19 @@ kallsyms()
 	info KSYM ${2}
 	local kallsymopt;
 
+	# read the linker map to identify ranges of addresses:
+	#   - for each *.o file, report address, size, pathname
+	#       - most such lines will have four fields
+	#       - but sometimes there is a line break after the first field
+	#   - start reading at "Linker script and memory map"
+	#   - stop reading at ".brk"
+	${AWK} '
+	    /\.o$/ && start==1 { print $(NF-2), $(NF-1), $NF }
+	    /^Linker script and memory map/ { start = 1 }
+	    /^\.brk/ { exit(0) }
+	' .tmp_vmlinux.map | sort > .tmp_vmlinux.ranges
+
+	# get kallsyms options
 	if [ -n "${CONFIG_KALLSYMS_ALL}" ]; then
 		kallsymopt="${kallsymopt} --all-symbols"
 	fi
@@ -150,11 +165,13 @@ kallsyms()
 		kallsymopt="${kallsymopt} --base-relative"
 	fi
 
+	# set up compilation
 	local aflags="${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL}               \
 		      ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS}"
 
 	local afile="`basename ${2} .o`.S"
 
+	# construct file and compile
 	${NM} -n ${1} | scripts/kallsyms ${kallsymopt} > ${afile}
 	${CC} ${aflags} -c -o ${2} ${afile}
 }
diff --git a/scripts/namespace.pl b/scripts/namespace.pl
index 1da7bca201a4..4c7615e720de 100755
--- a/scripts/namespace.pl
+++ b/scripts/namespace.pl
@@ -120,6 +120,11 @@ my %nameexception = (
     'kallsyms_addresses'=> 1,
     'kallsyms_offsets'	=> 1,
     'kallsyms_relative_base'=> 1,
+    'kallsyms_token_table'=> 1,
+    'kallsyms_token_index'=> 1,
+    'kallsyms_markers'	=> 1,
+    'kallsyms_modules'	=> 1,
+    'kallsyms_symbol_modules'=> 1,
     '__this_module'	=> 1,
     '_etext'		=> 1,
     '_edata'		=> 1,
-- 
2.18.1


* Re: [PATCH] kallsyms: new /proc/kallmodsyms with builtin modules and symbol sizes
  2019-11-17  0:32       ` Linus Torvalds
  2019-11-19 22:42         ` [PATCH v2] kallsyms: add names of built-in modules eugene.loh
@ 2019-11-20  0:11         ` Eugene Loh
  1 sibling, 0 replies; 21+ messages in thread
From: Eugene Loh @ 2019-11-20  0:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Jonathan Corbet, Masahiro Yamada, Michal Marek,
	Jessica Yu, Linux Kbuild mailing list, Marc Zyngier, Song Liu,
	Thomas Gleixner, jacob.e.keller, Andrew Morton,
	Greg Kroah-Hartman, Kris Van Hees, Nick Alcock

On 11/16/2019 04:32 PM, Linus Torvalds wrote:

> On Sat, Nov 16, 2019 at 9:58 AM Eugene Loh <eugene.loh@oracle.com> wrote:
>> Since there are very many gaps, adding dummy entries makes sense only
>> for "big" jumps.  I don't know where one would want to draw the line for
>> "big."  In any case, to identify such gaps, one would still need the "nm
>> -S" information provided by this patch.
> Sure. You can have some kind of error estimate where if the size of
> the thing is much smaller than the gap, add the fake padding object.
>
> But that "much smaller than" would likely be in the area of page
> alignment, not "next function was aligned to 64-byte boundary" kind of
> small fixups.
>
> Honestly, if somebody needs the real size, why aren't they just using
> the original image?
>
>> Meanwhile, there are some symbols that encompass others.
> Yeah, I don't think this is at all worth worrying about. Again, if you
> want that kind of information, you should use the original vmlinux
> image, not think that "hey, /proc should give perfect information".

We're also interested in systems that don't have vmlinux available -- 
e.g., production systems with kernels installed from vendor packages.

Nevertheless, I'll proceed along the lines you suggest.  I'll remove the
size stuff and simply add the module info.  I prematurely sent a "v2" to
this mailing list.  Sorry.  Amended patch coming soon.

> The /proc interface should be a rough and convenient baseline, but I
> don't think it's at all interesting to try to make it perfect or even
> all that clever.
>
> Most of your questions boil down to "just use vmlinux" instead. If you
> _really_ care about things like "one symbol can encompass many
> sub-symbols", you shouldn't look at /proc/kallsyms.
>
> So I think we could improve on /proc/kallsyms, but we should do it
> with the aim being "just make it incrementally better", not some
> "let's solve big problems". The big problems are already solved by
> just looking at the vmlinux file.
>
> For example, I think the whole "include which module the symbol comes
> from" is a nice improved quality thing even if the module happens to
> be built-in. If that is easy to do, then we should just do it, and it
> allows people to see interesting information and might make it useful
> to (for example) have tools like profiling be able to zoom into
> particular "modules", even if the module is built-in.
>
> And if there are big gaps that aren't just "align to next cacheline",
> then that sounds like it's worth pointing out too.
>
> But I see _zero_ reason not to say "just use vmlinux if you need
> detailed information". The /proc file is not supposed to be a
> replacement for the full setup, it should be seen as a convenient
> shorthand and as a "if you have nothing better, at least you can get
> _some_ information, and maybe you can also use it to validate that you
> have the _right_ vmlinux file"
>
>                   Linus


* [PATCH v3] kallsyms: add names of built-in modules
  2019-11-19 22:42         ` [PATCH v2] kallsyms: add names of built-in modules eugene.loh
@ 2019-11-20  4:59           ` eugene.loh
  2019-11-22 10:00             ` Masahiro Yamada
  0 siblings, 1 reply; 21+ messages in thread
From: eugene.loh @ 2019-11-20  4:59 UTC (permalink / raw)
  To: eugene.loh
  Cc: rostedt, corbet, yamada.masahiro, michal.lkml, jeyu,
	linux-kbuild, maz, songliubraving, tglx, jacob.e.keller,
	Kris Van Hees, Nick Alcock

From: Eugene Loh <eugene.loh@oracle.com>

/proc/kallsyms is very useful for tracers and other tools that need
to map kernel symbols to addresses.

It would be useful if there were a mapping between kernel symbol and
module name that only changed when the kernel source code is changed.
This mapping should not vanish simply because a module becomes built
into the kernel.

Therefore:

- Generate a file "modules_thick.builtin" that maps from thin
  archives that make up built-in modules to their constituent
  object files.

- Generate a linker map ".tmp_vmlinux.map", converting it into
  ".tmp_vmlinux.ranges", mapping address ranges to object files.

- Read "modules_thick.builtin" and ".tmp_vmlinux.ranges" to
  map symbol addresses to built-in-module names.  Write those
  module names (kallsyms_modules) and that per-symbol module
  information (kallsyms_symbol_modules) to the *.s output file.

- Use kallsyms_modules and kallsyms_symbol_modules to add
  built-in-module information to /proc/kallsyms.

Note that kernel symbols for built-in modules appear in ascending
order by address, as usual, and thus will appear interspersed with
symbols that are part of other built-in modules or of the kernel.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
---
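Reviewer note (not part of the commit): the address-to-module step described
above relies on a binary search over non-overlapping address ranges sorted
by start address. A minimal sketch of that technique follows, assuming the
ranges really are sorted and disjoint. The demo_* names and the sample
addresses are hypothetical; only the use of the standard bsearch() with a
"key inside range" comparator reflects what scripts/kallsyms.c does.

```c
#include <stddef.h>
#include <stdlib.h>

/* One address range and the (hypothetical) module id it belongs to. */
struct demo_range {
	unsigned long long addr;
	unsigned long long size;
	unsigned int module;
};

/* Compare an address key against a range: 0 means "inside this range". */
static int demo_range_compare(const void *keyp, const void *rangep)
{
	unsigned long long addr = *(const unsigned long long *)keyp;
	const struct demo_range *range = rangep;

	if (addr < range->addr)
		return -1;
	if (addr < range->addr + range->size)
		return 0;
	return 1;
}

/* Return the module id covering addr, or 0 if no range covers it. */
static unsigned int demo_lookup(const struct demo_range *ranges, size_t n,
				unsigned long long addr)
{
	const struct demo_range *hit;

	hit = bsearch(&addr, ranges, n, sizeof(*ranges), demo_range_compare);
	return hit ? hit->module : 0;
}

/* Sample data: sorted by address, no overlaps. */
static const struct demo_range demo_ranges[] = {
	{ 0x1000, 0x100, 1 },
	{ 0x2000, 0x080, 2 },
	{ 0x2080, 0x040, 1 },
};
```

A result of 0 plays the same role as the empty module name: the address
falls in a gap or in code that is never modular. The search is only valid
if the input stays sorted numerically, so a textual sort of the ranges file
is safe only as long as all addresses are printed at the same width.
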
 .gitignore                  |   1 +
 Documentation/dontdiff      |   1 +
 Makefile                    |  41 ++-
 kernel/kallsyms.c           |  12 +-
 scripts/Makefile.modbuiltin |  20 +-
 scripts/kallsyms.c          | 515 +++++++++++++++++++++++++++++++++++-
 scripts/link-vmlinux.sh     |  17 ++
 scripts/namespace.pl        |   5 +
 8 files changed, 589 insertions(+), 23 deletions(-)

diff --git a/.gitignore b/.gitignore
index 70580bdd352c..474491775a1a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -47,6 +47,7 @@
 Module.symvers
 modules.builtin
 modules.order
+modules_thick.builtin
 
 #
 # Top-level generic files
diff --git a/Documentation/dontdiff b/Documentation/dontdiff
index 9f4392876099..32ee05f91410 100644
--- a/Documentation/dontdiff
+++ b/Documentation/dontdiff
@@ -180,6 +180,7 @@ modpost
 modules.builtin
 modules.builtin.modinfo
 modules.order
+modules_thick.builtin
 modversions.h*
 nconf
 nconf-cfg
diff --git a/Makefile b/Makefile
index 49363caa7079..15b4e897cd3e 100644
--- a/Makefile
+++ b/Makefile
@@ -1077,7 +1077,7 @@ cmd_link-vmlinux =                                                 \
 	$(CONFIG_SHELL) $< $(LD) $(KBUILD_LDFLAGS) $(LDFLAGS_vmlinux) ;    \
 	$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
 
-vmlinux: scripts/link-vmlinux.sh autoksyms_recursive $(vmlinux-deps) FORCE
+vmlinux: scripts/link-vmlinux.sh autoksyms_recursive $(vmlinux-deps) modules_thick.builtin FORCE
 	+$(call if_changed,link-vmlinux)
 
 targets := vmlinux
@@ -1292,17 +1292,6 @@ modules: $(if $(KBUILD_BUILTIN),vmlinux) modules.order modules.builtin
 modules.order: descend
 	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
 
-modbuiltin-dirs := $(addprefix _modbuiltin_, $(build-dirs))
-
-modules.builtin: $(modbuiltin-dirs)
-	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
-
-PHONY += $(modbuiltin-dirs)
-# tristate.conf is not included from this Makefile. Add it as a prerequisite
-# here to make it self-healing in case somebody accidentally removes it.
-$(modbuiltin-dirs): include/config/tristate.conf
-	$(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_%,%,$@)
-
 # Target to prepare building external modules
 PHONY += modules_prepare
 modules_prepare: prepare
@@ -1355,6 +1344,33 @@ modules modules_install:
 
 endif # CONFIG_MODULES
 
+# modules.builtin has a 'thick' form which maps from kernel modules (or rather
+# the object file names they would have had had they not been built in) to their
+# constituent object files: kallsyms uses this to determine which modules any
+# given object file is part of.  (We cannot eliminate the slight redundancy
+# here without double-expansion.)
+
+modbuiltin-dirs := $(addprefix _modbuiltin_, $(build-dirs))
+
+modbuiltin-thick-dirs := $(addprefix _modbuiltin_thick_, $(build-dirs))
+
+modules.builtin: $(modbuiltin-dirs)
+	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
+
+modules_thick.builtin: $(modbuiltin-thick-dirs)
+	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
+
+PHONY += $(modbuiltin-dirs) $(modbuiltin-thick-dirs)
+# tristate.conf is not included from this Makefile. Add it as a prerequisite
+# here to make it self-healing in case somebody accidentally removes it.
+$(modbuiltin-dirs): include/config/tristate.conf
+	$(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_%,%,$@) \
+			builtin-file=modules.builtin
+
+$(modbuiltin-thick-dirs): include/config/tristate.conf
+	$(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_thick_%,%,$@) \
+			builtin-file=modules_thick.builtin
+
 ###
 # Cleaning is done on three levels.
 # make clean     Delete most generated files
@@ -1674,6 +1690,7 @@ clean: $(clean-dirs)
 		-o -name '*.asn1.[ch]' \
 		-o -name '*.symtypes' -o -name 'modules.order' \
 		-o -name modules.builtin -o -name '.tmp_*.o.*' \
+		-o -name modules_thick.builtin \
 		-o -name '*.c.[012]*.*' \
 		-o -name '*.ll' \
 		-o -name '*.gcno' \) -type f -print | xargs rm -f
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 136ce049c4ad..ce8576503e35 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -46,6 +46,8 @@ __attribute__((weak, section(".rodata")));
 
 extern const u8 kallsyms_token_table[] __weak;
 extern const u16 kallsyms_token_index[] __weak;
+extern const char kallsyms_modules[] __weak;
+extern const u32 kallsyms_symbol_modules[] __weak;
 
 extern const unsigned int kallsyms_markers[] __weak;
 
@@ -508,8 +510,16 @@ static int get_ksymbol_bpf(struct kallsym_iter *iter)
 static unsigned long get_ksymbol_core(struct kallsym_iter *iter)
 {
 	unsigned off = iter->nameoff;
+	u32 mod_index = 0;
 
-	iter->module_name[0] = '\0';
+	if (kallsyms_symbol_modules)
+		mod_index = kallsyms_symbol_modules[iter->pos];
+
+	if (mod_index == 0 || kallsyms_modules == NULL)
+		iter->module_name[0] = '\0';
+	else
+		strcpy(iter->module_name, &kallsyms_modules[mod_index]);
+	iter->exported = 0;
 	iter->value = kallsyms_sym_address(iter->pos);
 
 	iter->type = kallsyms_get_symbol_type(off);
diff --git a/scripts/Makefile.modbuiltin b/scripts/Makefile.modbuiltin
index 7d4711b88656..06f31e58111e 100644
--- a/scripts/Makefile.modbuiltin
+++ b/scripts/Makefile.modbuiltin
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 # ==========================================================================
-# Generating modules.builtin
+# Generating modules.builtin and modules_thick.builtin
 # ==========================================================================
 
 src := $(obj)
@@ -30,19 +30,29 @@ __subdir-Y     := $(patsubst %/,%,$(filter %/, $(obj-Y)))
 subdir-Y       += $(__subdir-Y)
 subdir-ym      := $(sort $(subdir-y) $(subdir-Y) $(subdir-m))
 subdir-ym      := $(addprefix $(obj)/,$(subdir-ym))
-obj-Y          := $(addprefix $(obj)/,$(obj-Y))
+pathobj-Y      := $(addprefix $(obj)/,$(obj-Y))
 
 modbuiltin-subdirs := $(patsubst %,%/modules.builtin, $(subdir-ym))
-modbuiltin-mods    := $(filter %.ko, $(obj-Y:.o=.ko))
+modbuiltin-mods    := $(filter %.ko, $(pathobj-Y:.o=.ko))
 modbuiltin-target  := $(obj)/modules.builtin
+modthickbuiltin-subdirs := $(patsubst %,%/modules_thick.builtin, $(subdir-ym))
+modthickbuiltin-target  := $(obj)/modules_thick.builtin
 
-__modbuiltin: $(modbuiltin-target) $(subdir-ym)
+__modbuiltin: $(obj)/$(builtin-file) $(subdir-ym)
 	@:
 
 $(modbuiltin-target): $(subdir-ym) FORCE
 	$(Q)(for m in $(modbuiltin-mods); do echo $$m; done;	\
 	cat /dev/null $(modbuiltin-subdirs)) > $@
 
+$(modthickbuiltin-target): $(subdir-ym) FORCE
+	$(Q) $(foreach mod-o, $(filter %.o,$(obj-Y)),\
+		printf "%s:" $(addprefix $(obj)/,$(mod-o)) >> $@; \
+		printf " %s" $(sort $(strip $(addprefix $(obj)/,$($(mod-o:.o=-objs)) \
+			$($(mod-o:.o=-y)) $($(mod-o:.o=-Y))))) >> $@; \
+		printf "\n" >> $@; ) \
+	cat /dev/null $(modthickbuiltin-subdirs) >> $@;
+
 PHONY += FORCE
 
 FORCE:
@@ -52,6 +62,6 @@ FORCE:
 
 PHONY += $(subdir-ym)
 $(subdir-ym):
-	$(Q)$(MAKE) $(modbuiltin)=$@
+	$(Q)$(MAKE) $(modbuiltin)=$@ builtin-file=$(builtin-file)
 
 .PHONY: $(PHONY)
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index ae6504d07fd6..f71432df09d8 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -5,7 +5,10 @@
  * This software may be used and distributed according to the terms
  * of the GNU General Public License, incorporated herein by reference.
  *
- * Usage: nm -n vmlinux | scripts/kallsyms [--all-symbols] > symbols.S
+ * Usage: nm -n vmlinux
+ *        | scripts/kallsyms [--all-symbols] [--absolute-percpu]
+ *             [--base-relative] [--builtin=modules_thick.builtin]
+ *        > symbols.S
  *
  *      Table compression uses all the unused char codes on the symbols and
  *  maps these to the most used substrings (tokens). For instance, it might
@@ -18,12 +21,15 @@
  *
  */
 
+#define _GNU_SOURCE 1
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <ctype.h>
 #include <limits.h>
 
+#include <errno.h>
+
 #ifndef ARRAY_SIZE
 #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0]))
 #endif
@@ -36,6 +42,7 @@ struct sym_entry {
 	unsigned int start_pos;
 	unsigned char *sym;
 	unsigned int percpu_absolute;
+	unsigned int module;
 };
 
 struct addr_range {
@@ -69,10 +76,116 @@ static unsigned char best_table[256][2];
 static unsigned char best_table_len[256];
 
 
+static unsigned int strhash(const char *s)
+{
+	/* fnv32 hash */
+	unsigned int hash = 2166136261U;
+
+	for (; *s; s++)
+		hash = (hash ^ *s) * 0x01000193;
+	return hash;
+}
+
+#define OBJ2MOD_BITS 10
+#define OBJ2MOD_N (1 << OBJ2MOD_BITS)
+#define OBJ2MOD_MASK (OBJ2MOD_N - 1)
+struct obj2mod_elem {
+	char *obj;
+	int mod;
+	struct obj2mod_elem *next;
+};
+
+static struct obj2mod_elem *obj2mod[OBJ2MOD_N];
+
+static void obj2mod_init(void)
+{
+	memset(obj2mod, 0, sizeof(obj2mod));
+}
+
+static void obj2mod_put(char *obj, int mod)
+{
+	int i = strhash(obj) & OBJ2MOD_MASK;
+	struct obj2mod_elem *elem = malloc(sizeof(struct obj2mod_elem));
+
+	if (!elem) {
+		fprintf(stderr, "kallsyms: out of memory\n");
+		exit(1);
+	}
+
+	elem->obj = strdup(obj);
+	if (!elem->obj) {
+		fprintf(stderr, "kallsyms: out of memory\n");
+		free(elem);
+		exit(1);
+	}
+
+	elem->mod = mod;
+	elem->next = obj2mod[i];
+	obj2mod[i] = elem;
+}
+
+static int obj2mod_get(char *obj)
+{
+	int i = strhash(obj) & OBJ2MOD_MASK;
+	struct obj2mod_elem *elem;
+
+	for (elem = obj2mod[i]; elem; elem = elem->next)
+		if (strcmp(elem->obj, obj) == 0)
+			return elem->mod;
+	return 0;
+}
+
+static void obj2mod_free(void)
+{
+	int i;
+
+	for (i = 0; i < OBJ2MOD_N; i++) {
+		struct obj2mod_elem *elem = obj2mod[i];
+		struct obj2mod_elem *next;
+
+		while (elem) {
+			next = elem->next;
+			free(elem->obj);
+			free(elem);
+			elem = next;
+		}
+	}
+}
+
+/*
+ * The builtin module names.  The "offset" points to the name as if
+ * all builtin module names were concatenated to a single string.
+ */
+static unsigned int builtin_module_size;	/* number allocated */
+static unsigned int builtin_module_len;		/* number assigned */
+static char **builtin_modules;			/* array of module names */
+static unsigned int *builtin_module_offsets;	/* offset */
+
+/*
+ * modules_thick.builtin iteration state.
+ */
+struct modules_thick_iter {
+	FILE *f;
+	char *line;
+	size_t line_size;
+};
+
+/*
+ * An ordered list of address ranges and how they map to built-in modules.
+ */
+struct addrmap_entry {
+	unsigned long long addr;
+	unsigned long long size;
+	unsigned int module;
+};
+static struct addrmap_entry *addrmap;
+static int addrmap_num, addrmap_alloced;
+
 static void usage(void)
 {
-	fprintf(stderr, "Usage: kallsyms [--all-symbols] "
-			"[--base-relative] < in.map > out.S\n");
+	fprintf(stderr, "Usage: kallsyms [--all-symbols] [--absolute-percpu] "
+			"[--base-relative] [--builtin=modules_thick.builtin] "
+			"< nm_vmlinux.out > symbols.S\n");
 	exit(1);
 }
 
@@ -107,11 +220,25 @@ static int check_symbol_range(const char *sym, unsigned long long addr,
 	return 1;
 }
 
+static int addrmap_compare(const void *keyp, const void *rangep)
+{
+	unsigned long long addr = *((const unsigned long long *)keyp);
+	const struct addrmap_entry *range = (const struct addrmap_entry *)rangep;
+
+	if (addr < range->addr)
+		return -1;
+	if (addr < range->addr + range->size)
+		return 0;
+	return 1;
+}
+
 static int read_symbol(FILE *in, struct sym_entry *s)
 {
 	char sym[500], stype;
-	int rc;
+	int rc, init_scratch = 0;
+	struct addrmap_entry *range;
 
+read_another:
 	rc = fscanf(in, "%llx %c %499s\n", &s->addr, &stype, sym);
 	if (rc != 3) {
 		if (rc != EOF && fgets(sym, 500, in) == NULL)
@@ -125,6 +252,16 @@ static int read_symbol(FILE *in, struct sym_entry *s)
 		return -1;
 	}
 
+	/* skip the .init.scratch section */
+	if (strcmp(sym, "__init_scratch_end") == 0) {
+		init_scratch = 0;
+		goto read_another;
+	}
+	if (strcmp(sym, "__init_scratch_begin") == 0)
+		init_scratch = 1;
+	if (init_scratch)
+		goto read_another;
+
 	/* Ignore most absolute/undefined (?) symbols. */
 	if (strcmp(sym, "_text") == 0)
 		_text = s->addr;
@@ -154,6 +291,14 @@ static int read_symbol(FILE *in, struct sym_entry *s)
 	else if (!strncmp(sym, ".LASANPC", 8))
 		return -1;
 
+	/* look up the builtin module this is part of (if any) */
+	range = (struct addrmap_entry *) bsearch(&s->addr,
+	    addrmap, addrmap_num, sizeof(*addrmap), &addrmap_compare);
+	if (range)
+		s->module = builtin_module_offsets[range->module];
+	else
+		s->module = 0;
+
 	/* include the type field in the symbol name, so that it gets
 	 * compressed together */
 	s->len = strlen(sym) + 1;
@@ -206,6 +351,8 @@ static int symbol_valid(struct sym_entry *s)
 		"kallsyms_markers",
 		"kallsyms_token_table",
 		"kallsyms_token_index",
+		"kallsyms_symbol_modules",
+		"kallsyms_modules",
 
 	/* Exclude linker generated symbols which vary between passes */
 		"_SDA_BASE_",		/* ppc */
@@ -454,6 +601,19 @@ static void write_src(void)
 	for (i = 0; i < 256; i++)
 		printf("\t.short\t%d\n", best_idx[i]);
 	printf("\n");
+
+	output_label("kallsyms_modules");
+	for (i = 0; i < builtin_module_len; i++)
+		printf("\t.asciz\t\"%s\"\n", builtin_modules[i]);
+	printf("\n");
+
+	for (i = 0; i < builtin_module_len; i++)
+		free(builtin_modules[i]);
+
+	output_label("kallsyms_symbol_modules");
+	for (i = 0; i < table_cnt; i++)
+		printf("\t.int\t%d\n", table[i].module);
+	printf("\n");
 }
 
 
@@ -738,23 +898,368 @@ static void record_relative_base(void)
 			relative_base = table[i].addr;
 }
 
+/*
+ * Read a modules_thick.builtin file.
+ */
+
+/*
+ * Construct a modules_thick.builtin iterator.
+ */
+static struct modules_thick_iter *
+modules_thick_iter_new(const char *modules_thick_file)
+{
+	struct modules_thick_iter *i;
+
+	i = calloc(1, sizeof(struct modules_thick_iter));
+	if (i == NULL)
+		return NULL;
+
+	i->f = fopen(modules_thick_file, "r");
+
+	if (i->f == NULL) {
+		fprintf(stderr, "Cannot open builtin module file %s: %s\n",
+			modules_thick_file, strerror(errno));
+		return NULL;
+	}
+
+	return i;
+}
+
+/*
+ * Iterate, returning a new null-terminated array of object file names, and a
+ * new dynamically-allocated module name.  (The module name passed in is freed.)
+ *
+ * The array of object file names should be freed by the caller: the strings it
+ * points to are owned by the iterator, and should not be freed.
+ */
+static char ** __attribute__((__nonnull__))
+modules_thick_iter_next(struct modules_thick_iter *i, char **module_name)
+{
+	size_t npaths = 1;
+	char **module_paths;
+	char *last_slash;
+	char *last_dot;
+	char *trailing_linefeed;
+	char *object_name = i->line;
+	char *dash;
+	int composite = 0;
+
+	/*
+	 * Read in all module entries, computing the suffixless, pathless name
+	 * of the module and building the next arrayful of object file names for
+	 * return.
+	 *
+	 * Modules can consist of multiple files: in this case, the portion
+	 * before the colon is the path to the module (as before): the portion
+	 * after the colon is a space-separated list of files that should be
+	 * considered part of this module.  In this case, the portion before the
+	 * name is an "object file" that does not actually exist: it is merged
+	 * into built-in.a without ever being written out.
+	 *
+	 * All module names have - translated to _, to match what is done to the
+	 * names of the same things when built as modules.
+	 */
+
+	/*
+	 * Reinvocation of exhausted iterator. Return NULL, once.
+	 */
+retry:
+	if (getline(&i->line, &i->line_size, i->f) < 0) {
+		if (ferror(i->f)) {
+			fprintf(stderr,
+				"Error reading from modules_thick file: %s\n",
+				strerror(errno));
+			exit(1);
+		}
+		rewind(i->f);
+		return NULL;
+	}
+
+	if (i->line[0] == '\0')
+		goto retry;
+
+	/*
+	 * Slice the line in two at the colon, if any.  If there is anything
+	 * past the ': ', this is a composite module.  (We allow for no colon
+	 * for robustness, even though one should always be present.)
+	 */
+	if (strchr(i->line, ':') != NULL) {
+		char *name_start;
+
+		object_name = strchr(i->line, ':');
+		*object_name = '\0';
+		object_name++;
+		name_start = object_name + strspn(object_name, " \n");
+		if (*name_start != '\0') {
+			composite = 1;
+			object_name = name_start;
+		}
+	}
+
+	/*
+	 * Figure out the module name.
+	 */
+	last_slash = strrchr(i->line, '/');
+	last_slash = (!last_slash) ? i->line :
+		last_slash + 1;
+	free(*module_name);
+	*module_name = strdup(last_slash);
+	dash = *module_name;
+
+	while (dash != NULL) {
+		dash = strchr(dash, '-');
+		if (dash != NULL)
+			*dash = '_';
+	}
+
+	last_dot = strrchr(*module_name, '.');
+	if (last_dot != NULL)
+		*last_dot = '\0';
+
+	trailing_linefeed = strchr(object_name, '\n');
+	if (trailing_linefeed != NULL)
+		*trailing_linefeed = '\0';
+
+	/*
+	 * Multifile separator? Object file names explicitly stated:
+	 * slice them up and shuffle them in.
+	 *
+	 * The array size may be an overestimate if any object file
+	 * names start or end with spaces (very unlikely) but cannot be
+	 * an underestimate.  (Check for it anyway.)
+	 */
+	if (composite) {
+		char *one_object;
+
+		for (npaths = 0, one_object = object_name;
+		     one_object != NULL;
+		     npaths++, one_object = strchr(one_object + 1, ' '))
+			;
+	}
+
+	module_paths = malloc((npaths + 1) * sizeof(char *));
+	if (!module_paths) {
+		fprintf(stderr, "%s: out of memory on module %s\n", __func__,
+			*module_name);
+		exit(1);
+	}
+
+	if (composite) {
+		char *one_object;
+		size_t i = 0;
+
+		while ((one_object = strsep(&object_name, " ")) != NULL) {
+			if (i >= npaths) {
+				fprintf(stderr, "%s: npaths overflow on module "
+					"%s: this is a bug.\n", __func__,
+					*module_name);
+				exit(1);
+			}
+
+			module_paths[i++] = one_object;
+		}
+	} else
+		module_paths[0] = i->line;	/* untransformed module name */
+
+	module_paths[npaths] = NULL;
+
+	return module_paths;
+}
+
+/*
+ * Free an iterator. Can be called while iteration is underway, so even
+ * state that is freed at the end of iteration must be freed here too.
+ */
+static void
+modules_thick_iter_free(struct modules_thick_iter *i)
+{
+	if (i == NULL)
+		return;
+	fclose(i->f);
+	free(i->line);
+	free(i);
+}
+
+/*
+ * Expand the builtin modules list.
+ */
+static void expand_builtin_modules(void)
+{
+	builtin_module_size += 50;
+
+	builtin_modules = realloc(builtin_modules,
+				  sizeof(*builtin_modules) *
+				  builtin_module_size);
+	builtin_module_offsets = realloc(builtin_module_offsets,
+					 sizeof(*builtin_module_offsets) *
+					 builtin_module_size);
+
+	if (!builtin_modules || !builtin_module_offsets) {
+		fprintf(stderr, "kallsyms failure: out of memory.\n");
+		exit(EXIT_FAILURE);
+	}
+}
+
+/*
+ * Add a single built-in module (possibly composed of many files) to the
+ * modules list.  Take the offset of the current module and return it
+ * (purely for simplicity's sake in the caller).
+ */
+static size_t add_builtin_module(const char *module_name, char **module_paths,
+				 size_t offset)
+{
+	/* map the module's object paths to the module offset */
+	while (*module_paths) {
+		obj2mod_put(*module_paths, builtin_module_len);
+		module_paths++;
+	}
+
+	/* add the module name */
+	if (builtin_module_size <= builtin_module_len)
+		expand_builtin_modules();
+	builtin_modules[builtin_module_len] = strdup(module_name);
+	builtin_module_offsets[builtin_module_len] = offset;
+	builtin_module_len++;
+
+	return (offset + strlen(module_name) + 1);
+}
+
+/*
+ * Read the linker map.
+ */
+static void read_linker_map(void)
+{
+	unsigned long long addr, size;
+	char obj[PATH_MAX+1];
+	FILE *f = fopen(".tmp_vmlinux.ranges", "r");
+
+	if (!f) {
+		fprintf(stderr, "Cannot open '.tmp_vmlinux.ranges'.\n");
+		exit(1);
+	}
+
+	addrmap_num = 0;
+	addrmap_alloced = 4096;
+	addrmap = malloc(sizeof(*addrmap) * addrmap_alloced);
+	if (!addrmap)
+		goto oom;
+
+	/*
+	 * For each address range (addr,size) and object, add to addrmap
+	 * the range and the built-in module to which the object maps.
+	 */
+	while (fscanf(f, "%llx %llx %s\n", &addr, &size, obj) == 3) {
+		int m = obj2mod_get(obj);
+
+		if (addr == 0 || size == 0 || m == 0)
+			continue;
+
+		if (addrmap_num >= addrmap_alloced) {
+			addrmap_alloced *= 2;
+			addrmap = realloc(addrmap,
+			    sizeof(*addrmap) * addrmap_alloced);
+			if (!addrmap)
+				goto oom;
+		}
+
+		addrmap[addrmap_num].addr = addr;
+		addrmap[addrmap_num].size = size;
+		addrmap[addrmap_num].module = m;
+		addrmap_num++;
+	}
+	fclose(f);
+	return;
+
+oom:
+	fprintf(stderr, "kallsyms: out of memory\n");
+	exit(1);
+}
+
+/*
+ * Read "modules_thick.builtin" (the list of built-in modules).  Construct:
+ *   - builtin_modules: array of built-in-module names
+ *   - builtin_module_offsets: array of offsets that will later be
+ *       used to access a concatenated list of built-in-module names
+ *   - obj2mod: a temporary, many-to-one, hash mapping
+ *       from object-file paths to built-in-module names
+ * Read ".tmp_vmlinux.ranges" (the linker map).
+ *   - addrmap[] maps address ranges to built-in module names (using obj2mod)
+ */
+static void read_modules(const char *modules_builtin)
+{
+	struct modules_thick_iter *i;
+	size_t offset = 0;
+	char *module_name = NULL;
+	char **module_paths;
+
+	obj2mod_init();
+
+	/*
+	 * builtin_modules[0] is a null entry signifying a symbol that cannot be
+	 * modular.
+	 */
+	builtin_module_size = 50;
+	builtin_modules = malloc(sizeof(*builtin_modules) *
+				 builtin_module_size);
+	builtin_module_offsets = malloc(sizeof(*builtin_module_offsets) *
+				 builtin_module_size);
+	if (!builtin_modules || !builtin_module_offsets) {
+		fprintf(stderr, "kallsyms: out of memory\n");
+		exit(1);
+	}
+	builtin_modules[0] = strdup("");
+	builtin_module_offsets[0] = 0;
+	builtin_module_len = 1;
+	offset++;
+
+	/*
+	 * Iterate over all modules in modules_thick.builtin and add each.
+	 */
+	i = modules_thick_iter_new(modules_builtin);
+	if (i == NULL) {
+		fprintf(stderr, "Cannot iterate over builtin modules.\n");
+		exit(1);
+	}
+
+	while ((module_paths = modules_thick_iter_next(i, &module_name))) {
+		offset = add_builtin_module(module_name, module_paths, offset);
+		free(module_paths);
+		module_paths = NULL;
+	}
+
+	free(module_name);
+	modules_thick_iter_free(i);
+
+	/*
+	 * Read linker map.
+	 */
+	read_linker_map();
+
+	obj2mod_free();
+}
+
 int main(int argc, char **argv)
 {
+	const char *modules_builtin = "modules_thick.builtin";
+
 	if (argc >= 2) {
 		int i;
 		for (i = 1; i < argc; i++) {
-			if(strcmp(argv[i], "--all-symbols") == 0)
+			if (strcmp(argv[i], "--all-symbols") == 0)
 				all_symbols = 1;
 			else if (strcmp(argv[i], "--absolute-percpu") == 0)
 				absolute_percpu = 1;
 			else if (strcmp(argv[i], "--base-relative") == 0)
 				base_relative = 1;
+			else if (strncmp(argv[i], "--builtin=", 10) == 0)
+				modules_builtin = &argv[i][10];
 			else
 				usage();
 		}
 	} else if (argc != 1)
 		usage();
 
+	read_modules(modules_builtin);
 	read_map(stdin);
 	if (absolute_percpu)
 		make_percpus_absolute();
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 06495379fcd8..e4d5a98133e7 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -76,6 +76,7 @@ vmlinux_link()
 			--start-group				\
 			${KBUILD_VMLINUX_LIBS}			\
 			--end-group				\
+			-Map=.tmp_vmlinux.map			\
 			${@}"
 
 		${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux}	\
@@ -88,6 +89,7 @@ vmlinux_link()
 			-Wl,--start-group			\
 			${KBUILD_VMLINUX_LIBS}			\
 			-Wl,--end-group				\
+			-Wl,-Map=.tmp_vmlinux.map		\
 			${@}"
 
 		${CC} ${CFLAGS_vmlinux}				\
@@ -138,6 +140,19 @@ kallsyms()
 	info KSYM ${2}
 	local kallsymopt;
 
+	# read the linker map to identify ranges of addresses:
+	#   - for each *.o file, report address, size, pathname
+	#       - most such lines will have four fields
+	#       - but sometimes there is a line break after the first field
+	#   - start reading at "Linker script and memory map"
+	#   - stop reading at ".brk"
+	${AWK} '
+	    /\.o$/ && start==1 { print $(NF-2), $(NF-1), $NF }
+	    /^Linker script and memory map/ { start = 1 }
+	    /^\.brk/ { exit(0) }
+	' .tmp_vmlinux.map | sort > .tmp_vmlinux.ranges
+
+	# get kallsyms options
 	if [ -n "${CONFIG_KALLSYMS_ALL}" ]; then
 		kallsymopt="${kallsymopt} --all-symbols"
 	fi
@@ -150,11 +165,13 @@ kallsyms()
 		kallsymopt="${kallsymopt} --base-relative"
 	fi
 
+	# set up compilation
 	local aflags="${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL}               \
 		      ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS}"
 
 	local afile="`basename ${2} .o`.S"
 
+	# construct file and compile
 	${NM} -n ${1} | scripts/kallsyms ${kallsymopt} > ${afile}
 	${CC} ${aflags} -c -o ${2} ${afile}
 }
diff --git a/scripts/namespace.pl b/scripts/namespace.pl
index 1da7bca201a4..4c7615e720de 100755
--- a/scripts/namespace.pl
+++ b/scripts/namespace.pl
@@ -120,6 +120,11 @@ my %nameexception = (
     'kallsyms_addresses'=> 1,
     'kallsyms_offsets'	=> 1,
     'kallsyms_relative_base'=> 1,
+    'kallsyms_token_table'=> 1,
+    'kallsyms_token_index'=> 1,
+    'kallsyms_markers'	=> 1,
+    'kallsyms_modules'	=> 1,
+    'kallsyms_symbol_modules'=> 1,
     '__this_module'	=> 1,
     '_etext'		=> 1,
     '_edata'		=> 1,
-- 
2.18.1


* Re: [PATCH v3] kallsyms: add names of built-in modules
  2019-11-20  4:59           ` [PATCH v3] " eugene.loh
@ 2019-11-22 10:00             ` Masahiro Yamada
  2019-11-22 15:23               ` Nick Alcock
  2019-12-10 17:45               ` Eugene Loh
  0 siblings, 2 replies; 21+ messages in thread
From: Masahiro Yamada @ 2019-11-22 10:00 UTC (permalink / raw)
  To: eugene.loh
  Cc: Steven Rostedt, Jonathan Corbet, Michal Marek, Jessica Yu,
	Linux Kbuild mailing list, Marc Zyngier, Song Liu,
	Thomas Gleixner, Keller, Jacob E, Kris Van Hees, Nick Alcock

On Wed, Nov 20, 2019 at 2:02 PM <eugene.loh@oracle.com> wrote:
>
> From: Eugene Loh <eugene.loh@oracle.com>
>
> /proc/kallsyms is very useful for tracers and other tools that need
> to map kernel symbols to addresses.
>
> It would be useful if there were a mapping between kernel symbol and
> module name that only changed when the kernel source code is changed.



Unfortunately, this is not necessarily true.

Some objects could be linked into multiple modules.

For example, see
lib/zstd/Makefile
drivers/net/ethernet/cavium/liquidio/Makefile


For real modules, the mapping from a symbol to a modname
works well in /proc/kallsyms.

For built-in modules, it is quite subtle.


I will show corner cases.

Build with CONFIG_LIQUIDIO=m && CONFIG_LIQUIDIO_VF=m

Then, do
$ modprobe liquidio
$ modprobe liquidio_vf
$ grep  lio_get_link_ksettings  /proc/kallsyms

I think the output is correct.



CONFIG_LIQUIDIO=y && CONFIG_LIQUIDIO_VF=n  is OK
CONFIG_LIQUIDIO=n && CONFIG_LIQUIDIO_VF=y  is OK

The symbol-to-modname mapping changes depending on
the .config though.


What about  CONFIG_LIQUIDIO=y && CONFIG_LIQUIDIO_VF=y ?

It is hard to say which particular module the symbol came from.

As far as I can tell from testing this patch, it seems to pick
one at random?




> This mapping should not vanish simply because a module becomes built
> into the kernel.
>
> Therefore:
>
> - Generate a file "modules_thick.builtin" that maps from thin
>   archives that make up built-in modules to their constituent
>   object files.
>
> - Generate a linker map ".tmp_vmlinux.map", converting it into
>   ".tmp_vmlinux.ranges", mapping address ranges to object files.
>
> - Read "modules_thick.builtin" and ".tmp_vmlinux.ranges" to
>   map symbol addresses to built-in-module names.  Write those
>   module names (kallsyms_modules) and that per-symbol module
>   information (kallsyms_symbol_modules) to the *.s output file.
>
> - Use kallsyms_modules and kallsyms_symbol_modules to add
>   built-in-module information to /proc/kallsyms.
>
> Note that kernel symbols for built-in modules appear in ascending
> order by address, as usual, and thus will appear interspersed with
> symbols that are part of other built-in modules or of the kernel.
>
> Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
> Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
> Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
> Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
> Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
> ---
>  .gitignore                  |   1 +
>  Documentation/dontdiff      |   1 +
>  Makefile                    |  41 ++-
>  kernel/kallsyms.c           |  12 +-
>  scripts/Makefile.modbuiltin |  20 +-
>  scripts/kallsyms.c          | 515 +++++++++++++++++++++++++++++++++++-
>  scripts/link-vmlinux.sh     |  17 ++
>  scripts/namespace.pl        |   5 +
>  8 files changed, 589 insertions(+), 23 deletions(-)


This diff-stat is unfortunate.
scripts/kallsyms.c grew by 65% just to parse
.tmp_vmlinux.ranges and modules_thick.builtin.

I tend to suspect a design mistake...



I tested this patch on x86_64_defconfig.
It also increases the kallsyms data by 24%.

The data increase is striking compared with the
amount of information added.

   text    data     bss     dec     hex filename
 830000       0       0 830000   caa30 .tmp_kallsyms2.o.before
1031216       0       0 1031216   fbc30 .tmp_kallsyms2.o.after




> diff --git a/Documentation/dontdiff b/Documentation/dontdiff
> index 9f4392876099..32ee05f91410 100644
> --- a/Documentation/dontdiff
> +++ b/Documentation/dontdiff
> @@ -180,6 +180,7 @@ modpost
>  modules.builtin
>  modules.builtin.modinfo
>  modules.order
> +modules_thick.builtin
>  modversions.h*
>  nconf
>  nconf-cfg

Most people forget to add this.
I take it you spent time on internal review.


> diff --git a/Makefile b/Makefile
> index 49363caa7079..15b4e897cd3e 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1077,7 +1077,7 @@ cmd_link-vmlinux =                                                 \
>         $(CONFIG_SHELL) $< $(LD) $(KBUILD_LDFLAGS) $(LDFLAGS_vmlinux) ;    \
>         $(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
>
> -vmlinux: scripts/link-vmlinux.sh autoksyms_recursive $(vmlinux-deps) FORCE
> +vmlinux: scripts/link-vmlinux.sh autoksyms_recursive $(vmlinux-deps) modules_thick.builtin FORCE
>         +$(call if_changed,link-vmlinux)
>
>  targets := vmlinux
> @@ -1292,17 +1292,6 @@ modules: $(if $(KBUILD_BUILTIN),vmlinux) modules.order modules.builtin
>  modules.order: descend
>         $(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
>
> -modbuiltin-dirs := $(addprefix _modbuiltin_, $(build-dirs))
> -
> -modules.builtin: $(modbuiltin-dirs)
> -       $(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
> -
> -PHONY += $(modbuiltin-dirs)
> -# tristate.conf is not included from this Makefile. Add it as a prerequisite
> -# here to make it self-healing in case somebody accidentally removes it.
> -$(modbuiltin-dirs): include/config/tristate.conf
> -       $(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_%,%,$@)
> -
>  # Target to prepare building external modules
>  PHONY += modules_prepare
>  modules_prepare: prepare
> @@ -1355,6 +1344,33 @@ modules modules_install:
>
>  endif # CONFIG_MODULES
>
> +# modules.builtin has a 'thick' form which maps from kernel modules (or rather
> +# the object file names they would have had had they not been built in) to their
> +# constituent object files: kallsyms uses this to determine which modules any
> +# given object file is part of.  (We cannot eliminate the slight redundancy
> +# here without double-expansion.)
> +
> +modbuiltin-dirs := $(addprefix _modbuiltin_, $(build-dirs))
> +
> +modbuiltin-thick-dirs := $(addprefix _modbuiltin_thick_, $(build-dirs))
> +
> +modules.builtin: $(modbuiltin-dirs)
> +       $(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
> +
> +modules_thick.builtin: $(modbuiltin-thick-dirs)
> +       $(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
> +
> +PHONY += $(modbuiltin-dirs) $(modbuiltin-thick-dirs)
> +# tristate.conf is not included from this Makefile. Add it as a prerequisite
> +# here to make it self-healing in case somebody accidentally removes it.
> +$(modbuiltin-dirs): include/config/tristate.conf
> +       $(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_%,%,$@) \
> +                       builtin-file=modules.builtin
> +
> +$(modbuiltin-thick-dirs): include/config/tristate.conf
> +       $(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_thick_%,%,$@) \
> +                       builtin-file=modules_thick.builtin
> +
>  ###
>  # Cleaning is done on three levels.
>  # make clean     Delete most generated files
> @@ -1674,6 +1690,7 @@ clean: $(clean-dirs)
>                 -o -name '*.asn1.[ch]' \
>                 -o -name '*.symtypes' -o -name 'modules.order' \
>                 -o -name modules.builtin -o -name '.tmp_*.o.*' \
> +               -o -name modules_thick.builtin \
>                 -o -name '*.c.[012]*.*' \
>                 -o -name '*.ll' \
>                 -o -name '*.gcno' \) -type f -print | xargs rm -f
> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> index 136ce049c4ad..ce8576503e35 100644
> --- a/kernel/kallsyms.c
> +++ b/kernel/kallsyms.c
> @@ -46,6 +46,8 @@ __attribute__((weak, section(".rodata")));
>
>  extern const u8 kallsyms_token_table[] __weak;
>  extern const u16 kallsyms_token_index[] __weak;
> +extern const char kallsyms_modules[] __weak;
> +extern const u32 kallsyms_symbol_modules[] __weak;
>
>  extern const unsigned int kallsyms_markers[] __weak;
>
> @@ -508,8 +510,16 @@ static int get_ksymbol_bpf(struct kallsym_iter *iter)
>  static unsigned long get_ksymbol_core(struct kallsym_iter *iter)
>  {
>         unsigned off = iter->nameoff;
> +       u32 mod_index = 0;
>
> -       iter->module_name[0] = '\0';
> +       if (kallsyms_symbol_modules)
> +               mod_index = kallsyms_symbol_modules[iter->pos];
> +
> +       if (mod_index == 0 || kallsyms_modules == NULL)
> +               iter->module_name[0] = '\0';
> +       else
> +               strcpy(iter->module_name, &kallsyms_modules[mod_index]);
> +       iter->exported = 0;
>         iter->value = kallsyms_sym_address(iter->pos);
>
>         iter->type = kallsyms_get_symbol_type(off);
> diff --git a/scripts/Makefile.modbuiltin b/scripts/Makefile.modbuiltin
> index 7d4711b88656..06f31e58111e 100644
> --- a/scripts/Makefile.modbuiltin
> +++ b/scripts/Makefile.modbuiltin
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
>  # ==========================================================================
> -# Generating modules.builtin
> +# Generating modules.builtin and modules_thick.builtin
>  # ==========================================================================
>
>  src := $(obj)
> @@ -30,19 +30,29 @@ __subdir-Y     := $(patsubst %/,%,$(filter %/, $(obj-Y)))
>  subdir-Y       += $(__subdir-Y)
>  subdir-ym      := $(sort $(subdir-y) $(subdir-Y) $(subdir-m))
>  subdir-ym      := $(addprefix $(obj)/,$(subdir-ym))
> -obj-Y          := $(addprefix $(obj)/,$(obj-Y))
> +pathobj-Y      := $(addprefix $(obj)/,$(obj-Y))
>
>  modbuiltin-subdirs := $(patsubst %,%/modules.builtin, $(subdir-ym))
> -modbuiltin-mods    := $(filter %.ko, $(obj-Y:.o=.ko))
> +modbuiltin-mods    := $(filter %.ko, $(pathobj-Y:.o=.ko))
>  modbuiltin-target  := $(obj)/modules.builtin
> +modthickbuiltin-subdirs := $(patsubst %,%/modules_thick.builtin, $(subdir-ym))
> +modthickbuiltin-target  := $(obj)/modules_thick.builtin
>
> -__modbuiltin: $(modbuiltin-target) $(subdir-ym)
> +__modbuiltin: $(obj)/$(builtin-file) $(subdir-ym)
>         @:
>
>  $(modbuiltin-target): $(subdir-ym) FORCE
>         $(Q)(for m in $(modbuiltin-mods); do echo $$m; done;    \
>         cat /dev/null $(modbuiltin-subdirs)) > $@
>
> +$(modthickbuiltin-target): $(subdir-ym) FORCE
> +       $(Q) $(foreach mod-o, $(filter %.o,$(obj-Y)),\
> +               printf "%s:" $(addprefix $(obj)/,$(mod-o)) >> $@; \
> +               printf " %s" $(sort $(strip $(addprefix $(obj)/,$($(mod-o:.o=-objs)) \
> +                       $($(mod-o:.o=-y)) $($(mod-o:.o=-Y))))) >> $@; \
> +               printf "\n" >> $@; ) \
> +       cat /dev/null $(modthickbuiltin-subdirs) >> $@;
> +
>  PHONY += FORCE
>
>  FORCE:
> @@ -52,6 +62,6 @@ FORCE:
>
>  PHONY += $(subdir-ym)
>  $(subdir-ym):
> -       $(Q)$(MAKE) $(modbuiltin)=$@
> +       $(Q)$(MAKE) $(modbuiltin)=$@ builtin-file=$(builtin-file)
>
>  .PHONY: $(PHONY)
> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
> index ae6504d07fd6..f71432df09d8 100644
> --- a/scripts/kallsyms.c
> +++ b/scripts/kallsyms.c
> @@ -5,7 +5,10 @@
>   * This software may be used and distributed according to the terms
>   * of the GNU General Public License, incorporated herein by reference.
>   *
> - * Usage: nm -n vmlinux | scripts/kallsyms [--all-symbols] > symbols.S
> + * Usage: nm -n vmlinux
> + *        | scripts/kallsyms [--all-symbols] [--absolute-percpu]
> + *             [--base-relative] [--builtin=modules_thick.builtin]
> + *        > symbols.S
>   *
>   *      Table compression uses all the unused char codes on the symbols and
>   *  maps these to the most used substrings (tokens). For instance, it might
> @@ -18,12 +21,15 @@
>   *
>   */
>
> +#define _GNU_SOURCE 1
>  #include <stdio.h>
>  #include <stdlib.h>
>  #include <string.h>
>  #include <ctype.h>
>  #include <limits.h>
>
> +#include <errno.h>
> +
>  #ifndef ARRAY_SIZE
>  #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0]))
>  #endif
> @@ -36,6 +42,7 @@ struct sym_entry {
>         unsigned int start_pos;
>         unsigned char *sym;
>         unsigned int percpu_absolute;
> +       unsigned int module;
>  };
>
>  struct addr_range {
> @@ -69,10 +76,116 @@ static unsigned char best_table[256][2];
>  static unsigned char best_table_len[256];
>
>
> +static unsigned int strhash(const char *s)
> +{
> +       /* fnv32 hash */
> +       unsigned int hash = 2166136261U;
> +
> +       for (; *s; s++)
> +               hash = (hash ^ *s) * 0x01000193;
> +       return hash;
> +}
> +
> +#define OBJ2MOD_BITS 10
> +#define OBJ2MOD_N (1 << OBJ2MOD_BITS)
> +#define OBJ2MOD_MASK (OBJ2MOD_N - 1)
> +struct obj2mod_elem {
> +       char *obj;
> +       int mod;
> +       struct obj2mod_elem *next;
> +};
> +
> +static struct obj2mod_elem *obj2mod[OBJ2MOD_N];
> +
> +static void obj2mod_init(void)
> +{
> +       memset(obj2mod, 0, sizeof(obj2mod));
> +}


Unneeded.

The .bss section is automatically zero-cleared by the
operating system, so obj2mod is already zero-filled.
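
To illustrate the point, here is a minimal sketch. The names demo_table
and demo_table_used() are hypothetical and not part of the patch. C
gives objects with static storage duration a zero (null pointer)
initial value, so a file-scope bucket array needs no explicit memset()
before first use:

```c
#include <stddef.h>

#define DEMO_TABLE_N 1024

struct demo_elem;	/* opaque here; only pointers to it are stored */

/* Static storage duration: every bucket starts out as a null pointer. */
static struct demo_elem *demo_table[DEMO_TABLE_N];

/* Count the non-empty buckets; this is 0 until something is inserted. */
static size_t demo_table_used(void)
{
	size_t i, used = 0;

	for (i = 0; i < DEMO_TABLE_N; i++)
		if (demo_table[i] != NULL)
			used++;
	return used;
}
```

Calling demo_table_used() before any insertion reports no used buckets,
which is the state obj2mod_init() tries to establish by hand.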




> +static void obj2mod_put(char *obj, int mod)

You can add 'const' to the 'char *' parameter.

Same for obj2mod_get().


> +{
> +       int i = strhash(obj) & OBJ2MOD_MASK;
> +       struct obj2mod_elem *elem = malloc(sizeof(struct obj2mod_elem));
> +
> +       if (!elem) {
> +               fprintf(stderr, "kallsyms: out of memory\n");
> +               exit(1);
> +       }
> +
> +       elem->obj = strdup(obj);
> +       if (!elem->obj) {
> +               fprintf(stderr, "kallsyms: out of memory\n");
> +               free(elem);
> +               exit(1);
> +       }
> +
> +       elem->mod = mod;
> +       elem->next = obj2mod[i];
> +       obj2mod[i] = elem;
> +}
> +
> +static int obj2mod_get(char *obj)
> +{
> +       int i = strhash(obj) & OBJ2MOD_MASK;
> +       struct obj2mod_elem *elem;
> +
> +       for (elem = obj2mod[i]; elem; elem = elem->next)
> +               if (strcmp(elem->obj, obj) == 0)
> +                       return elem->mod;
> +       return 0;
> +}
> +
> +static void obj2mod_free(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < OBJ2MOD_N; i++) {
> +               struct obj2mod_elem *elem = obj2mod[i];
> +               struct obj2mod_elem *next;
> +
> +               while (elem) {
> +                       next = elem->next;
> +                       free(elem->obj);
> +                       free(elem);
> +                       elem = next;
> +               }
> +       }
> +}
> +
> +/*
> + * The builtin module names.  The "offset" points to the name as if
> + * all builtin module names were concatenated to a single string.
> + */
> +static unsigned int builtin_module_size;       /* number allocated */
> +static unsigned int builtin_module_len;                /* number assigned */
> +static char **builtin_modules;                 /* array of module names */
> +static unsigned int *builtin_module_offsets;   /* offset */
> +
> +/*
> + * modules_thick.builtin iteration state.
> + */
> +struct modules_thick_iter {
> +       FILE *f;
> +       char *line;
> +       size_t line_size;
> +};
> +
> +/*
> + * An ordered list of address ranges and how they map to built-in modules.
> + */
> +struct addrmap_entry {
> +       unsigned long long addr;
> +       unsigned long long size;
> +       unsigned int module;
> +};
> +static struct addrmap_entry *addrmap;
> +static int addrmap_num, addrmap_alloced;
> +
>  static void usage(void)
>  {
> -       fprintf(stderr, "Usage: kallsyms [--all-symbols] "
> -                       "[--base-relative] < in.map > out.S\n");
> +       fprintf(stderr, "Usage: kallsyms [--all-symbols] [--absolute-percpu] "
> +                       "[--base-relative] [--builtin=modules_thick.builtin] "
> +                       "< nm_vmlinux.out > symbols.S\n");
>         exit(1);
>  }
>
> @@ -107,11 +220,25 @@ static int check_symbol_range(const char *sym, unsigned long long addr,
>         return 1;
>  }
>
> +static int addrmap_compare(const void *keyp, const void *rangep)
> +{
> +       unsigned long long addr = *((const unsigned long long *)keyp);
> +       const struct addrmap_entry *range = (const struct addrmap_entry *)rangep;

The cast is unneeded: rangep is a 'const void *', which converts
implicitly to any object pointer type in C.
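
A sketch of the cast-free form, for reference. The struct and function
names (demo_range, demo_range_compare, demo_range_find) are made up for
this example and only mirror the shape of the code in the patch. Both
the comparator arguments and the bsearch() return value are void
pointers, so plain assignment is enough:

```c
#include <stdlib.h>

/* Hypothetical stand-in for the patch's address-range entry. */
struct demo_range {
	unsigned long long addr;
	unsigned long long size;
	unsigned int module;
};

/* Compare an address key against one [addr, addr + size) range. */
static int demo_range_compare(const void *keyp, const void *rangep)
{
	const unsigned long long *addr = keyp;	/* no cast needed */
	const struct demo_range *range = rangep;

	if (*addr < range->addr)
		return -1;
	if (*addr < range->addr + range->size)
		return 0;
	return 1;
}

/* Look up the range containing addr in a sorted, non-overlapping map. */
static const struct demo_range *
demo_range_find(const struct demo_range *map, size_t n,
		unsigned long long addr)
{
	/* bsearch() returns void *, which converts without a cast. */
	return bsearch(&addr, map, n, sizeof(*map), demo_range_compare);
}
```

This relies on the map being sorted by start address with no
overlapping ranges, which is what bsearch() requires of its input.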



> +
> +       if (addr < range->addr)
> +               return -1;
> +       if (addr < range->addr + range->size)
> +               return 0;
> +       return 1;
> +}
> +
>  static int read_symbol(FILE *in, struct sym_entry *s)
>  {
>         char sym[500], stype;
> -       int rc;
> +       int rc, init_scratch = 0;
> +       struct addrmap_entry *range;
>
> +read_another:
>         rc = fscanf(in, "%llx %c %499s\n", &s->addr, &stype, sym);
>         if (rc != 3) {
>                 if (rc != EOF && fgets(sym, 500, in) == NULL)
> @@ -125,6 +252,16 @@ static int read_symbol(FILE *in, struct sym_entry *s)
>                 return -1;
>         }
>
> +       /* skip the .init.scratch section */
> +       if (strcmp(sym, "__init_scratch_end") == 0) {
> +               init_scratch = 0;
> +               goto read_another;
> +       }
> +       if (strcmp(sym, "__init_scratch_begin") == 0)
> +               init_scratch = 1;
> +       if (init_scratch)
> +               goto read_another;


How is this hunk related?
I cannot work it out from the commit log.

The address range check is already done in symbol_valid().
I would rather not see the same kind of filtering
done in two different ways.


>         /* Ignore most absolute/undefined (?) symbols. */
>         if (strcmp(sym, "_text") == 0)
>                 _text = s->addr;
> @@ -154,6 +291,14 @@ static int read_symbol(FILE *in, struct sym_entry *s)
>         else if (!strncmp(sym, ".LASANPC", 8))
>                 return -1;
>
> +       /* look up the builtin module this is part of (if any) */
> +       range = (struct addrmap_entry *) bsearch(&s->addr,

The cast is unneeded because bsearch() returns an opaque pointer (void *).



> +           addrmap, addrmap_num, sizeof(*addrmap), &addrmap_compare);
> +       if (range)
> +               s->module = builtin_module_offsets[range->module];
> +       else
> +               s->module = 0;
> +
>         /* include the type field in the symbol name, so that it gets
>          * compressed together */
>         s->len = strlen(sym) + 1;
> @@ -206,6 +351,8 @@ static int symbol_valid(struct sym_entry *s)
>                 "kallsyms_markers",
>                 "kallsyms_token_table",
>                 "kallsyms_token_index",
> +               "kallsyms_symbol_modules",
> +               "kallsyms_modules",
>
>         /* Exclude linker generated symbols which vary between passes */
>                 "_SDA_BASE_",           /* ppc */
> @@ -454,6 +601,19 @@ static void write_src(void)
>         for (i = 0; i < 256; i++)
>                 printf("\t.short\t%d\n", best_idx[i]);
>         printf("\n");
> +
> +       output_label("kallsyms_modules");
> +       for (i = 0; i < builtin_module_len; i++)
> +               printf("\t.asciz\t\"%s\"\n", builtin_modules[i]);
> +       printf("\n");


Are the strings output in plain text?

Did you consider the possibility of compression?



> +       for (i = 0; i < builtin_module_len; i++)
> +               free(builtin_modules[i]);
> +
> +       output_label("kallsyms_symbol_modules");
> +       for (i = 0; i < table_cnt; i++)
> +               printf("\t.int\t%d\n", table[i].module);
> +       printf("\n");
>  }
>
>
> @@ -738,23 +898,368 @@ static void record_relative_base(void)
>                         relative_base = table[i].addr;
>  }
>
> +/*
> + * Read a modules_thick.builtin file.
> + */
> +
> +/*
> + * Construct a modules_thick.builtin iterator.
> + */
> +static struct modules_thick_iter *
> +modules_thick_iter_new(const char *modules_thick_file)
> +{
> +       struct modules_thick_iter *i;
> +
> +       i = calloc(1, sizeof(struct modules_thick_iter));
> +       if (i == NULL)
> +               return NULL;
> +
> +       i->f = fopen(modules_thick_file, "r");
> +
> +       if (i->f == NULL) {
> +               fprintf(stderr, "Cannot open builtin module file %s: %s\n",
> +                       modules_thick_file, strerror(errno));
> +               return NULL;
> +       }
> +
> +       return i;
> +}
> +
> +/*
> + * Iterate, returning a new null-terminated array of object file names, and a
> + * new dynamically-allocated module name.  (The module name passed in is freed.)
> + *
> + * The array of object file names should be freed by the caller: the strings it
> + * points to are owned by the iterator, and should not be freed.
> + */
> +static char ** __attribute__((__nonnull__))
> +modules_thick_iter_next(struct modules_thick_iter *i, char **module_name)
> +{
> +       size_t npaths = 1;
> +       char **module_paths;
> +       char *last_slash;
> +       char *last_dot;
> +       char *trailing_linefeed;
> +       char *object_name = i->line;
> +       char *dash;
> +       int composite = 0;
> +
> +       /*
> +        * Read in all module entries, computing the suffixless, pathless name
> +        * of the module and building the next arrayful of object file names for
> +        * return.
> +        *
> +        * Modules can consist of multiple files: in this case, the portion
> +        * before the colon is the path to the module (as before): the portion
> +        * after the colon is a space-separated list of files that should be *
> +        * considered part of this module.  In this case, the portion before the
> +        * name is an "object file" that does not actually exist: it is merged
> +        * into built-in.a without ever being written out.
> +        *
> +        * All module names have - translated to _, to match what is done to the
> +        * names of the same things when built as modules.
> +        */
> +
> +       /*
> +        * Reinvocation of exhausted iterator. Return NULL, once.
> +        */
> +retry:
> +       if (getline(&i->line, &i->line_size, i->f) < 0) {
> +               if (ferror(i->f)) {
> +                       fprintf(stderr,
> +                               "Error reading from modules_thick file: %s\n",
> +                               strerror(errno));
> +                       exit(1);
> +               }
> +               rewind(i->f);
> +               return NULL;
> +       }
> +
> +       if (i->line[0] == '\0')
> +               goto retry;
> +
> +       /*
> +        * Slice the line in two at the colon, if any.  If there is anything
> +        * past the ': ', this is a composite module.  (We allow for no colon
> +        * for robustness, even though one should always be present.)
> +        */
> +       if (strchr(i->line, ':') != NULL) {
> +               char *name_start;
> +
> +               object_name = strchr(i->line, ':');
> +               *object_name = '\0';
> +               object_name++;
> +               name_start = object_name + strspn(object_name, " \n");
> +               if (*name_start != '\0') {
> +                       composite = 1;
> +                       object_name = name_start;
> +               }
> +       }
> +
> +       /*
> +        * Figure out the module name.
> +        */
> +       last_slash = strrchr(i->line, '/');
> +       last_slash = (!last_slash) ? i->line :
> +               last_slash + 1;
> +       free(*module_name);
> +       *module_name = strdup(last_slash);
> +       dash = *module_name;
> +
> +       while (dash != NULL) {
> +               dash = strchr(dash, '-');
> +               if (dash != NULL)
> +                       *dash = '_';
> +       }
> +
> +       last_dot = strrchr(*module_name, '.');
> +       if (last_dot != NULL)
> +               *last_dot = '\0';
> +
> +       trailing_linefeed = strchr(object_name, '\n');
> +       if (trailing_linefeed != NULL)
> +               *trailing_linefeed = '\0';
> +
> +       /*
> +        * Multifile separator? Object file names explicitly stated:
> +        * slice them up and shuffle them in.
> +        *
> +        * The array size may be an overestimate if any object file
> +        * names start or end with spaces (very unlikely) but cannot be
> +        * an underestimate.  (Check for it anyway.)
> +        */
> +       if (composite) {
> +               char *one_object;
> +
> +               for (npaths = 0, one_object = object_name;
> +                    one_object != NULL;
> +                    npaths++, one_object = strchr(one_object + 1, ' '))
> +                       ;
> +       }
> +
> +       module_paths = malloc((npaths + 1) * sizeof(char *));
> +       if (!module_paths) {
> +               fprintf(stderr, "%s: out of memory on module %s\n", __func__,
> +                       *module_name);
> +               exit(1);
> +       }
> +
> +       if (composite) {
> +               char *one_object;
> +               size_t i = 0;
> +
> +               while ((one_object = strsep(&object_name, " ")) != NULL) {
> +                       if (i >= npaths) {
> +                               fprintf(stderr, "%s: npaths overflow on module "
> +                                       "%s: this is a bug.\n", __func__,
> +                                       *module_name);
> +                               exit(1);
> +                       }
> +
> +                       module_paths[i++] = one_object;
> +               }
> +       } else
> +               module_paths[0] = i->line;      /* untransformed module name */
> +
> +       module_paths[npaths] = NULL;
> +
> +       return module_paths;
> +}
> +
> +/*
> + * Free an iterator. Can be called while iteration is underway, so even
> + * state that is freed at the end of iteration must be freed here too.
> + */
> +static void
> +modules_thick_iter_free(struct modules_thick_iter *i)
> +{
> +       if (i == NULL)
> +               return;
> +       fclose(i->f);
> +       free(i->line);
> +       free(i);
> +}
> +
> +/*
> + * Expand the builtin modules list.
> + */
> +static void expand_builtin_modules(void)
> +{
> +       builtin_module_size += 50;
> +
> +       builtin_modules = realloc(builtin_modules,
> +                                 sizeof(*builtin_modules) *
> +                                 builtin_module_size);
> +       builtin_module_offsets = realloc(builtin_module_offsets,
> +                                        sizeof(*builtin_module_offsets) *
> +                                        builtin_module_size);
> +
> +       if (!builtin_modules || !builtin_module_offsets) {
> +               fprintf(stderr, "kallsyms failure: out of memory.\n");
> +               exit(EXIT_FAILURE);
> +       }
> +}
> +
> +/*
> + * Add a single built-in module (possibly composed of many files) to the
> + * modules list.  Take the offset of the current module and return it
> + * (purely for simplicity's sake in the caller).
> + */
> +static size_t add_builtin_module(const char *module_name, char **module_paths,
> +                                size_t offset)
> +{
> +       /* map the module's object paths to the module offset */
> +       while (*module_paths) {
> +               obj2mod_put(*module_paths, builtin_module_len);
> +               module_paths++;
> +       }
> +
> +       /* add the module name */
> +       if (builtin_module_size <= builtin_module_len)
> +               expand_builtin_modules();
> +       builtin_modules[builtin_module_len] = strdup(module_name);
> +       builtin_module_offsets[builtin_module_len] = offset;
> +       builtin_module_len++;
> +
> +       return (offset + strlen(module_name) + 1);
> +}
> +
> +/*
> + * Read the linker map.
> + */
> +static void read_linker_map(void)
> +{
> +       unsigned long long addr, size;
> +       char obj[PATH_MAX+1];
> +       FILE *f = fopen(".tmp_vmlinux.ranges", "r");
> +
> +       if (!f) {
> +               fprintf(stderr, "Cannot open '.tmp_vmlinux.ranges'.\n");
> +               exit(1);
> +       }
> +
> +       addrmap_num = 0;
> +       addrmap_alloced = 4096;
> +       addrmap = malloc(sizeof(*addrmap) * addrmap_alloced);
> +       if (!addrmap)
> +               goto oom;
> +
> +       /*
> +        * For each address range (addr,size) and object, add to addrmap
> +        * the range and the built-in module to which the object maps.
> +        */
> +       while (fscanf(f, "%llx %llx %s\n", &addr, &size, obj) == 3) {
> +               int m = obj2mod_get(obj);
> +
> +               if (addr == 0 || size == 0 || m == 0)
> +                       continue;
> +
> +               if (addrmap_num >= addrmap_alloced) {
> +                       addrmap_alloced *= 2;
> +                       addrmap = realloc(addrmap,
> +                           sizeof(*addrmap) * addrmap_alloced);
> +                       if (!addrmap)
> +                               goto oom;
> +               }
> +
> +               addrmap[addrmap_num].addr = addr;
> +               addrmap[addrmap_num].size = size;
> +               addrmap[addrmap_num].module = m;
> +               addrmap_num++;
> +       }
> +       fclose(f);
> +       return;
> +
> +oom:
> +       fprintf(stderr, "kallsyms: out of memory\n");
> +       exit(1);
> +}
> +
> +/*
> + * Read "modules_thick.builtin" (the list of built-in modules).  Construct:
> + *   - builtin_modules: array of built-in-module names
> + *   - builtin_module_offsets: array of offsets that will later be
> + *       used to access a concatenated list of built-in-module names
> + *   - obj2mod: a temporary, many-to-one, hash mapping
> + *       from object-file paths to built-in-module names
> + * Read ".tmp_vmlinux.ranges" (the linker map).
> + *   - addrmap[] maps address ranges to built-in module names (using obj2mod)
> + */
> +static void read_modules(const char *modules_builtin)
> +{
> +       struct modules_thick_iter *i;
> +       size_t offset = 0;
> +       char *module_name = NULL;
> +       char **module_paths;
> +
> +       obj2mod_init();
> +
> +       /*
> +        * builtin_modules[0] is a null entry signifying a symbol that cannot be
> +        * modular.
> +        */
> +       builtin_module_size = 50;
> +       builtin_modules = malloc(sizeof(*builtin_modules) *
> +                                builtin_module_size);
> +       builtin_module_offsets = malloc(sizeof(*builtin_module_offsets) *
> +                                builtin_module_size);
> +       if (!builtin_modules || !builtin_module_offsets) {
> +               fprintf(stderr, "kallsyms: out of memory\n");
> +               exit(1);
> +       }
> +       builtin_modules[0] = strdup("");
> +       builtin_module_offsets[0] = 0;
> +       builtin_module_len = 1;
> +       offset++;
> +
> +       /*
> +        * Iterate over all modules in modules_thick.builtin and add each.
> +        */
> +       i = modules_thick_iter_new(modules_builtin);
> +       if (i == NULL) {
> +               fprintf(stderr, "Cannot iterate over builtin modules.\n");
> +               exit(1);
> +       }
> +
> +       while ((module_paths = modules_thick_iter_next(i, &module_name))) {
> +               offset = add_builtin_module(module_name, module_paths, offset);
> +               free(module_paths);
> +               module_paths = NULL;
> +       }
> +
> +       free(module_name);
> +       modules_thick_iter_free(i);
> +
> +       /*
> +        * Read linker map.
> +        */
> +       read_linker_map();
> +
> +       obj2mod_free();
> +}
> +
>  int main(int argc, char **argv)
>  {
> +       const char *modules_builtin = "modules_thick.builtin";
> +
>         if (argc >= 2) {
>                 int i;
>                 for (i = 1; i < argc; i++) {
> -                       if(strcmp(argv[i], "--all-symbols") == 0)
> +                       if (strcmp(argv[i], "--all-symbols") == 0)
>                                 all_symbols = 1;
>                         else if (strcmp(argv[i], "--absolute-percpu") == 0)
>                                 absolute_percpu = 1;
>                         else if (strcmp(argv[i], "--base-relative") == 0)
>                                 base_relative = 1;
> +                       else if (strncmp(argv[i], "--builtin=", 10) == 0)
> +                               modules_builtin = &argv[i][10];


".tmp_vmlinux.ranges" is hard-coded, but
"modules_thick.builtin" can be changed via an option. Heh.
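
One way to make the two consistent would be to accept both paths as
options. The sketch below is only an illustration: the "--ranges="
flag, struct demo_opts and demo_parse_opt() are hypothetical and do not
exist in the patch, which only has "--builtin=".

```c
#include <string.h>

/* Hypothetical option block holding both input paths. */
struct demo_opts {
	const char *builtin;	/* path to modules_thick.builtin */
	const char *ranges;	/* path to .tmp_vmlinux.ranges */
};

/*
 * Parse one argument.  Returns 0 if it was recognised and stored,
 * -1 otherwise.  The stored pointers alias the argument string.
 */
static int demo_parse_opt(struct demo_opts *o, const char *arg)
{
	if (strncmp(arg, "--builtin=", 10) == 0)
		o->builtin = arg + 10;
	else if (strncmp(arg, "--ranges=", 9) == 0)
		o->ranges = arg + 9;
	else
		return -1;
	return 0;
}
```

The defaults would be set in the struct before parsing, so omitting
either option keeps today's file names.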



>                         else
>                                 usage();
>                 }
>         } else if (argc != 1)
>                 usage();
>
> +       read_modules(modules_builtin);
>         read_map(stdin);
>         if (absolute_percpu)
>                 make_percpus_absolute();
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index 06495379fcd8..e4d5a98133e7 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -76,6 +76,7 @@ vmlinux_link()
>                         --start-group                           \
>                         ${KBUILD_VMLINUX_LIBS}                  \
>                         --end-group                             \
> +                       -Map=.tmp_vmlinux.map                   \
>                         ${@}"
>
>                 ${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux}      \
> @@ -88,6 +89,7 @@ vmlinux_link()
>                         -Wl,--start-group                       \
>                         ${KBUILD_VMLINUX_LIBS}                  \
>                         -Wl,--end-group                         \
> +                       -Wl,-Map=.tmp_vmlinux.map               \
>                         ${@}"
>
>                 ${CC} ${CFLAGS_vmlinux}                         \
> @@ -138,6 +140,19 @@ kallsyms()
>         info KSYM ${2}
>         local kallsymopt;
>
> +       # read the linker map to identify ranges of addresses:
> +       #   - for each *.o file, report address, size, pathname
> +       #       - most such lines will have four fields
> +       #       - but sometimes there is a line break after the first field
> +       #   - start reading at "Linker script and memory map"


Searching for "Linker script and memory map" will probably cause
portability issues.

The LLVM folks will be unhappy with it.




> +       #   - stop reading at ".brk"
> +       ${AWK} '
> +           /\.o$/ && start==1 { print $(NF-2), $(NF-1), $NF }
> +           /^Linker script and memory map/ { start = 1 }
> +           /^\.brk/ { exit(0) }
> +       ' .tmp_vmlinux.map | sort > .tmp_vmlinux.ranges
> +
> +       # get kallsyms options
>         if [ -n "${CONFIG_KALLSYMS_ALL}" ]; then
>                 kallsymopt="${kallsymopt} --all-symbols"
>         fi
> @@ -150,11 +165,13 @@ kallsyms()
>                 kallsymopt="${kallsymopt} --base-relative"
>         fi
>
> +       # set up compilation
>         local aflags="${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL}               \
>                       ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS}"
>
>         local afile="`basename ${2} .o`.S"
>
> +       # construct file and compile
>         ${NM} -n ${1} | scripts/kallsyms ${kallsymopt} > ${afile}
>         ${CC} ${aflags} -c -o ${2} ${afile}
>  }
> diff --git a/scripts/namespace.pl b/scripts/namespace.pl
> index 1da7bca201a4..4c7615e720de 100755
> --- a/scripts/namespace.pl
> +++ b/scripts/namespace.pl
> @@ -120,6 +120,11 @@ my %nameexception = (
>      'kallsyms_addresses'=> 1,
>      'kallsyms_offsets' => 1,
>      'kallsyms_relative_base'=> 1,
> +    'kallsyms_token_table'=> 1,
> +    'kallsyms_token_index'=> 1,
> +    'kallsyms_markers' => 1,
> +    'kallsyms_modules' => 1,
> +    'kallsyms_symbol_modules'=> 1,
>      '__this_module'    => 1,
>      '_etext'           => 1,
>      '_edata'           => 1,
> --
> 2.18.1
>


-- 
Best Regards
Masahiro Yamada


* Re: [PATCH v3] kallsyms: add names of built-in modules
  2019-11-22 10:00             ` Masahiro Yamada
@ 2019-11-22 15:23               ` Nick Alcock
  2019-11-22 17:04                 ` Eugene Loh
  2019-12-10 17:45               ` Eugene Loh
  1 sibling, 1 reply; 21+ messages in thread
From: Nick Alcock @ 2019-11-22 15:23 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: eugene.loh, Steven Rostedt, Jonathan Corbet, Michal Marek,
	Jessica Yu, Linux Kbuild mailing list, Marc Zyngier, Song Liu,
	Thomas Gleixner, Keller, Jacob E, Kris Van Hees, Nick Alcock

On 22 Nov 2019, Masahiro Yamada stated:

> On Wed, Nov 20, 2019 at 2:02 PM <eugene.loh@oracle.com> wrote:
>>
>> From: Eugene Loh <eugene.loh@oracle.com>
>>
>> /proc/kallsyms is very useful for tracers and other tools that need
>> to map kernel symbols to addresses.
>>
>> It would be useful if there were a mapping between kernel symbol and
>> module name that only changed when the kernel source code is changed.
>
> Unfortunately, this is not necessarily true.
>
> Some objects could be linked into multiple modules.

Agreed, though this is fairly rare.

> CONFIG_LIQUIDIO=y && CONFIG_LIQUIDIO_VF=n  is OK
> CONFIG_LIQUIDIO=n && CONFIG_LIQUIDIO_VF=y  is OK
>
> The symbol-to-modname mapping changes depending on
> the .config though.

I don't see a way to avoid that: if you compile only one of the
constituent modules in, that module is now the only source of that
symbol: if you compile only the other one in, *it* is. It would be nice
if we could unambiguously find a way to identify a single module as the
source in this case, but it's not really possible, because you could
always choose to not compile one of the modules at all, and then it
would be unambiguously wrong to identify it as the source of the symbol.
So the mapping will always change if you do that.

The stability I was aiming for was in the common case: identifying
symbols as belonging to a module where those symbols are linked into
only that module, without regard to whether the module is built-in or
modular in this configuration.

I'd say we can progress to handling the nastier multiple-linkage case
incrementally, by starting from where this patch leaves us: it leaves us
in a better place to do so than we were in before it landed.

> It is hard to say which particular module the symbol came from.

Exactly.

> As far as I tested this patch, it seems it picked up a
> random one?

It'll get the one that's first by address, I think, which of course is
more or less random or at least we don't try to make it stable. I would
prefer if it picked up a constant one when a given set of modules are
competing for the same symbol, but doing that without making everything
much slower or more complex for a fairly obscure edge case is tricky.
(Also, I thought this was only a theoretical edge case: thanks for
highlighting some real instances of this!)

(If you can think of a way to do this which doesn't slow down the common
case of symbols only owned by one module, I would be very happy indeed.)

>>  scripts/kallsyms.c          | 515 +++++++++++++++++++++++++++++++++++-
>>  scripts/link-vmlinux.sh     |  17 ++
>>  scripts/namespace.pl        |   5 +
>>  8 files changed, 589 insertions(+), 23 deletions(-)
>
>
> This diff-stat is unfortunate.
> scripts/kallsyms.c grew by 65% just to parse
> .tmp_vmlinux.ranges and modules_thick.builtin.
>
> I tend to suspect a design mistake...

When I wrote this patch, that was in a separate source file (and used by
other things that aren't upstream yet), but that was folded into
scripts/kallsyms.c for this submission. We can split it back out again
easily if you like.

> I tested this patch on x86_64_defconfig
> It also increases 24% of kallsyms data.
>
> The data increase is  outstanding compared with the
> amount of information added.

I would be very happy to find a more compact representation. Right now
this errs on the side of being similar to what kallsyms is already
doing. Fundamentally this is hard to encode compactly, though: range
tables (in either direction, symbol -> module or module -> set of
symbols) are more or less useless because the symbols for a given module
are so scattered throughout the kernel's symbols. We could probably use
a smaller datatype than an int for kallsyms_symbol_modules in many
cases, I suppose, unless the kernel is truly huge: that would save some
space in the uncompressed output, though not in the vmlinuz. Any other
suggestions would be much appreciated.
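
To make the smaller-datatype idea concrete, here is a rough sketch (not
code from the patch) of how the generator might choose the assembler
directive from the largest value it has to store. The helper name
pick_directive() is hypothetical, and the sketch assumes a target where
.short is 16 bits and .int is 32 bits; the kernel-side declaration of
kallsyms_symbol_modules would have to change to match.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: return the narrowest
 * assembler data directive able to hold every value in offsets[].
 * Assumes .short is 16 bits and .int is 32 bits on the target.
 */
static const char *pick_directive(const unsigned int *offsets, size_t n)
{
	unsigned int max = 0;
	size_t i;

	/* find the largest offset that must be representable */
	for (i = 0; i < n; i++)
		if (offsets[i] > max)
			max = offsets[i];

	if (max <= UINT8_MAX)
		return ".byte";
	if (max <= UINT16_MAX)
		return ".short";
	return ".int";
}
```

As noted above, this would only shrink the uncompressed table, not the
vmlinuz.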

>> diff --git a/Documentation/dontdiff b/Documentation/dontdiff
>> index 9f4392876099..32ee05f91410 100644
>> --- a/Documentation/dontdiff
>> +++ b/Documentation/dontdiff
>> @@ -180,6 +180,7 @@ modpost
>>  modules.builtin
>>  modules.builtin.modinfo
>>  modules.order
>> +modules_thick.builtin
>>  modversions.h*
>>  nconf
>>  nconf-cfg
>
> Most people missed to add this.
> I think you took time for internal review.

I grepped for 'modules.builtin' to make sure I caught all the places it
was used :)

>> +static struct obj2mod_elem *obj2mod[OBJ2MOD_N];
>> +
>> +static void obj2mod_init(void)
>> +{
>> +       memset(obj2mod, 0, sizeof(obj2mod));
>> +}
>
>
> Unneeded.
>
> The .bss section is automatically zero-cleared by
> operating system.  obj2mod is already zero-filled.

Agreed (with the caveat that I'm not the one shepherding this patch
through, but I hope Eugene agrees as well. :) )

>> +static void obj2mod_put(char *obj, int mod)
>
> you can add 'const' to the 'char *'.
>
> Same for obj2mod_get().

Agreed.

>> +static int addrmap_compare(const void *keyp, const void *rangep)
>> +{
>> +       unsigned long long addr = *((const unsigned long long *)keyp);
>> +       const struct addrmap_entry *range = (const struct addrmap_entry *)rangep;
>
> Cast is uneeded since rangep is an opaque pointer.

Overuse-of-C++ disease, sorry. (I can never remember whether it is C or
C++ that wants casts here, so I tend to err on the side of always
putting them in :) ).
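
For reference, a minimal standalone sketch of the same kind of range
lookup with no casts at all: in C, a void pointer converts implicitly to
any object pointer type, both in the comparator and for the value
returned by bsearch(). The names struct range, range_compare() and
find_range() are made up for illustration and are not the ones in the
patch.

```c
#include <stdlib.h>

/* Illustrative type: a half-open address range [addr, addr + size). */
struct range {
	unsigned long long addr;
	unsigned long long size;
};

/* bsearch() comparator: is the key below, inside, or above the range? */
static int range_compare(const void *keyp, const void *rangep)
{
	const unsigned long long *addr = keyp;	/* no cast needed in C */
	const struct range *r = rangep;		/* likewise */

	if (*addr < r->addr)
		return -1;
	if (*addr < r->addr + r->size)
		return 0;
	return 1;
}

/*
 * Return the range containing addr, or NULL if there is none.
 * ranges[] must be sorted by address and non-overlapping.
 */
static const struct range *find_range(const struct range *ranges, size_t n,
				      unsigned long long addr)
{
	return bsearch(&addr, ranges, n, sizeof(*ranges), range_compare);
}
```

The same source would need explicit casts to compile as C++, which is
where the habit comes from.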

>> +       /* skip the .init.scratch section */
>> +       if (strcmp(sym, "__init_scratch_end") == 0) {
>> +               init_scratch = 0;
>> +               goto read_another;
>> +       }
>> +       if (strcmp(sym, "__init_scratch_begin") == 0)
>> +               init_scratch = 1;
>> +       if (init_scratch)
>> +               goto read_another;
>
>
> How is this hunk related?
> I do not understand it from the commit log.

Ah, sorry, this was a recent addition to work around problems in the
parsing side in userspace: it really doesn't belong here, I suspect.

(Our userspace-side parsing code was getting confused about the
relatively recently-added presence of symbols that were neither 'all
addresses in order, modules jumbled together' nor 'addresses not in order,
one module at a time'. I suspect we should adjust to that instead -- but
I didn't make that change so I'm not really an authoritative source
here. Eugene?)

> The address range check is done in symbol_valid().
> I do not like to see different people adopt
> different ways.

Agreed.

>>         /* Ignore most absolute/undefined (?) symbols. */
>>         if (strcmp(sym, "_text") == 0)
>>                 _text = s->addr;
>> @@ -154,6 +291,14 @@ static int read_symbol(FILE *in, struct sym_entry *s)
>>         else if (!strncmp(sym, ".LASANPC", 8))
>>                 return -1;
>>
>> +       /* look up the builtin module this is part of (if any) */
>> +       range = (struct addrmap_entry *) bsearch(&s->addr,
>
> Unneeded cast because bsearch() returns an opaque pointer.

More C-is-not-C++ confusion on my part, mea culpa!

>> @@ -454,6 +601,19 @@ static void write_src(void)
>>         for (i = 0; i < 256; i++)
>>                 printf("\t.short\t%d\n", best_idx[i]);
>>         printf("\n");
>> +
>> +       output_label("kallsyms_modules");
>> +       for (i = 0; i < builtin_module_len; i++)
>> +               printf("\t.asciz\t\"%s\"\n", builtin_modules[i]);
>> +       printf("\n");
>
> Output strings in plain text?
>
> Did you consider the possibility for compression?

It's kinda hard, given that the table is accessed more or less at
random: any compression would need to be something that applied within
each module name, and there aren't *that* many module names. Bear in
mind that any given module name appears only once. I suspect the code to
do the decompression on the kernel side would be bigger than the space
savings.

This table isn't very big: the big one (and the incompressible one!) is
kallsyms_symbol_modules.
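
To illustrate why the name table stays small and is cheap to access at
random, here is a simplified sketch of the layout: all names are
concatenated as NUL-terminated strings, each name is stored once, and
each symbol records only a byte offset into that blob, with offset 0
reserved for "no module". The function names build_name_table() and
name_at() are invented for this example and do not appear in the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Concatenate n NUL-terminated names into blob and record each name's
 * byte offset in offsets[].  Offset 0 is reserved for the empty name,
 * meaning "not part of any module".  Returns the number of bytes used,
 * or 0 if blob is too small.
 */
static size_t build_name_table(const char *const *names, size_t n,
			       char *blob, size_t blob_size,
			       unsigned int *offsets)
{
	size_t used = 1, i;

	if (blob_size < 1)
		return 0;
	blob[0] = '\0';		/* the reserved empty name at offset 0 */

	for (i = 0; i < n; i++) {
		size_t len = strlen(names[i]) + 1;	/* include the NUL */

		if (used + len > blob_size)
			return 0;
		memcpy(blob + used, names[i], len);
		offsets[i] = (unsigned int)used;
		used += len;
	}
	return used;
}

/* Look up a name from a per-symbol offset; "" means no module. */
static const char *name_at(const char *blob, unsigned int offset)
{
	return &blob[offset];
}
```

A lookup is a single array index with no decoding step, which is what
any compression scheme for the names would have to compete with.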

>> +       for (i = 0; i < builtin_module_len; i++)
>> +               free(builtin_modules[i]);
>> +
>> +       output_label("kallsyms_symbol_modules");
>> +       for (i = 0; i < table_cnt; i++)
>> +               printf("\t.int\t%d\n", table[i].module);

*This* is the one that optimization efforts should focus on. If anyone
can think of any. :)

>> +                       else if (strncmp(argv[i], "--builtin=", 10) == 0)
>> +                               modules_builtin = &argv[i][10];
>
> ".tmp_vmlinux.ranges" is hard-coded, but
> "modules_think.builtin" can be changed via option. Heh.

Code residue: at one point during development the kallmodsyms build
process was spinning over several versions of modules_thick.builtins and
unifying them. It doesn't do that any more, so I would be quite happy to
tear this thoroughly useless piece of customizability out. :)

>> +       # read the linker map to identify ranges of addresses:
>> +       #   - for each *.o file, report address, size, pathname
>> +       #       - most such lines will have four fields
>> +       #       - but sometimes there is a line break after the first field
>> +       #   - start reading at "Linker script and memory map"
>
> Searching for "Linker script and memory map" will probably bring
> portability issue.
>
> llvm folks will be unhappy with it.

We can easily search for something LLVM-suitable too. (But I suspect
we'd need an entirely different bit of awk for LLVM's map output
format.)

(I note that if this does differ much between LLVM and GCC, LLVM will be
incapable of building glibc's ld.so...)

* Re: [PATCH v3] kallsyms: add names of built-in modules
  2019-11-22 15:23               ` Nick Alcock
@ 2019-11-22 17:04                 ` Eugene Loh
  0 siblings, 0 replies; 21+ messages in thread
From: Eugene Loh @ 2019-11-22 17:04 UTC (permalink / raw)
  To: Nick Alcock, Masahiro Yamada
  Cc: Steven Rostedt, Jonathan Corbet, Michal Marek, Jessica Yu,
	Linux Kbuild mailing list, Marc Zyngier, Song Liu,
	Thomas Gleixner, Keller, Jacob E, Kris Van Hees

On 11/22/2019 07:23 AM, Nick Alcock wrote:

> On 22 Nov 2019, Masahiro Yamada stated:

Thank you for the careful review.  I await your feedback on Nick's 
comments.  In particular, you raise good challenge cases, but like Nick 
I wonder if "better" might be good enough.

Meanwhile, ...

>> On Wed, Nov 20, 2019 at 2:02 PM <eugene.loh@oracle.com> wrote:
>>> From: Eugene Loh <eugene.loh@oracle.com>
>>> +static struct obj2mod_elem *obj2mod[OBJ2MOD_N];
>>> +
>>> +static void obj2mod_init(void)
>>> +{
>>> +       memset(obj2mod, 0, sizeof(obj2mod));
>>> +}
>>
>> Unneeded.
>>
>> The .bss section is automatically zero-cleared by
>> operating system.  obj2mod is already zero-filled.
> Agreed (with the caveat that I'm not the one shepherding this patch
> through, but I hope Eugene agrees as well. :) )

No caveat needed:  I'm on board!

>>> +       /* skip the .init.scratch section */
>>> +       if (strcmp(sym, "__init_scratch_end") == 0) {
>>> +               init_scratch = 0;
>>> +               goto read_another;
>>> +       }
>>> +       if (strcmp(sym, "__init_scratch_begin") == 0)
>>> +               init_scratch = 1;
>>> +       if (init_scratch)
>>> +               goto read_another;
>>
>> How is this hunk related?
>> I do not understand it from the commit log.

Right.  It's not described in the commit message.  I will pull this code 
out in the next version of this patch.

> Ah, sorry, this was a recent addition to work around problems in the
> parsing side in userspace: it really doesn't belong here, I suspect.
>
> (Our userspace-side parsing code was getting confused about the
> relatively recently-added presence of symbols that were neither 'all
> addresses in order, modules jumbled together' nor 'addresses not in order,
> one module at a time'. I suspect we should adjust to that instead -- but
> I didn't make that change so I'm not really an authoritative source
> here. Eugene?)

* Re: [PATCH v3] kallsyms: add names of built-in modules
  2019-11-22 10:00             ` Masahiro Yamada
  2019-11-22 15:23               ` Nick Alcock
@ 2019-12-10 17:45               ` Eugene Loh
  2019-12-10 17:48                 ` [PATCH v4] " eugene.loh
  1 sibling, 1 reply; 21+ messages in thread
From: Eugene Loh @ 2019-12-10 17:45 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Steven Rostedt, Jonathan Corbet, Michal Marek, Jessica Yu,
	Linux Kbuild mailing list, Marc Zyngier, Song Liu,
	Thomas Gleixner, Keller, Jacob E, Kris Van Hees, Nick Alcock

Thank you for the careful review.  Sorry for the delayed response.  I 
will post a v4 momentarily.  Further comments below.

On 11/22/2019 02:00 AM, Masahiro Yamada wrote:

> On Wed, Nov 20, 2019 at 2:02 PM <eugene.loh@oracle.com> wrote:
>> From: Eugene Loh <eugene.loh@oracle.com>
>>
>> /proc/kallsyms is very useful for tracers and other tools that need
>> to map kernel symbols to addresses.
>>
>> It would be useful if there were a mapping between kernel symbol and
>> module name that only changed when the kernel source code is changed.
> Unfortunately, this is not necessarily true.
> Some objects could be linked into multiple modules.

Good point but, as Nick pointed out, at least we can solve the common 
case.  I have added a remark to that effect to the commit message.

>>   .gitignore                  |   1 +
>>   Documentation/dontdiff      |   1 +
>>   Makefile                    |  41 ++-
>>   kernel/kallsyms.c           |  12 +-
>>   scripts/Makefile.modbuiltin |  20 +-
>>   scripts/kallsyms.c          | 515 +++++++++++++++++++++++++++++++++++-
>>   scripts/link-vmlinux.sh     |  17 ++
>>   scripts/namespace.pl        |   5 +
>>   8 files changed, 589 insertions(+), 23 deletions(-)
> This diff-stat is unfortunate.
> scripts/kallsyms.c increased 65% for parsing
> .tmp_vmlinux.ranges and modules_think.builtin
>
> I tend to suspect the design mistake...

I have cleaned up the changes somewhat and moved one portion, which can
be used for other purposes, into a separate file
(scripts/modules_thick.c), so the delta on scripts/kallsyms.c is smaller.

>> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
>> @@ -69,10 +76,116 @@ static unsigned char best_table[256][2];
>> +static struct obj2mod_elem *obj2mod[OBJ2MOD_N];
>> +
>> +static void obj2mod_init(void)
>> +{
>> +       memset(obj2mod, 0, sizeof(obj2mod));
>> +}
> Unneeded.
>
> The .bss section is automatically zero-cleared by
> operating system.  obj2mod is already zero-filled.

Thanks.  Change made.

>> +static void obj2mod_put(char *obj, int mod)
> you can add 'const' to the 'char *'.
> Same for obj2mod_get().

Thanks.  Change made.

>> +static int addrmap_compare(const void *keyp, const void *rangep)
>> +{
>> +       unsigned long long addr = *((const unsigned long long *)keyp);
>> +       const struct addrmap_entry *range = (const struct addrmap_entry *)rangep;
> Cast is uneeded since rangep is an opaque pointer.

Thanks.  Change made.

>> @@ -125,6 +252,16 @@ static int read_symbol(FILE *in, struct sym_entry *s)
>> +       /* skip the .init.scratch section */
>> +       if (strcmp(sym, "__init_scratch_end") == 0) {
>> +               init_scratch = 0;
>> +               goto read_another;
>> +       }
>> +       if (strcmp(sym, "__init_scratch_begin") == 0)
>> +               init_scratch = 1;
>> +       if (init_scratch)
>> +               goto read_another;
> How is this hunk related?
> I do not understand it from the commit log.

I removed this section.

>> @@ -154,6 +291,14 @@ static int read_symbol(FILE *in, struct sym_entry *s)
>>          else if (!strncmp(sym, ".LASANPC", 8))
>>                  return -1;
>>
>> +       /* look up the builtin module this is part of (if any) */
>> +       range = (struct addrmap_entry *) bsearch(&s->addr,
> Unneeded cast because bsearch() returns an opaque pointer.

Thanks.  Change made.

>> @@ -454,6 +601,19 @@ static void write_src(void)
>>          for (i = 0; i < 256; i++)
>>                  printf("\t.short\t%d\n", best_idx[i]);
>>          printf("\n");
>> +
>> +       output_label("kallsyms_modules");
>> +       for (i = 0; i < builtin_module_len; i++)
>> +               printf("\t.asciz\t\"%s\"\n", builtin_modules[i]);
>> +       printf("\n");
> Output strings in plain text?
> Did you consider the possibility for compression?

As Nick pointed out, these names (one per module) only add a little 
extra to the size.


>>   int main(int argc, char **argv)
>>   {
>> +       const char *modules_builtin = "modules_thick.builtin";
>> +
>>          if (argc >= 2) {
>>                  int i;
>>                  for (i = 1; i < argc; i++) {
>> -                       if(strcmp(argv[i], "--all-symbols") == 0)
>> +                       if (strcmp(argv[i], "--all-symbols") == 0)
>>                                  all_symbols = 1;
>>                          else if (strcmp(argv[i], "--absolute-percpu") == 0)
>>                                  absolute_percpu = 1;
>>                          else if (strcmp(argv[i], "--base-relative") == 0)
>>                                  base_relative = 1;
>> +                       else if (strncmp(argv[i], "--builtin=", 10) == 0)
>> +                               modules_builtin = &argv[i][10];
>
> ".tmp_vmlinux.ranges" is hard-coded, but
> "modules_think.builtin" can be changed via option. Heh.

Good point.  I removed the unneeded option.

>> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
>> @@ -138,6 +140,19 @@ kallsyms()
>> +       # read the linker map to identify ranges of addresses:
>> +       #   - for each *.o file, report address, size, pathname
>> +       #       - most such lines will have four fields
>> +       #       - but sometimes there is a line break after the first field
>> +       #   - start reading at "Linker script and memory map"
> Searching for "Linker script and memory map" will probably bring
> portability issue.
>
> llvm folks will be unhappy with it.

Actually, LLVM emits the same string.

* [PATCH v4] kallsyms: add names of built-in modules
  2019-12-10 17:45               ` Eugene Loh
@ 2019-12-10 17:48                 ` eugene.loh
  2019-12-18 23:55                   ` Eugene Loh
  0 siblings, 1 reply; 21+ messages in thread
From: eugene.loh @ 2019-12-10 17:48 UTC (permalink / raw)
  To: eugene.loh
  Cc: rostedt, corbet, yamada.masahiro, michal.lkml, jeyu,
	linux-kbuild, maz, songliubraving, tglx, jacob.e.keller,
	Kris Van Hees, Nick Alcock

From: Eugene Loh <eugene.loh@oracle.com>

/proc/kallsyms is very useful for tracers and other tools that need
to map kernel symbols to addresses.

It would be useful if there were a mapping between kernel symbol and
module name that only changed when the kernel source code is changed.
This mapping should not vanish simply because a module becomes built
into the kernel.

Therefore:

- Generate a file "modules_thick.builtin" that maps from thin
  archives that make up built-in modules to their constituent
  object files.

- Generate a linker map ".tmp_vmlinux.map", converting it into
  ".tmp_vmlinux.ranges", mapping address ranges to object files.

- Read "modules_thick.builtin" and ".tmp_vmlinux.ranges" to
  map symbol addresses to built-in-module names.  Write those
  module names (kallsyms_modules) and that per-symbol module
  information (kallsyms_symbol_modules) to the *.s output file.

- Use kallsyms_modules and kallsyms_symbol_modules to add
  built-in-module information to /proc/kallsyms.

Note that kernel symbols for built-in modules appear in ascending
order by address, as usual, and thus will appear interspersed with
symbols that are part of other built-in modules or of the kernel.

Also, while it is possible for an object to appear in multiple
built-in modules, making an unambiguous mapping of symbol to module
impossible in such cases, this patch addresses the typical cases.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
---
 .gitignore                  |   1 +
 Documentation/dontdiff      |   1 +
 Makefile                    |  41 +++--
 kernel/kallsyms.c           |  12 +-
 scripts/Makefile            |   5 +
 scripts/Makefile.modbuiltin |  20 ++-
 scripts/kallsyms.c          | 298 +++++++++++++++++++++++++++++++++++-
 scripts/link-vmlinux.sh     |  17 ++
 scripts/modules_thick.c     | 104 +++++++++++++
 scripts/modules_thick.h     |  27 ++++
 scripts/namespace.pl        |   5 +
 11 files changed, 509 insertions(+), 22 deletions(-)
 create mode 100644 scripts/modules_thick.c
 create mode 100644 scripts/modules_thick.h

diff --git a/.gitignore b/.gitignore
index 72ef86a5570d..0b9c88f1d388 100644
--- a/.gitignore
+++ b/.gitignore
@@ -46,6 +46,7 @@
 Module.symvers
 modules.builtin
 modules.order
+modules_thick.builtin
 
 #
 # Top-level generic files
diff --git a/Documentation/dontdiff b/Documentation/dontdiff
index 72fc2e9e2b63..9d0db2ef3a51 100644
--- a/Documentation/dontdiff
+++ b/Documentation/dontdiff
@@ -181,6 +181,7 @@ modules.builtin
 modules.builtin.modinfo
 modules.nsdeps
 modules.order
+modules_thick.builtin
 modversions.h*
 nconf
 nconf-cfg
diff --git a/Makefile b/Makefile
index 73e3c2802927..430d49d3a93e 100644
--- a/Makefile
+++ b/Makefile
@@ -1073,7 +1073,7 @@ cmd_link-vmlinux =                                                 \
 	$(CONFIG_SHELL) $< $(LD) $(KBUILD_LDFLAGS) $(LDFLAGS_vmlinux) ;    \
 	$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
 
-vmlinux: scripts/link-vmlinux.sh autoksyms_recursive $(vmlinux-deps) FORCE
+vmlinux: scripts/link-vmlinux.sh autoksyms_recursive $(vmlinux-deps) modules_thick.builtin FORCE
 	+$(call if_changed,link-vmlinux)
 
 targets := vmlinux
@@ -1284,17 +1284,6 @@ modules: $(if $(KBUILD_BUILTIN),vmlinux) modules.order modules.builtin
 modules.order: descend
 	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
 
-modbuiltin-dirs := $(addprefix _modbuiltin_, $(build-dirs))
-
-modules.builtin: $(modbuiltin-dirs)
-	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
-
-PHONY += $(modbuiltin-dirs)
-# tristate.conf is not included from this Makefile. Add it as a prerequisite
-# here to make it self-healing in case somebody accidentally removes it.
-$(modbuiltin-dirs): include/config/tristate.conf
-	$(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_%,%,$@)
-
 # Target to prepare building external modules
 PHONY += modules_prepare
 modules_prepare: prepare
@@ -1347,6 +1336,33 @@ modules modules_install:
 
 endif # CONFIG_MODULES
 
+# modules.builtin has a 'thick' form which maps from kernel modules (or rather
+# the object file names they would have had had they not been built in) to their
+# constituent object files: kallsyms uses this to determine which modules any
+# given object file is part of.  (We cannot eliminate the slight redundancy
+# here without double-expansion.)
+
+modbuiltin-dirs := $(addprefix _modbuiltin_, $(build-dirs))
+
+modbuiltin-thick-dirs := $(addprefix _modbuiltin_thick_, $(build-dirs))
+
+modules.builtin: $(modbuiltin-dirs)
+	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
+
+modules_thick.builtin: $(modbuiltin-thick-dirs)
+	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
+
+PHONY += $(modbuiltin-dirs) $(modbuiltin-thick-dirs)
+# tristate.conf is not included from this Makefile. Add it as a prerequisite
+# here to make it self-healing in case somebody accidentally removes it.
+$(modbuiltin-dirs): include/config/tristate.conf
+	$(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_%,%,$@) \
+			builtin-file=modules.builtin
+
+$(modbuiltin-thick-dirs): include/config/tristate.conf
+	$(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_thick_%,%,$@) \
+			builtin-file=modules_thick.builtin
+
 ###
 # Cleaning is done on three levels.
 # make clean     Delete most generated files
@@ -1712,6 +1728,7 @@ clean: $(clean-dirs)
 		-o -name '*.asn1.[ch]' \
 		-o -name '*.symtypes' -o -name 'modules.order' \
 		-o -name modules.builtin -o -name '.tmp_*.o.*' \
+		-o -name modules_thick.builtin \
 		-o -name '*.c.[012]*.*' \
 		-o -name '*.ll' \
 		-o -name '*.gcno' \) -type f -print | xargs rm -f
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 136ce049c4ad..ce8576503e35 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -46,6 +46,8 @@ __attribute__((weak, section(".rodata")));
 
 extern const u8 kallsyms_token_table[] __weak;
 extern const u16 kallsyms_token_index[] __weak;
+extern const char kallsyms_modules[] __weak;
+extern const u32 kallsyms_symbol_modules[] __weak;
 
 extern const unsigned int kallsyms_markers[] __weak;
 
@@ -508,8 +510,16 @@ static int get_ksymbol_bpf(struct kallsym_iter *iter)
 static unsigned long get_ksymbol_core(struct kallsym_iter *iter)
 {
 	unsigned off = iter->nameoff;
+	u32 mod_index = 0;
 
-	iter->module_name[0] = '\0';
+	if (kallsyms_symbol_modules)
+		mod_index = kallsyms_symbol_modules[iter->pos];
+
+	if (mod_index == 0 || kallsyms_modules == NULL)
+		iter->module_name[0] = '\0';
+	else
+		strcpy(iter->module_name, &kallsyms_modules[mod_index]);
+	iter->exported = 0;
 	iter->value = kallsyms_sym_address(iter->pos);
 
 	iter->type = kallsyms_get_symbol_type(off);
diff --git a/scripts/Makefile b/scripts/Makefile
index 00c47901cb06..44641cabb261 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -26,6 +26,11 @@ HOSTLDLIBS_extract-cert = -lcrypto
 
 always		:= $(hostprogs-y) $(hostprogs-m)
 
+kallsyms-objs	:= kallsyms.o
+kallsyms-objs	+= modules_thick.o
+
+HOSTCFLAGS_modules_thick.o := -I$(srctree)/scripts
+HOSTCFLAGS_kallsyms.o := -I$(srctree)/scripts
 # The following hostprogs-y programs are only build on demand
 hostprogs-y += unifdef
 
diff --git a/scripts/Makefile.modbuiltin b/scripts/Makefile.modbuiltin
index 7d4711b88656..06f31e58111e 100644
--- a/scripts/Makefile.modbuiltin
+++ b/scripts/Makefile.modbuiltin
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 # ==========================================================================
-# Generating modules.builtin
+# Generating modules.builtin and modules_thick.builtin
 # ==========================================================================
 
 src := $(obj)
@@ -30,19 +30,29 @@ __subdir-Y     := $(patsubst %/,%,$(filter %/, $(obj-Y)))
 subdir-Y       += $(__subdir-Y)
 subdir-ym      := $(sort $(subdir-y) $(subdir-Y) $(subdir-m))
 subdir-ym      := $(addprefix $(obj)/,$(subdir-ym))
-obj-Y          := $(addprefix $(obj)/,$(obj-Y))
+pathobj-Y      := $(addprefix $(obj)/,$(obj-Y))
 
 modbuiltin-subdirs := $(patsubst %,%/modules.builtin, $(subdir-ym))
-modbuiltin-mods    := $(filter %.ko, $(obj-Y:.o=.ko))
+modbuiltin-mods    := $(filter %.ko, $(pathobj-Y:.o=.ko))
 modbuiltin-target  := $(obj)/modules.builtin
+modthickbuiltin-subdirs := $(patsubst %,%/modules_thick.builtin, $(subdir-ym))
+modthickbuiltin-target  := $(obj)/modules_thick.builtin
 
-__modbuiltin: $(modbuiltin-target) $(subdir-ym)
+__modbuiltin: $(obj)/$(builtin-file) $(subdir-ym)
 	@:
 
 $(modbuiltin-target): $(subdir-ym) FORCE
 	$(Q)(for m in $(modbuiltin-mods); do echo $$m; done;	\
 	cat /dev/null $(modbuiltin-subdirs)) > $@
 
+$(modthickbuiltin-target): $(subdir-ym) FORCE
+	$(Q) $(foreach mod-o, $(filter %.o,$(obj-Y)),\
+		printf "%s:" $(addprefix $(obj)/,$(mod-o)) >> $@; \
+		printf " %s" $(sort $(strip $(addprefix $(obj)/,$($(mod-o:.o=-objs)) \
+			$($(mod-o:.o=-y)) $($(mod-o:.o=-Y))))) >> $@; \
+		printf "\n" >> $@; ) \
+	cat /dev/null $(modthickbuiltin-subdirs) >> $@;
+
 PHONY += FORCE
 
 FORCE:
@@ -52,6 +62,6 @@ FORCE:
 
 PHONY += $(subdir-ym)
 $(subdir-ym):
-	$(Q)$(MAKE) $(modbuiltin)=$@
+	$(Q)$(MAKE) $(modbuiltin)=$@ builtin-file=$(builtin-file)
 
 .PHONY: $(PHONY)
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index fb55f262f42d..7368996e5d7b 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -5,7 +5,8 @@
  * This software may be used and distributed according to the terms
  * of the GNU General Public License, incorporated herein by reference.
  *
- * Usage: nm -n vmlinux | scripts/kallsyms [--all-symbols] > symbols.S
+ * Usage: nm -n vmlinux | scripts/kallsyms [--all-symbols]
+ *                   [--absolute-percpu] [--base-relative] > symbols.S
  *
  *      Table compression uses all the unused char codes on the symbols and
  *  maps these to the most used substrings (tokens). For instance, it might
@@ -18,12 +19,15 @@
  *
  */
 
+#define _GNU_SOURCE 1
 #include <stdbool.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <ctype.h>
 #include <limits.h>
+#include <errno.h>
+#include "modules_thick.h"
 
 #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0]))
 
@@ -35,6 +39,7 @@ struct sym_entry {
 	unsigned int start_pos;
 	unsigned char *sym;
 	unsigned int percpu_absolute;
+	unsigned int module;
 };
 
 struct addr_range {
@@ -68,10 +73,101 @@ static unsigned char best_table[256][2];
 static unsigned char best_table_len[256];
 
 
+static unsigned int strhash(const char *s)
+{
+	/* fnv32 hash */
+	unsigned int hash = 2166136261U;
+
+	for (; *s; s++)
+		hash = (hash ^ *s) * 0x01000193;
+	return hash;
+}
+
+#define OBJ2MOD_BITS 10
+#define OBJ2MOD_N (1 << OBJ2MOD_BITS)
+#define OBJ2MOD_MASK (OBJ2MOD_N - 1)
+struct obj2mod_elem {
+	char *obj;
+	int mod;
+	struct obj2mod_elem *next;
+};
+
+static struct obj2mod_elem *obj2mod[OBJ2MOD_N];
+
+static void obj2mod_put(const char *obj, int mod)
+{
+	int i = strhash(obj) & OBJ2MOD_MASK;
+	struct obj2mod_elem *elem = malloc(sizeof(struct obj2mod_elem));
+
+	if (!elem) {
+		fprintf(stderr, "kallsyms: out of memory\n");
+		exit(1);
+	}
+
+	elem->obj = strdup(obj);
+	if (!elem->obj) {
+		fprintf(stderr, "kallsyms: out of memory\n");
+		free(elem);
+		exit(1);
+	}
+
+	elem->mod = mod;
+	elem->next = obj2mod[i];
+	obj2mod[i] = elem;
+}
+
+static int obj2mod_get(const char *obj)
+{
+	int i = strhash(obj) & OBJ2MOD_MASK;
+	struct obj2mod_elem *elem;
+
+	for (elem = obj2mod[i]; elem; elem = elem->next)
+		if (strcmp(elem->obj, obj) == 0)
+			return elem->mod;
+	return 0;
+}
+
+static void obj2mod_free(void)
+{
+	int i;
+
+	for (i = 0; i < OBJ2MOD_N; i++) {
+		struct obj2mod_elem *elem = obj2mod[i];
+		struct obj2mod_elem *next;
+
+		while (elem) {
+			next = elem->next;
+			free(elem->obj);
+			free(elem);
+			elem = next;
+		}
+	}
+}
+
+/*
+ * The builtin module names.  The "offset" points to the name as if
+ * all builtin module names were concatenated to a single string.
+ */
+static unsigned int builtin_module_size;	/* number allocated */
+static unsigned int builtin_module_len;		/* number assigned */
+static char **builtin_modules;			/* array of module names */
+static unsigned int *builtin_module_offsets;	/* offset */
+
+/*
+ * An ordered list of address ranges and how they map to built-in modules.
+ */
+struct addrmap_entry {
+	unsigned long long addr;
+	unsigned long long size;
+	unsigned int module;
+};
+static struct addrmap_entry *addrmap;
+static int addrmap_num, addrmap_alloced;
+
 static void usage(void)
 {
-	fprintf(stderr, "Usage: kallsyms [--all-symbols] "
-			"[--base-relative] < in.map > out.S\n");
+	fprintf(stderr, "Usage: kallsyms [--all-symbols] [--absolute-percpu] "
+			"[--base-relative] < nm_vmlinux.out > symbols.S\n");
 	exit(1);
 }
 
@@ -98,6 +194,8 @@ static bool is_ignored_symbol(const char *name, char type)
 		"kallsyms_markers",
 		"kallsyms_token_table",
 		"kallsyms_token_index",
+		"kallsyms_symbol_modules",
+		"kallsyms_modules",
 		/* Exclude linker generated symbols which vary between passes */
 		"_SDA_BASE_",		/* ppc */
 		"_SDA2_BASE_",		/* ppc */
@@ -174,10 +272,23 @@ static void check_symbol_range(const char *sym, unsigned long long addr,
 	}
 }
 
+static int addrmap_compare(const void *keyp, const void *rangep)
+{
+	unsigned long long addr = *((const unsigned long long *)keyp);
+	const struct addrmap_entry *range = rangep;
+
+	if (addr < range->addr)
+		return -1;
+	if (addr < range->addr + range->size)
+		return 0;
+	return 1;
+}
+
 static int read_symbol(FILE *in, struct sym_entry *s)
 {
 	char sym[500], stype;
 	int rc;
+	struct addrmap_entry *range;
 
 	rc = fscanf(in, "%llx %c %499s\n", &s->addr, &stype, sym);
 	if (rc != 3) {
@@ -202,6 +313,14 @@ static int read_symbol(FILE *in, struct sym_entry *s)
 	check_symbol_range(sym, s->addr, text_ranges, ARRAY_SIZE(text_ranges));
 	check_symbol_range(sym, s->addr, &percpu_range, 1);
 
+	/* try to find a module that this address belongs to */
+	range = bsearch(&s->addr,
+	    addrmap, addrmap_num, sizeof(*addrmap), &addrmap_compare);
+	if (range)
+		s->module = builtin_module_offsets[range->module];
+	else
+		s->module = 0;
+
 	/* include the type field in the symbol name, so that it gets
 	 * compressed together */
 	s->len = strlen(sym) + 1;
@@ -469,6 +588,19 @@ static void write_src(void)
 	for (i = 0; i < 256; i++)
 		printf("\t.short\t%d\n", best_idx[i]);
 	printf("\n");
+
+	output_label("kallsyms_modules");
+	for (i = 0; i < builtin_module_len; i++)
+		printf("\t.asciz\t\"%s\"\n", builtin_modules[i]);
+	printf("\n");
+
+	for (i = 0; i < builtin_module_len; i++)
+		free(builtin_modules[i]);
+
+	output_label("kallsyms_symbol_modules");
+	for (i = 0; i < table_cnt; i++)
+		printf("\t.int\t%d\n", table[i].module);
+	printf("\n");
 }
 
 
@@ -734,12 +866,169 @@ static void record_relative_base(void)
 		}
 }
 
+/*
+ * Reallocate the builtin modules list.
+ */
+static void realloc_builtin_modules(void)
+{
+	builtin_module_size += 50;
+
+	builtin_modules = realloc(builtin_modules,
+				  sizeof(*builtin_modules) *
+				  builtin_module_size);
+	builtin_module_offsets = realloc(builtin_module_offsets,
+					 sizeof(*builtin_module_offsets) *
+					 builtin_module_size);
+
+	if (!builtin_modules || !builtin_module_offsets) {
+		fprintf(stderr, "kallsyms failure: out of memory.\n");
+		exit(EXIT_FAILURE);
+	}
+}
+
+/*
+ * Add a single built-in module (possibly composed of many files) to the
+ * modules list.  Take the offset of the current module and return it
+ * (purely for simplicity's sake in the caller).
+ */
+static size_t add_builtin_module(const char *module_name, char **module_paths,
+				 size_t offset)
+{
+	/* map the module's object paths to the module offset */
+	while (*module_paths) {
+		obj2mod_put(*module_paths, builtin_module_len);
+		module_paths++;
+	}
+
+	/* add the module name */
+	if (builtin_module_size <= builtin_module_len)
+		realloc_builtin_modules();
+	builtin_modules[builtin_module_len] = strdup(module_name);
+	builtin_module_offsets[builtin_module_len] = offset;
+	builtin_module_len++;
+
+	return (offset + strlen(module_name) + 1);
+}
+
+/*
+ * Read the linker map.
+ */
+static void read_linker_map(void)
+{
+	unsigned long long addr, size;
+	char obj[PATH_MAX+1];
+	FILE *f = fopen(".tmp_vmlinux.ranges", "r");
+
+	if (!f) {
+		fprintf(stderr, "Cannot open '.tmp_vmlinux.ranges'.\n");
+		exit(1);
+	}
+
+	addrmap_num = 0;
+	addrmap_alloced = 4096;
+	addrmap = malloc(sizeof(*addrmap) * addrmap_alloced);
+	if (!addrmap)
+		goto oom;
+
+	/*
+	 * For each address range (addr,size) and object, add to addrmap
+	 * the range and the built-in module to which the object maps.
+	 */
+	while (fscanf(f, "%llx %llx %s\n", &addr, &size, obj) == 3) {
+		int m = obj2mod_get(obj);
+
+		if (addr == 0 || size == 0 || m == 0)
+			continue;
+
+		if (addrmap_num >= addrmap_alloced) {
+			addrmap_alloced *= 2;
+			addrmap = realloc(addrmap,
+			    sizeof(*addrmap) * addrmap_alloced);
+			if (!addrmap)
+				goto oom;
+		}
+
+		addrmap[addrmap_num].addr = addr;
+		addrmap[addrmap_num].size = size;
+		addrmap[addrmap_num].module = m;
+		addrmap_num++;
+	}
+	fclose(f);
+	return;
+
+oom:
+	fprintf(stderr, "kallsyms: out of memory\n");
+	exit(1);
+}
+
+/*
+ * Read "modules_thick.builtin" (the list of built-in modules).  Construct:
+ *   - builtin_modules: array of built-in-module names
+ *   - builtin_module_offsets: array of offsets that will later be
+ *       used to access a concatenated list of built-in-module names
+ *   - obj2mod: a temporary, many-to-one, hash mapping
+ *       from object-file paths to built-in-module indices
+ * Read ".tmp_vmlinux.ranges" (the linker map).
+ *   - addrmap[] maps address ranges to built-in-module indices (using obj2mod)
+ */
+static void read_modules(void)
+{
+	FILE *f;
+	char *line = NULL;	/* getline() allocates the buffer */
+	size_t line_size = 0;
+	size_t offset = 0;
+
+	realloc_builtin_modules(); /* initial allocation */
+
+	builtin_modules[0] = strdup(""); /* a symbol that cannot be modular */
+	builtin_module_offsets[0] = 0;
+	builtin_module_len = 1;
+	offset++;
+
+	/*
+	 * Iterate over all modules in modules_thick.builtin and add each.
+	 */
+	f = fopen("modules_thick.builtin", "r");
+	if (f == NULL) {
+		fprintf(stderr, "Cannot open modules_thick.builtin: %s\n",
+		    strerror(errno));
+		exit(1);
+	}
+
+	while (getline(&line, &line_size, f) > 0) {
+		char **paths;
+		char *module_name = NULL;
+
+		paths = modules_thick_parse(line, &module_name);
+		if (paths == NULL)
+			break;
+		offset = add_builtin_module(module_name, paths, offset);
+		free(paths);
+		free(module_name);
+	}
+	if (ferror(f)) {
+		fprintf(stderr, "Error reading from modules_thick file: %s\n",
+		    strerror(errno));
+		exit(1);
+	}
+
+	fclose(f);
+	free(line);
+
+	/*
+	 * Read linker map.
+	 */
+	read_linker_map();
+
+	obj2mod_free();
+}
+
 int main(int argc, char **argv)
 {
 	if (argc >= 2) {
 		int i;
 		for (i = 1; i < argc; i++) {
-			if(strcmp(argv[i], "--all-symbols") == 0)
+			if (strcmp(argv[i], "--all-symbols") == 0)
 				all_symbols = 1;
 			else if (strcmp(argv[i], "--absolute-percpu") == 0)
 				absolute_percpu = 1;
@@ -751,6 +1040,7 @@ int main(int argc, char **argv)
 	} else if (argc != 1)
 		usage();
 
+	read_modules();
 	read_map(stdin);
 	shrink_table();
 	if (absolute_percpu)
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 436379940356..ac14d292387a 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -76,6 +76,7 @@ vmlinux_link()
 			--start-group				\
 			${KBUILD_VMLINUX_LIBS}			\
 			--end-group				\
+			-Map=.tmp_vmlinux.map			\
 			${@}"
 
 		${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux}	\
@@ -88,6 +89,7 @@ vmlinux_link()
 			-Wl,--start-group			\
 			${KBUILD_VMLINUX_LIBS}			\
 			-Wl,--end-group				\
+			-Wl,-Map=.tmp_vmlinux.map		\
 			${@}"
 
 		${CC} ${CFLAGS_vmlinux}				\
@@ -140,6 +142,19 @@ kallsyms()
 	info KSYM ${2}
 	local kallsymopt;
 
+	# read the linker map to identify ranges of addresses:
+	#   - for each *.o file, report address, size, pathname
+	#       - most such lines will have four fields
+	#       - but sometimes there is a line break after the first field
+	#   - start reading at "Linker script and memory map"
+	#   - stop reading at ".brk"
+	${AWK} '
+	    /\.o$/ && start==1 && NF>=3 { print $(NF-2), $(NF-1), $NF }
+	    /^Linker script and memory map/ { start = 1 }
+	    /^\.brk/ { exit(0) }
+	' .tmp_vmlinux.map | sort > .tmp_vmlinux.ranges
+
+	# get kallsyms options
 	if [ -n "${CONFIG_KALLSYMS_ALL}" ]; then
 		kallsymopt="${kallsymopt} --all-symbols"
 	fi
@@ -152,11 +167,13 @@ kallsyms()
 		kallsymopt="${kallsymopt} --base-relative"
 	fi
 
+	# set up compilation
 	local aflags="${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL}               \
 		      ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS}"
 
 	local afile="`basename ${2} .o`.S"
 
+	# construct file and compile
 	${NM} -n ${1} | scripts/kallsyms ${kallsymopt} > ${afile}
 	${CC} ${aflags} -c -o ${2} ${afile}
 }
diff --git a/scripts/modules_thick.c b/scripts/modules_thick.c
new file mode 100644
index 000000000000..2d33b92060f9
--- /dev/null
+++ b/scripts/modules_thick.c
@@ -0,0 +1,104 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * A simple modules_thick reader.
+ *
+ * (C) 2014, 2019 Oracle, Inc.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "modules_thick.h"
+
+/*
+ * Parse a line from "modules_thick.builtin".  Allocate and return a module name
+ * and a null-terminated array of object paths (file names).  The name and array
+ * should be freed by the caller; the strings the array points to are in "line".
+ *
+ * Modules can consist of multiple paths: in this case, the portion before the
+ * colon is the path to the module, while the portion after the colon is a
+ * space-separated list of object paths.  The portion before the colon then
+ * names an "object file" that does not actually exist: it is merged into
+ * built-in.a without ever being written out.
+ */
+char ** __attribute__((__nonnull__))
+modules_thick_parse(char *line, char **module_name)
+{
+	size_t npaths = 1;
+	char **paths;
+	char *tmp;
+	char *olist;
+
+	/* find object-file list after the colon, if any */
+	olist = strchr(line, ':');
+	if (olist != NULL) {
+		*olist = '\0';
+		olist++;
+		olist += strspn(olist, " \n");
+		if (*olist != '\0') {
+			/* replace any trailing \n with \0 */
+			tmp = strchr(olist, '\n');
+			if (tmp != NULL)
+				*tmp = '\0';
+		} else
+			olist = NULL;
+	}
+
+	/* get pathless module_name, starting after the last '/', if any */
+	tmp = strrchr(line, '/');
+	*module_name = strdup(tmp ? tmp + 1 : line);
+
+	/* replace '-' with '_' as is done to names when built as modules */
+	for (tmp = *module_name; *tmp != '\0'; tmp++)
+		if (*tmp == '-')
+			*tmp = '_';
+
+	/* terminate at the last '.' to remove any suffix */
+	tmp = strrchr(*module_name, '.');
+	if (tmp != NULL)
+		*tmp = '\0';
+
+	/*
+	 * Count the number of paths by counting the number of spaces.
+	 * This could be an overestimate.
+	 */
+	if (olist) {
+		npaths = 0;
+		for (tmp = olist; tmp != NULL; tmp = strchr(tmp + 1, ' '))
+			npaths++;
+	}
+
+	paths = malloc((npaths + 1) * sizeof(char *));
+	if (!paths) {
+		fprintf(stderr, "%s: out of memory\n", __func__);
+		exit(1);
+	}
+
+	/* copy the paths in */
+	if (olist) {
+		size_t i = 0;
+
+		while ((tmp = strsep(&olist, " ")) != NULL) {
+			if (i >= npaths) {
+				fprintf(stderr,
+				    "%s bug: npaths overflow on module %s\n",
+				    __func__, *module_name);
+				exit(1);
+			}
+			paths[i++] = tmp;
+		}
+		npaths = i;
+	} else
+		paths[0] = line;	/* untransformed module name */
+
+	paths[npaths] = NULL;
+
+	return paths;
+}
diff --git a/scripts/modules_thick.h b/scripts/modules_thick.h
new file mode 100644
index 000000000000..7e2c0309c731
--- /dev/null
+++ b/scripts/modules_thick.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * A simple modules_thick reader.
+ *
+ * (C) 2014, 2019 Oracle, Inc.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _LINUX_MODULES_THICK_H
+#define _LINUX_MODULES_THICK_H
+
+#include <stdio.h>
+#include <stddef.h>
+
+/*
+ * Parse a line from "modules_thick.builtin".  Return a module name
+ * and a null-terminated array of object paths (file names).
+ */
+
+char ** __attribute__((__nonnull__))
+modules_thick_parse(char *line, char **module_name);
+
+#endif
diff --git a/scripts/namespace.pl b/scripts/namespace.pl
index 1da7bca201a4..4c7615e720de 100755
--- a/scripts/namespace.pl
+++ b/scripts/namespace.pl
@@ -120,6 +120,11 @@ my %nameexception = (
     'kallsyms_addresses'=> 1,
     'kallsyms_offsets'	=> 1,
     'kallsyms_relative_base'=> 1,
+    'kallsyms_token_table'=> 1,
+    'kallsyms_token_index'=> 1,
+    'kallsyms_markers'	=> 1,
+    'kallsyms_modules'	=> 1,
+    'kallsyms_symbol_modules'=> 1,
     '__this_module'	=> 1,
     '_etext'		=> 1,
     '_edata'		=> 1,
-- 
2.18.1
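
The kallsyms_modules / kallsyms_symbol_modules layout that write_src()
emits in the patch above is a block of NUL-terminated names plus, per
symbol, the byte offset of its module's name (0 meaning "no module").
A minimal sketch of that layout follows.  It is not the patch's code:
pack_names() and name_at() are hypothetical helpers, and the buffer is
filled at run time here instead of being emitted as .asciz directives.

```c
#include <stddef.h>
#include <string.h>

/*
 * Pack names into one buffer of NUL-terminated strings and record where
 * each name starts.  Offset 0 is reserved for the empty name, meaning
 * "not part of any module".  Returns the number of bytes used, or 0 if
 * buf is too small.  (Hypothetical helper, for illustration only.)
 */
static size_t pack_names(const char *const *names, size_t n,
			 char *buf, size_t buflen, unsigned int *offsets)
{
	size_t used = 0;
	size_t i;

	if (buflen == 0)
		return 0;
	buf[used++] = '\0';		/* the empty name at offset 0 */

	for (i = 0; i < n; i++) {
		size_t len = strlen(names[i]) + 1;	/* include the NUL */

		if (len > buflen - used)
			return 0;
		memcpy(buf + used, names[i], len);
		offsets[i] = (unsigned int)used;
		used += len;
	}
	return used;
}

/* Resolve a stored offset back to a name by indexing into the buffer. */
static const char *name_at(const char *buf, unsigned int offset)
{
	return &buf[offset];
}
```

Storing offsets rather than indices means the reader needs no second
table: a lookup is a single address computation into the name buffer.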

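The range lookup that read_symbol() does against addrmap in the patch
above can be tried in isolation.  The following is a minimal sketch
under stated assumptions, not the patch's code: struct range,
range_compare() and lookup_module() are hypothetical names, and unlike
the patch it returns the module index directly rather than an offset
into the name table.  It assumes the ranges are sorted by address and
do not overlap, which is what bsearch() requires.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the patch's struct addrmap_entry. */
struct range {
	unsigned long long addr;	/* start of the range */
	unsigned long long size;	/* length in bytes */
	unsigned int module;		/* index of the owning built-in module */
};

/* bsearch() comparator: the key is an address, the element a range. */
static int range_compare(const void *keyp, const void *rangep)
{
	unsigned long long addr = *(const unsigned long long *)keyp;
	const struct range *r = rangep;

	if (addr < r->addr)
		return -1;
	/* subtract rather than add, so addr + size cannot wrap around */
	if (addr - r->addr < r->size)
		return 0;
	return 1;
}

/* Return the module index covering addr, or 0 if addr is in a gap. */
static unsigned int lookup_module(const struct range *map, size_t n,
				  unsigned long long addr)
{
	const struct range *r = bsearch(&addr, map, n, sizeof(*map),
					range_compare);

	return r ? r->module : 0;
}
```

An address in a gap between two ranges matches neither of them, so
such a symbol is reported as not belonging to any built-in module.
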

* Re: [PATCH v4] kallsyms: add names of built-in modules
  2019-12-10 17:48                 ` [PATCH v4] " eugene.loh
@ 2019-12-18 23:55                   ` Eugene Loh
  2019-12-19  3:29                     ` Steven Rostedt
  0 siblings, 1 reply; 21+ messages in thread
From: Eugene Loh @ 2019-12-18 23:55 UTC (permalink / raw)
  Cc: rostedt, corbet, yamada.masahiro, michal.lkml, jeyu,
	linux-kbuild, maz, songliubraving, tglx, jacob.e.keller,
	Kris Van Hees, Nick Alcock

Ping.


On 12/10/2019 09:48 AM, eugene.loh@oracle.com wrote:
> From: Eugene Loh <eugene.loh@oracle.com>
>
> /proc/kallsyms is very useful for tracers and other tools that need
> to map kernel symbols to addresses.
>
> It would be useful if there were a mapping between kernel symbol and
> module name that only changed when the kernel source code is changed.
> This mapping should not vanish simply because a module becomes built
> into the kernel.
>
> Therefore:
>
> - Generate a file "modules_thick.builtin" that maps from thin
>    archives that make up built-in modules to their constituent
>    object files.
>
> - Generate a linker map ".tmp_vmlinux.map", converting it into
>    ".tmp_vmlinux.ranges", mapping address ranges to object files.
>
> - Read "modules_thick.builtin" and ".tmp_vmlinux.ranges" to
>    map symbol addresses to built-in-module names.  Write those
>    module names (kallsyms_modules) and that per-symbol module
>    information (kallsyms_symbol_modules) to the *.s output file.
>
> - Use kallsyms_modules and kallsyms_symbol_modules to add
>    built-in-module information to /proc/kallsyms.
>
> Note that kernel symbols for built-in modules appear in ascending
> order by address, as usual, and thus will appear interspersed with
> symbols that are part of other built-in modules or of the kernel.
>
> Also, while it is possible for an object to appear in multiple
> built-in modules, making an unambiguous mapping of symbol to module
> impossible in such cases, this patch addresses the typical cases.
>
> Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
> Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
> Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
> Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
> ---
>   .gitignore                  |   1 +
>   Documentation/dontdiff      |   1 +
>   Makefile                    |  41 +++--
>   kernel/kallsyms.c           |  12 +-
>   scripts/Makefile            |   5 +
>   scripts/Makefile.modbuiltin |  20 ++-
>   scripts/kallsyms.c          | 298 +++++++++++++++++++++++++++++++++++-
>   scripts/link-vmlinux.sh     |  17 ++
>   scripts/modules_thick.c     | 104 +++++++++++++
>   scripts/modules_thick.h     |  27 ++++
>   scripts/namespace.pl        |   5 +
>   11 files changed, 509 insertions(+), 22 deletions(-)
>   create mode 100644 scripts/modules_thick.c
>   create mode 100644 scripts/modules_thick.h
>
> diff --git a/.gitignore b/.gitignore
> index 72ef86a5570d..0b9c88f1d388 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -46,6 +46,7 @@
>   Module.symvers
>   modules.builtin
>   modules.order
> +modules_thick.builtin
>   
>   #
>   # Top-level generic files
> diff --git a/Documentation/dontdiff b/Documentation/dontdiff
> index 72fc2e9e2b63..9d0db2ef3a51 100644
> --- a/Documentation/dontdiff
> +++ b/Documentation/dontdiff
> @@ -181,6 +181,7 @@ modules.builtin
>   modules.builtin.modinfo
>   modules.nsdeps
>   modules.order
> +modules_thick.builtin
>   modversions.h*
>   nconf
>   nconf-cfg
> diff --git a/Makefile b/Makefile
> index 73e3c2802927..430d49d3a93e 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1073,7 +1073,7 @@ cmd_link-vmlinux =                                                 \
>   	$(CONFIG_SHELL) $< $(LD) $(KBUILD_LDFLAGS) $(LDFLAGS_vmlinux) ;    \
>   	$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
>   
> -vmlinux: scripts/link-vmlinux.sh autoksyms_recursive $(vmlinux-deps) FORCE
> +vmlinux: scripts/link-vmlinux.sh autoksyms_recursive $(vmlinux-deps) modules_thick.builtin FORCE
>   	+$(call if_changed,link-vmlinux)
>   
>   targets := vmlinux
> @@ -1284,17 +1284,6 @@ modules: $(if $(KBUILD_BUILTIN),vmlinux) modules.order modules.builtin
>   modules.order: descend
>   	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
>   
> -modbuiltin-dirs := $(addprefix _modbuiltin_, $(build-dirs))
> -
> -modules.builtin: $(modbuiltin-dirs)
> -	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
> -
> -PHONY += $(modbuiltin-dirs)
> -# tristate.conf is not included from this Makefile. Add it as a prerequisite
> -# here to make it self-healing in case somebody accidentally removes it.
> -$(modbuiltin-dirs): include/config/tristate.conf
> -	$(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_%,%,$@)
> -
>   # Target to prepare building external modules
>   PHONY += modules_prepare
>   modules_prepare: prepare
> @@ -1347,6 +1336,33 @@ modules modules_install:
>   
>   endif # CONFIG_MODULES
>   
> +# modules.builtin has a 'thick' form which maps from kernel modules (or rather
> +# the object file names they would have had had they not been built in) to their
> +# constituent object files: kallsyms uses this to determine which modules any
> +# given object file is part of.  (We cannot eliminate the slight redundancy
> +# here without double-expansion.)
> +
> +modbuiltin-dirs := $(addprefix _modbuiltin_, $(build-dirs))
> +
> +modbuiltin-thick-dirs := $(addprefix _modbuiltin_thick_, $(build-dirs))
> +
> +modules.builtin: $(modbuiltin-dirs)
> +	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
> +
> +modules_thick.builtin: $(modbuiltin-thick-dirs)
> +	$(Q)$(AWK) '!x[$$0]++' $(addsuffix /$@, $(build-dirs)) > $@
> +
> +PHONY += $(modbuiltin-dirs) $(modbuiltin-thick-dirs)
> +# tristate.conf is not included from this Makefile. Add it as a prerequisite
> +# here to make it self-healing in case somebody accidentally removes it.
> +$(modbuiltin-dirs): include/config/tristate.conf
> +	$(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_%,%,$@) \
> +			builtin-file=modules.builtin
> +
> +$(modbuiltin-thick-dirs): include/config/tristate.conf
> +	$(Q)$(MAKE) $(modbuiltin)=$(patsubst _modbuiltin_thick_%,%,$@) \
> +			builtin-file=modules_thick.builtin
> +
>   ###
>   # Cleaning is done on three levels.
>   # make clean     Delete most generated files
> @@ -1712,6 +1728,7 @@ clean: $(clean-dirs)
>   		-o -name '*.asn1.[ch]' \
>   		-o -name '*.symtypes' -o -name 'modules.order' \
>   		-o -name modules.builtin -o -name '.tmp_*.o.*' \
> +		-o -name modules_thick.builtin \
>   		-o -name '*.c.[012]*.*' \
>   		-o -name '*.ll' \
>   		-o -name '*.gcno' \) -type f -print | xargs rm -f
> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> index 136ce049c4ad..ce8576503e35 100644
> --- a/kernel/kallsyms.c
> +++ b/kernel/kallsyms.c
> @@ -46,6 +46,8 @@ __attribute__((weak, section(".rodata")));
>   
>   extern const u8 kallsyms_token_table[] __weak;
>   extern const u16 kallsyms_token_index[] __weak;
> +extern const char kallsyms_modules[] __weak;
> +extern const u32 kallsyms_symbol_modules[] __weak;
>   
>   extern const unsigned int kallsyms_markers[] __weak;
>   
> @@ -508,8 +510,16 @@ static int get_ksymbol_bpf(struct kallsym_iter *iter)
>   static unsigned long get_ksymbol_core(struct kallsym_iter *iter)
>   {
>   	unsigned off = iter->nameoff;
> +	u32 mod_index = 0;
>   
> -	iter->module_name[0] = '\0';
> +	if (kallsyms_symbol_modules)
> +		mod_index = kallsyms_symbol_modules[iter->pos];
> +
> +	if (mod_index == 0 || kallsyms_modules == NULL)
> +		iter->module_name[0] = '\0';
> +	else
> +		strcpy(iter->module_name, &kallsyms_modules[mod_index]);
> +	iter->exported = 0;
>   	iter->value = kallsyms_sym_address(iter->pos);
>   
>   	iter->type = kallsyms_get_symbol_type(off);
> diff --git a/scripts/Makefile b/scripts/Makefile
> index 00c47901cb06..44641cabb261 100644
> --- a/scripts/Makefile
> +++ b/scripts/Makefile
> @@ -26,6 +26,11 @@ HOSTLDLIBS_extract-cert = -lcrypto
>   
>   always		:= $(hostprogs-y) $(hostprogs-m)
>   
> +kallsyms-objs	:= kallsyms.o
> +kallsyms-objs	+= modules_thick.o
> +
> +HOSTCFLAGS_modules_thick.o := -I$(srctree)/scripts
> +HOSTCFLAGS_kallsyms.o := -I$(srctree)/scripts
>   # The following hostprogs-y programs are only build on demand
>   hostprogs-y += unifdef
>   
> diff --git a/scripts/Makefile.modbuiltin b/scripts/Makefile.modbuiltin
> index 7d4711b88656..06f31e58111e 100644
> --- a/scripts/Makefile.modbuiltin
> +++ b/scripts/Makefile.modbuiltin
> @@ -1,6 +1,6 @@
>   # SPDX-License-Identifier: GPL-2.0
>   # ==========================================================================
> -# Generating modules.builtin
> +# Generating modules.builtin and modules_thick.builtin
>   # ==========================================================================
>   
>   src := $(obj)
> @@ -30,19 +30,29 @@ __subdir-Y     := $(patsubst %/,%,$(filter %/, $(obj-Y)))
>   subdir-Y       += $(__subdir-Y)
>   subdir-ym      := $(sort $(subdir-y) $(subdir-Y) $(subdir-m))
>   subdir-ym      := $(addprefix $(obj)/,$(subdir-ym))
> -obj-Y          := $(addprefix $(obj)/,$(obj-Y))
> +pathobj-Y      := $(addprefix $(obj)/,$(obj-Y))
>   
>   modbuiltin-subdirs := $(patsubst %,%/modules.builtin, $(subdir-ym))
> -modbuiltin-mods    := $(filter %.ko, $(obj-Y:.o=.ko))
> +modbuiltin-mods    := $(filter %.ko, $(pathobj-Y:.o=.ko))
>   modbuiltin-target  := $(obj)/modules.builtin
> +modthickbuiltin-subdirs := $(patsubst %,%/modules_thick.builtin, $(subdir-ym))
> +modthickbuiltin-target  := $(obj)/modules_thick.builtin
>   
> -__modbuiltin: $(modbuiltin-target) $(subdir-ym)
> +__modbuiltin: $(obj)/$(builtin-file) $(subdir-ym)
>   	@:
>   
>   $(modbuiltin-target): $(subdir-ym) FORCE
>   	$(Q)(for m in $(modbuiltin-mods); do echo $$m; done;	\
>   	cat /dev/null $(modbuiltin-subdirs)) > $@
>   
> +$(modthickbuiltin-target): $(subdir-ym) FORCE
> +	$(Q) $(foreach mod-o, $(filter %.o,$(obj-Y)),\
> +		printf "%s:" $(addprefix $(obj)/,$(mod-o)) >> $@; \
> +		printf " %s" $(sort $(strip $(addprefix $(obj)/,$($(mod-o:.o=-objs)) \
> +			$($(mod-o:.o=-y)) $($(mod-o:.o=-Y))))) >> $@; \
> +		printf "\n" >> $@; ) \
> +	cat /dev/null $(modthickbuiltin-subdirs) >> $@;
> +
>   PHONY += FORCE
>   
>   FORCE:
> @@ -52,6 +62,6 @@ FORCE:
>   
>   PHONY += $(subdir-ym)
>   $(subdir-ym):
> -	$(Q)$(MAKE) $(modbuiltin)=$@
> +	$(Q)$(MAKE) $(modbuiltin)=$@ builtin-file=$(builtin-file)
>   
>   .PHONY: $(PHONY)
> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
> index fb55f262f42d..7368996e5d7b 100644
> --- a/scripts/kallsyms.c
> +++ b/scripts/kallsyms.c
> @@ -5,7 +5,8 @@
>    * This software may be used and distributed according to the terms
>    * of the GNU General Public License, incorporated herein by reference.
>    *
> - * Usage: nm -n vmlinux | scripts/kallsyms [--all-symbols] > symbols.S
> + * Usage: nm -n vmlinux | scripts/kallsyms [--all-symbols]
> + *                   [--absolute-percpu] [--base-relative] > symbols.S
>    *
>    *      Table compression uses all the unused char codes on the symbols and
>    *  maps these to the most used substrings (tokens). For instance, it might
> @@ -18,12 +19,15 @@
>    *
>    */
>   
> +#define _GNU_SOURCE 1
>   #include <stdbool.h>
>   #include <stdio.h>
>   #include <stdlib.h>
>   #include <string.h>
>   #include <ctype.h>
>   #include <limits.h>
> +#include <errno.h>
> +#include "modules_thick.h"
>   
>   #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0]))
>   
> @@ -35,6 +39,7 @@ struct sym_entry {
>   	unsigned int start_pos;
>   	unsigned char *sym;
>   	unsigned int percpu_absolute;
> +	unsigned int module;
>   };
>   
>   struct addr_range {
> @@ -68,10 +73,101 @@ static unsigned char best_table[256][2];
>   static unsigned char best_table_len[256];
>   
>   
> +static unsigned int strhash(const char *s)
> +{
> +	/* fnv32 hash */
> +	unsigned int hash = 2166136261U;
> +
> +	for (; *s; s++)
> +		hash = (hash ^ *s) * 0x01000193;
> +	return hash;
> +}
> +
> +#define OBJ2MOD_BITS 10
> +#define OBJ2MOD_N (1 << OBJ2MOD_BITS)
> +#define OBJ2MOD_MASK (OBJ2MOD_N - 1)
> +struct obj2mod_elem {
> +	char *obj;
> +	int mod;
> +	struct obj2mod_elem *next;
> +};
> +
> +static struct obj2mod_elem *obj2mod[OBJ2MOD_N];
> +
> +static void obj2mod_put(const char *obj, int mod)
> +{
> +	int i = strhash(obj) & OBJ2MOD_MASK;
> +	struct obj2mod_elem *elem = malloc(sizeof(struct obj2mod_elem));
> +
> +	if (!elem) {
> +		fprintf(stderr, "kallsyms: out of memory\n");
> +		exit(1);
> +	}
> +
> +	elem->obj = strdup(obj);
> +	if (!elem->obj) {
> +		fprintf(stderr, "kallsyms: out of memory\n");
> +		free(elem);
> +		exit(1);
> +	}
> +
> +	elem->mod = mod;
> +	elem->next = obj2mod[i];
> +	obj2mod[i] = elem;
> +}
> +
> +static int obj2mod_get(const char *obj)
> +{
> +	int i = strhash(obj) & OBJ2MOD_MASK;
> +	struct obj2mod_elem *elem;
> +
> +	for (elem = obj2mod[i]; elem; elem = elem->next)
> +		if (strcmp(elem->obj, obj) == 0)
> +			return elem->mod;
> +	return 0;
> +}
> +
> +static void obj2mod_free(void)
> +{
> +	int i;
> +
> +	for (i = 0; i < OBJ2MOD_N; i++) {
> +		struct obj2mod_elem *elem = obj2mod[i];
> +		struct obj2mod_elem *next;
> +
> +		while (elem) {
> +			next = elem->next;
> +			free(elem->obj);
> +			free(elem);
> +			elem = next;
> +		}
> +	}
> +}
> +
> +/*
> + * The builtin module names.  The "offset" points to the name as if
> + * all builtin module names were concatenated to a single string.
> + */
> +static unsigned int builtin_module_size;	/* number allocated */
> +static unsigned int builtin_module_len;		/* number assigned */
> +static char **builtin_modules;			/* array of module names */
> +static unsigned int *builtin_module_offsets;	/* offset */
> +
> +/*
> + * An ordered list of address ranges and how they map to built-in modules.
> + */
> +struct addrmap_entry {
> +	unsigned long long addr;
> +	unsigned long long size;
> +	unsigned int module;
> +};
> +static struct addrmap_entry *addrmap;
> +static int addrmap_num, addrmap_alloced;
> +
>   static void usage(void)
>   {
> -	fprintf(stderr, "Usage: kallsyms [--all-symbols] "
> -			"[--base-relative] < in.map > out.S\n");
> +	fprintf(stderr, "Usage: kallsyms [--all-symbols] [--absolute-percpu] "
> +			"[--base-relative] < nm_vmlinux.out > symbols.S\n");
>   	exit(1);
>   }
>   
> @@ -98,6 +194,8 @@ static bool is_ignored_symbol(const char *name, char type)
>   		"kallsyms_markers",
>   		"kallsyms_token_table",
>   		"kallsyms_token_index",
> +		"kallsyms_symbol_modules",
> +		"kallsyms_modules",
>   		/* Exclude linker generated symbols which vary between passes */
>   		"_SDA_BASE_",		/* ppc */
>   		"_SDA2_BASE_",		/* ppc */
> @@ -174,10 +272,23 @@ static void check_symbol_range(const char *sym, unsigned long long addr,
>   	}
>   }
>   
> +static int addrmap_compare(const void *keyp, const void *rangep)
> +{
> +	unsigned long long addr = *((const unsigned long long *)keyp);
> +	const struct addrmap_entry *range = rangep;
> +
> +	if (addr < range->addr)
> +		return -1;
> +	if (addr < range->addr + range->size)
> +		return 0;
> +	return 1;
> +}
> +
>   static int read_symbol(FILE *in, struct sym_entry *s)
>   {
>   	char sym[500], stype;
>   	int rc;
> +	struct addrmap_entry *range;
>   
>   	rc = fscanf(in, "%llx %c %499s\n", &s->addr, &stype, sym);
>   	if (rc != 3) {
> @@ -202,6 +313,14 @@ static int read_symbol(FILE *in, struct sym_entry *s)
>   	check_symbol_range(sym, s->addr, text_ranges, ARRAY_SIZE(text_ranges));
>   	check_symbol_range(sym, s->addr, &percpu_range, 1);
>   
> +	/* try to find a module that this address belongs to */
> +	range = bsearch(&s->addr,
> +	    addrmap, addrmap_num, sizeof(*addrmap), &addrmap_compare);
> +	if (range)
> +		s->module = builtin_module_offsets[range->module];
> +	else
> +		s->module = 0;
> +
>   	/* include the type field in the symbol name, so that it gets
>   	 * compressed together */
>   	s->len = strlen(sym) + 1;
> @@ -469,6 +588,19 @@ static void write_src(void)
>   	for (i = 0; i < 256; i++)
>   		printf("\t.short\t%d\n", best_idx[i]);
>   	printf("\n");
> +
> +	output_label("kallsyms_modules");
> +	for (i = 0; i < builtin_module_len; i++)
> +		printf("\t.asciz\t\"%s\"\n", builtin_modules[i]);
> +	printf("\n");
> +
> +	for (i = 0; i < builtin_module_len; i++)
> +		free(builtin_modules[i]);
> +
> +	output_label("kallsyms_symbol_modules");
> +	for (i = 0; i < table_cnt; i++)
> +		printf("\t.int\t%u\n", table[i].module);
> +	printf("\n");
>   }
>   
>   
> @@ -734,12 +866,169 @@ static void record_relative_base(void)
>   		}
>   }
>   
> +/*
> + * Reallocate the builtin modules list.
> + */
> +static void realloc_builtin_modules(void)
> +{
> +	builtin_module_size += 50;
> +
> +	builtin_modules = realloc(builtin_modules,
> +				  sizeof(*builtin_modules) *
> +				  builtin_module_size);
> +	builtin_module_offsets = realloc(builtin_module_offsets,
> +					 sizeof(*builtin_module_offsets) *
> +					 builtin_module_size);
> +
> +	if (!builtin_modules || !builtin_module_offsets) {
> +		fprintf(stderr, "kallsyms failure: out of memory.\n");
> +		exit(EXIT_FAILURE);
> +	}
> +}
> +
> +/*
> + * Add a single built-in module (possibly composed of many files) to the
> + * modules list.  Take the name offset of the current module and return the
> + * offset of the next one (purely for simplicity's sake in the caller).
> + */
> +static size_t add_builtin_module(const char *module_name, char **module_paths,
> +				 size_t offset)
> +{
> +	/* map the module's object paths to the module's index */
> +	while (*module_paths) {
> +		obj2mod_put(*module_paths, builtin_module_len);
> +		module_paths++;
> +	}
> +
> +	/* add the module name */
> +	if (builtin_module_size <= builtin_module_len)
> +		realloc_builtin_modules();
> +	builtin_modules[builtin_module_len] = strdup(module_name);
> +	builtin_module_offsets[builtin_module_len] = offset;
> +	builtin_module_len++;
> +
> +	return (offset + strlen(module_name) + 1);
> +}
> +
> +/*
> + * Read the linker map.
> + */
> +static void read_linker_map(void)
> +{
> +	unsigned long long addr, size;
> +	char obj[PATH_MAX+1];
> +	FILE *f = fopen(".tmp_vmlinux.ranges", "r");
> +
> +	if (!f) {
> +		fprintf(stderr, "Cannot open '.tmp_vmlinux.ranges'.\n");
> +		exit(1);
> +	}
> +
> +	addrmap_num = 0;
> +	addrmap_alloced = 4096;
> +	addrmap = malloc(sizeof(*addrmap) * addrmap_alloced);
> +	if (!addrmap)
> +		goto oom;
> +
> +	/*
> +	 * For each address range (addr,size) and object, add to addrmap
> +	 * the range and the built-in module to which the object maps.
> +	 */
> +	while (fscanf(f, "%llx %llx %s\n", &addr, &size, obj) == 3) {
> +		int m = obj2mod_get(obj);
> +
> +		if (addr == 0 || size == 0 || m == 0)
> +			continue;
> +
> +		if (addrmap_num >= addrmap_alloced) {
> +			addrmap_alloced *= 2;
> +			addrmap = realloc(addrmap,
> +			    sizeof(*addrmap) * addrmap_alloced);
> +			if (!addrmap)
> +				goto oom;
> +		}
> +
> +		addrmap[addrmap_num].addr = addr;
> +		addrmap[addrmap_num].size = size;
> +		addrmap[addrmap_num].module = m;
> +		addrmap_num++;
> +	}
> +	fclose(f);
> +	return;
> +
> +oom:
> +	fprintf(stderr, "kallsyms: out of memory\n");
> +	exit(1);
> +}
> +
> +/*
> + * Read "modules_thick.builtin" (the list of built-in modules).  Construct:
> + *   - builtin_modules: array of built-in-module names
> + *   - builtin_module_offsets: array of offsets that will later be
> + *       used to access a concatenated list of built-in-module names
> + *   - obj2mod: a temporary, many-to-one, hash mapping
> + *       from object-file paths to built-in-module indices
> + * Read ".tmp_vmlinux.ranges" (the linker map).
> + *   - addrmap[] maps address ranges to built-in-module indices (using obj2mod)
> + */
> +static void read_modules(void)
> +{
> +	FILE *f;
> +	char *line = NULL;	/* getline() allocates the buffer */
> +	size_t line_size = 0;
> +	size_t offset = 0;
> +
> +	realloc_builtin_modules(); /* initial allocation */
> +
> +	builtin_modules[0] = strdup(""); /* a symbol that cannot be modular */
> +	builtin_module_offsets[0] = 0;
> +	builtin_module_len = 1;
> +	offset++;
> +
> +	/*
> +	 * Iterate over all modules in modules_thick.builtin and add each.
> +	 */
> +	f = fopen("modules_thick.builtin", "r");
> +	if (f == NULL) {
> +		fprintf(stderr, "Cannot open modules_thick.builtin: %s\n",
> +		    strerror(errno));
> +		exit(1);
> +	}
> +
> +	while (getline(&line, &line_size, f) > 0) {
> +		char **paths;
> +		char *module_name = NULL;
> +
> +		paths = modules_thick_parse(line, &module_name);
> +		if (paths == NULL)
> +			break;
> +		offset = add_builtin_module(module_name, paths, offset);
> +		free(paths);
> +		free(module_name);
> +	}
> +	if (ferror(f)) {
> +		fprintf(stderr, "Error reading from modules_thick file: %s\n",
> +		    strerror(errno));
> +		exit(1);
> +	}
> +
> +	fclose(f);
> +	free(line);
> +
> +	/*
> +	 * Read linker map.
> +	 */
> +	read_linker_map();
> +
> +	obj2mod_free();
> +}
> +
>   int main(int argc, char **argv)
>   {
>   	if (argc >= 2) {
>   		int i;
>   		for (i = 1; i < argc; i++) {
> -			if(strcmp(argv[i], "--all-symbols") == 0)
> +			if (strcmp(argv[i], "--all-symbols") == 0)
>   				all_symbols = 1;
>   			else if (strcmp(argv[i], "--absolute-percpu") == 0)
>   				absolute_percpu = 1;
> @@ -751,6 +1040,7 @@ int main(int argc, char **argv)
>   	} else if (argc != 1)
>   		usage();
>   
> +	read_modules();
>   	read_map(stdin);
>   	shrink_table();
>   	if (absolute_percpu)
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index 436379940356..ac14d292387a 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -76,6 +76,7 @@ vmlinux_link()
>   			--start-group				\
>   			${KBUILD_VMLINUX_LIBS}			\
>   			--end-group				\
> +			-Map=.tmp_vmlinux.map			\
>   			${@}"
>   
>   		${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux}	\
> @@ -88,6 +89,7 @@ vmlinux_link()
>   			-Wl,--start-group			\
>   			${KBUILD_VMLINUX_LIBS}			\
>   			-Wl,--end-group				\
> +			-Wl,-Map=.tmp_vmlinux.map		\
>   			${@}"
>   
>   		${CC} ${CFLAGS_vmlinux}				\
> @@ -140,6 +142,19 @@ kallsyms()
>   	info KSYM ${2}
>   	local kallsymopt;
>   
> +	# read the linker map to identify ranges of addresses:
> +	#   - for each *.o file, report address, size, pathname
> +	#       - most such lines will have four fields
> +	#       - but sometimes there is a line break after the first field
> +	#   - start reading at "Linker script and memory map"
> +	#   - stop reading at ".brk"
> +	${AWK} '
> +	    /\.o$/ && start==1 && NF>=3 { print $(NF-2), $(NF-1), $NF }
> +	    /^Linker script and memory map/ { start = 1 }
> +	    /^\.brk/ { exit(0) }
> +	' .tmp_vmlinux.map | sort > .tmp_vmlinux.ranges
> +
> +	# get kallsyms options
>   	if [ -n "${CONFIG_KALLSYMS_ALL}" ]; then
>   		kallsymopt="${kallsymopt} --all-symbols"
>   	fi
> @@ -152,11 +167,13 @@ kallsyms()
>   		kallsymopt="${kallsymopt} --base-relative"
>   	fi
>   
> +	# set up compilation
>   	local aflags="${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL}               \
>   		      ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS}"
>   
>   	local afile="`basename ${2} .o`.S"
>   
> +	# construct file and compile
>   	${NM} -n ${1} | scripts/kallsyms ${kallsymopt} > ${afile}
>   	${CC} ${aflags} -c -o ${2} ${afile}
>   }
> diff --git a/scripts/modules_thick.c b/scripts/modules_thick.c
> new file mode 100644
> index 000000000000..2d33b92060f9
> --- /dev/null
> +++ b/scripts/modules_thick.c
> @@ -0,0 +1,104 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * A simple modules_thick reader.
> + *
> + * (C) 2014, 2019 Oracle, Inc.  All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include <errno.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +#include "modules_thick.h"
> +
> +/*
> + * Parse a line from "modules_thick.builtin".  Allocate and return a module name
> + * and a null-terminated array of object paths (file names).  The name and array
> + * should be freed by the caller; the strings the array points to are in "line".
> + *
> + * Modules can consist of multiple paths: in this case, the portion before the
> + * colon is the path to the module, while the portion after the colon is a
> + * space-separated list of object paths.  The portion before the colon then
> + * names an "object file" that does not actually exist: it is merged into
> + * built-in.a without ever being written out.
> + */
> +char ** __attribute__((__nonnull__))
> +modules_thick_parse(char *line, char **module_name)
> +{
> +	size_t npaths = 1;
> +	char **paths;
> +	char *tmp;
> +	char *olist;
> +
> +	/* find object-file list after the colon, if any */
> +	olist = strchr(line, ':');
> +	if (olist != NULL) {
> +		*olist = '\0';
> +		olist++;
> +		olist += strspn(olist, " \n");
> +		if (*olist != '\0') {
> +			/* replace any trailing \n with \0 */
> +			tmp = strchr(olist, '\n');
> +			if (tmp != NULL)
> +				*tmp = '\0';
> +		} else
> +			olist = NULL;
> +	}
> +
> +	/* get pathless module_name, starting after the last '/', if any */
> +	tmp = strrchr(line, '/');
> +	*module_name = strdup(tmp ? tmp + 1 : line);
> +
> +	/* replace '-' with '_' as is done to names when built as modules */
> +	for (tmp = *module_name; *tmp != '\0'; tmp++)
> +		if (*tmp == '-')
> +			*tmp = '_';
> +
> +	/* terminate at the last '.' to remove any suffix */
> +	tmp = strrchr(*module_name, '.');
> +	if (tmp != NULL)
> +		*tmp = '\0';
> +
> +	/*
> +	 * Count the number of paths by counting the number of spaces.
> +	 * This could be an overestimate.
> +	 */
> +	if (olist) {
> +		npaths = 0;
> +		for (tmp = olist; tmp != NULL; tmp = strchr(tmp + 1, ' '))
> +			npaths++;
> +	}
> +
> +	paths = malloc((npaths + 1) * sizeof(char *));
> +	if (!paths) {
> +		fprintf(stderr, "%s: out of memory\n", __func__);
> +		exit(1);
> +	}
> +
> +	/* copy the paths in */
> +	if (olist) {
> +		size_t i = 0;
> +
> +		while ((tmp = strsep(&olist, " ")) != NULL) {
> +			if (i >= npaths) {
> +				fprintf(stderr,
> +				    "%s bug: npaths overflow on module %s\n",
> +				    __func__, *module_name);
> +				exit(1);
> +			}
> +			paths[i++] = tmp;
> +		}
> +		npaths = i;
> +	} else
> +		paths[0] = line;	/* untransformed module name */
> +
> +	paths[npaths] = NULL;
> +
> +	return paths;
> +}
> diff --git a/scripts/modules_thick.h b/scripts/modules_thick.h
> new file mode 100644
> index 000000000000..7e2c0309c731
> --- /dev/null
> +++ b/scripts/modules_thick.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * A simple modules_thick reader.
> + *
> + * (C) 2014, 2019 Oracle, Inc.  All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#ifndef _LINUX_MODULES_THICK_H
> +#define _LINUX_MODULES_THICK_H
> +
> +#include <stdio.h>
> +#include <stddef.h>
> +
> +/*
> + * Parse a line from "modules_thick.builtin".  Return a module name
> + * and a null-terminated array of object paths (file names).
> + */
> +
> +char ** __attribute__((__nonnull__))
> +modules_thick_parse(char *line, char **module_name);
> +
> +#endif
> diff --git a/scripts/namespace.pl b/scripts/namespace.pl
> index 1da7bca201a4..4c7615e720de 100755
> --- a/scripts/namespace.pl
> +++ b/scripts/namespace.pl
> @@ -120,6 +120,11 @@ my %nameexception = (
>       'kallsyms_addresses'=> 1,
>       'kallsyms_offsets'	=> 1,
>       'kallsyms_relative_base'=> 1,
> +    'kallsyms_token_table'=> 1,
> +    'kallsyms_token_index'=> 1,
> +    'kallsyms_markers'	=> 1,
> +    'kallsyms_modules'	=> 1,
> +    'kallsyms_symbol_modules'=> 1,
>       '__this_module'	=> 1,
>       '_etext'		=> 1,
>       '_edata'		=> 1,
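
The module-name handling in modules_thick_parse() above (take the text
before the colon, keep what follows the last '/', turn '-' into '_', and
cut at the last '.') can be sketched in isolation.  This is only a rough
illustration, not code from the patch: sketch_module_name() is a
hypothetical helper, and it copies into a caller-supplied buffer instead
of using strdup() as the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): derive a module name from
 * the portion of a "modules_thick.builtin" line that precedes the colon.
 * Returns 0 on success, -1 if the result does not fit in out[outlen].
 */
static int sketch_module_name(const char *line, char *out, size_t outlen)
{
	const char *base = line;
	char *dot;
	size_t len, i;

	assert(line != NULL && out != NULL);

	/* the module path ends at the colon, a newline, or end of string */
	len = strcspn(line, ":\n");

	/* skip everything up to and including the last '/' in that path */
	for (i = 0; i < len; i++)
		if (line[i] == '/')
			base = line + i + 1;
	len -= (size_t)(base - line);

	if (len + 1 > outlen)
		return -1;
	memcpy(out, base, len);
	out[len] = '\0';

	/* '-' becomes '_', as is done to names when built as modules */
	for (i = 0; i < len; i++)
		if (out[i] == '-')
			out[i] = '_';

	/* terminate at the last '.' to remove any suffix */
	dot = strrchr(out, '.');
	if (dot != NULL)
		*dot = '\0';

	return 0;
}
```

With a made-up input line such as
"drivers/net/foo-bar.o: drivers/net/a.o drivers/net/b.o", the helper
writes "foo_bar" into the buffer.  The sample path is an assumption for
illustration only; the exact contents of modules_thick.builtin are
defined by the build rules in the patch, not by this sketch.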


* Re: [PATCH v4] kallsyms: add names of built-in modules
  2019-12-18 23:55                   ` Eugene Loh
@ 2019-12-19  3:29                     ` Steven Rostedt
  2019-12-19  4:28                       ` Masahiro Yamada
  2019-12-19  9:43                       ` Jessica Yu
  0 siblings, 2 replies; 21+ messages in thread
From: Steven Rostedt @ 2019-12-19  3:29 UTC (permalink / raw)
  To: Eugene Loh
  Cc: corbet, yamada.masahiro, michal.lkml, jeyu, linux-kbuild, maz,
	songliubraving, tglx, jacob.e.keller, Kris Van Hees, Nick Alcock

On Wed, 18 Dec 2019 15:55:18 -0800
Eugene Loh <eugene.loh@oracle.com> wrote:

> Ping.

Couple of notes:

1) this affects code that doesn't really have a maintainer. I could
take it in my tree, but I would like to have acks from other
maintainers. Perhaps Jessica Yu (Module maintainer), and probably one
from Linus himself.

2) Do not send new versions of a patch as a reply to the old version. I
and many other maintainers sort our inbox by threads, and I look at the
top of the thread for patches. That is, if there's another version of a
patch that is a reply to a previous version, it is basically off my
radar, unless I happen to notice it by chance (which I did with this
email).

You can send your v4 patch again, but please send it as its own thread,
that way it will be on the radar of other maintainers. Hopefully we can
get some acks on this as well.

-- Steve


> 
> 
> On 12/10/2019 09:48 AM, eugene.loh@oracle.com wrote:
> > From: Eugene Loh <eugene.loh@oracle.com>
> >
> > /proc/kallsyms is very useful for tracers and other tools that need
> > to map kernel symbols to addresses.
> >
> > It would be useful if there were a mapping between kernel symbol and
> > module name that only changed when the kernel source code is changed.
> > This mapping should not vanish simply because a module becomes built
> > into the kernel.
> >


* Re: [PATCH v4] kallsyms: add names of built-in modules
  2019-12-19  3:29                     ` Steven Rostedt
@ 2019-12-19  4:28                       ` Masahiro Yamada
  2019-12-19 10:22                         ` Masahiro Yamada
  2020-01-08 18:32                         ` Eugene Loh
  2019-12-19  9:43                       ` Jessica Yu
  1 sibling, 2 replies; 21+ messages in thread
From: Masahiro Yamada @ 2019-12-19  4:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Eugene Loh, Jonathan Corbet, Michal Marek, Jessica Yu,
	Linux Kbuild mailing list, Marc Zyngier, Song Liu,
	Thomas Gleixner, Keller, Jacob E, Kris Van Hees, Nick Alcock

On Thu, Dec 19, 2019 at 12:29 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 18 Dec 2019 15:55:18 -0800
> Eugene Loh <eugene.loh@oracle.com> wrote:
>
> > Ping.
>
> Couple of notes:
>
> 1) this affects code that doesn't really have a maintainer. I could
> take it in my tree, but I would like to have acks from other
> maintainers. Perhaps Jessica Yu (Module maintainer), and probably one
> from Linus himself.
>
> 2) Do not send new versions of a patch as a reply to the old version. I
> and many other maintainers sort our inbox by threads, and I look at the
> top of the thread for patches. That is, if there's another version of a
> patch that is a reply to a previous version, it is basically off my
> radar, unless I happen to notice it by chance (which I did with this
> email).
>
> You can send your v4 patch again, but please send it as its own thread,
> that way it will be on the radar of other maintainers. Hopefully we can
> get some acks on this as well.
>
> -- Steve


I do not like this patch.

scripts/Makefile.modbuiltin is really ugly.
It traverses all the directories once again.

This patch makes it even worse:
Kbuild would traverse the
whole directory tree three times.

I was thinking of removing scripts/Makefile.modbuiltin
and Kconfig's tristate.conf entirely
because it is possible to generate modules.builtin more simply.


As I said, the name of a built-in module is not fixed info.
And, this makes kallsyms fat just for less important info.


Masahiro Yamada

> >

> >
> > On 12/10/2019 09:48 AM, eugene.loh@oracle.com wrote:
> > > From: Eugene Loh <eugene.loh@oracle.com>
> > >
> > > /proc/kallsyms is very useful for tracers and other tools that need
> > > to map kernel symbols to addresses.
> > >
> > > It would be useful if there were a mapping between kernel symbol and
> > > module name that only changed when the kernel source code is changed.
> > > This mapping should not vanish simply because a module becomes built
> > > into the kernel.
> > >



-- 
Best Regards
Masahiro Yamada


* Re: [PATCH v4] kallsyms: add names of built-in modules
  2019-12-19  3:29                     ` Steven Rostedt
  2019-12-19  4:28                       ` Masahiro Yamada
@ 2019-12-19  9:43                       ` Jessica Yu
  1 sibling, 0 replies; 21+ messages in thread
From: Jessica Yu @ 2019-12-19  9:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Eugene Loh, corbet, yamada.masahiro, michal.lkml, linux-kbuild,
	maz, songliubraving, tglx, jacob.e.keller, Kris Van Hees,
	Nick Alcock

+++ Steven Rostedt [18/12/19 22:29 -0500]:
>On Wed, 18 Dec 2019 15:55:18 -0800
>Eugene Loh <eugene.loh@oracle.com> wrote:
>
>> Ping.
>
>Couple of notes:
>
>1) this affects code that doesn't really have a maintainer. I could
>take it in my tree, but I would like to have acks from other
>maintainers. Perhaps Jessica Yu (Module maintainer), and probably one
>from Linus himself.

I hardly look through scripts/, so this patch would definitely need
Masahiro's (for all things kbuild) ack as well.

>2) Do not send new versions of a patch as a reply to the old version. I
>and many other maintainers sort our inbox by threads, and I look at the
>top of the thread for patches. That is, if there's another version of a
>patch that is a reply to a previous version, it is basically off my
>radar, unless I happen to notice it by chance (which I did with this
>email).
>
>You can send your v4 patch again, but please send it as its own thread,
>that way it will be on the radar of other maintainers. Hopefully we can
>get some acks on this as well.

Also, why wasn't this patch sent to lkml? At least I don't see it on
cc. If you resend v4, please send it there as well so it can get more
coverage.

Thanks,

Jessica


* Re: [PATCH v4] kallsyms: add names of built-in modules
  2019-12-19  4:28                       ` Masahiro Yamada
@ 2019-12-19 10:22                         ` Masahiro Yamada
  2020-01-08 18:32                         ` Eugene Loh
  1 sibling, 0 replies; 21+ messages in thread
From: Masahiro Yamada @ 2019-12-19 10:22 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Eugene Loh, Jonathan Corbet, Michal Marek, Jessica Yu,
	Linux Kbuild mailing list, Marc Zyngier, Song Liu,
	Thomas Gleixner, Keller, Jacob E, Kris Van Hees, Steven Rostedt,
	Nick Alcock

On Thu, Dec 19, 2019 at 1:28 PM Masahiro Yamada <masahiroy@kernel.org> wrote:
>
> On Thu, Dec 19, 2019 at 12:29 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Wed, 18 Dec 2019 15:55:18 -0800
> > Eugene Loh <eugene.loh@oracle.com> wrote:
> >
> > > Ping.
> >
> > Couple of notes:
> >
> > 1) this affects code that doesn't really have a maintainer. I could
> > take it in my tree, but I would like to have acks from other
> > maintainers. Perhaps Jessica Yu (Module maintainer), and probably one
> > from Linus himself.
> >
> > 2) Do not send new versions of a patch as a reply to the old version. I
> > and many other maintainers sort our inbox by threads, and I look at the
> > top of the thread for patches. That is, if there's another version of a
> > patch that is a reply to a previous version, it is basically off my
> > radar, unless I happen to notice it by chance (which I did with this
> > email).
> >
> > You can send your v4 patch again, but please send it as its own thread,
> > that way it will be on the radar of other maintainers. Hopefully we can
> > get some acks on this as well.
> >
> > -- Steve
>
>
> I do not like this patch.
>
> scripts/Makefile.modbuiltin is really ugly.
> It traverses all the directories once again.
>
> This patch makes it even worse:
> Kbuild would traverse the
> whole directory tree three times.
>
> I was thinking of removing scripts/Makefile.modbuiltin
> and Kconfig's tristate.conf entirely
> because it is possible to generate modules.builtin more simply.

FYI: This is the idea I had in my mind:
https://lore.kernel.org/patchwork/project/lkml/list/?series=423205



--
Best Regards
Masahiro Yamada


* Re: [PATCH v4] kallsyms: add names of built-in modules
  2019-12-19  4:28                       ` Masahiro Yamada
  2019-12-19 10:22                         ` Masahiro Yamada
@ 2020-01-08 18:32                         ` Eugene Loh
  2020-01-20  6:37                           ` Masahiro Yamada
  1 sibling, 1 reply; 21+ messages in thread
From: Eugene Loh @ 2020-01-08 18:32 UTC (permalink / raw)
  To: Masahiro Yamada, Steven Rostedt
  Cc: Jonathan Corbet, Michal Marek, Jessica Yu,
	Linux Kbuild mailing list, Marc Zyngier, Song Liu,
	Thomas Gleixner, Keller, Jacob E, Kris Van Hees, Nick Alcock

On 12/18/2019 08:28 PM, Masahiro Yamada wrote:

> On Thu, Dec 19, 2019 at 12:29 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>> Couple of notes:
>> 1) this affects code that doesn't really have a maintainer. I could
>> take it in my tree, but I would like to have acks from other
>> maintainers. Perhaps Jessica Yu (Module maintainer), and probably one
>> from Linus himself.
>>
>> 2) Do not send new versions of a patch as a reply to the old version. I
>> and many other maintainers sort our inbox by threads, and I look at the
>> top of the thread for patches. That is, if there's another version of a
>> patch that is a reply to a previous version, it is basically off my
>> radar, unless I happen to notice it by chance (which I did with this
>> email).
>>
>> You can send your v4 patch again, but please send it as its own thread,
>> that way it will be on the radar of other maintainers. Hopefully we can
>> get some acks on this as well.

Sorry.  I misunderstood some process doc.  But before I resend...

> I do not like this patch.
>
> scripts/Makefile.modbuiltin is really ugly.
> It traverses all the directories once again.
>
> This patch makes it even worse:
> Kbuild would traverse the
> whole directory tree three times.
>
> I was thinking of removing scripts/Makefile.modbuiltin
> and Kconfig's tristate.conf entirely
> because it is possible to generate modules.builtin more simply.

Sorry about the delayed response, due in part to holidays.  Thank you 
for your on-going review and the pointer to 
https://lore.kernel.org/patchwork/project/lkml/list/?series=423205

I agree your proposed patch simplifies some build code, but this is 
long-standing code.  Also, the build time -- either that would be saved 
by your patch or that would be incurred by a third traversal -- is 
minuscule.

Further, I do not see how to add object-to-module information to your 
proposed scheme.  Can you suggest something?  If not, then it seems the 
proposed code simplification is limiting functionality.

> As I said, the name of a built-in module is not fixed info.
> And, this makes kallsyms fat just for less important info.

The name of the builtin module can be ambiguous in some cases, but in 
most cases it is not.  Indeed, the extra information is typically 
useful, and comments from, e.g., Linus and Steve were positive about 
adding that information to kallsyms.  Further, we have even heard 
favorable feedback for adding such built-in-module information to 
available_filter_functions as well.

>>> On 12/10/2019 09:48 AM, eugene.loh@oracle.com wrote:
>>>> From: Eugene Loh <eugene.loh@oracle.com>
>>>>
>>>> /proc/kallsyms is very useful for tracers and other tools that need
>>>> to map kernel symbols to addresses.
>>>>
>>>> It would be useful if there were a mapping between kernel symbol and
>>>> module name that only changed when the kernel source code is changed.
>>>> This mapping should not vanish simply because a module becomes built
>>>> into the kernel.


* Re: [PATCH v4] kallsyms: add names of built-in modules
  2020-01-08 18:32                         ` Eugene Loh
@ 2020-01-20  6:37                           ` Masahiro Yamada
  2020-01-24 18:08                             ` Eugene Loh
  0 siblings, 1 reply; 21+ messages in thread
From: Masahiro Yamada @ 2020-01-20  6:37 UTC (permalink / raw)
  To: Eugene Loh
  Cc: Steven Rostedt, Jonathan Corbet, Michal Marek, Jessica Yu,
	Linux Kbuild mailing list, Marc Zyngier, Song Liu,
	Thomas Gleixner, Keller, Jacob E, Kris Van Hees, Nick Alcock

Hi Eugene,

On Thu, Jan 9, 2020 at 3:32 AM Eugene Loh <eugene.loh@oracle.com> wrote:
>
> On 12/18/2019 08:28 PM, Masahiro Yamada wrote:
>
> > On Thu, Dec 19, 2019 at 12:29 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >> Couple of notes:
> >> 1) this affects code that doesn't really have a maintainer. I could
> >> take it in my tree, but I would like to have acks from other
> >> maintainers. Perhaps Jessica Yu (Module maintainer), and probably one
> >> from Linus himself.
> >>
> >> 2) Do not send new versions of a patch as a reply to the old version. I
> >> and many other maintainers sort our inbox by threads, and I look at the
> >> top of the thread for patches. That is, if there's another version of a
> >> patch that is a reply to a previous version, it is basically off my
> >> radar, unless I happen to notice it by chance (which I did with this
> >> email).
> >>
> >> You can send your v4 patch again, but please send it as its own thread,
> >> that way it will be on the radar of other maintainers. Hopefully we can
> >> get some acks on this as well.
>
> Sorry.  I misunderstood some process doc.  But before I resend...
>
> > I do not like this patch.
> >
> > scripts/Makefile.modbuiltin is really ugly.
> > It traverses all the directories once again.
> >
> > This patch makes it even worse:
> > Kbuild would traverse the
> > whole directory tree three times.
> >
> > I was thinking of removing scripts/Makefile.modbuiltin
> > and Kconfig's tristate.conf entirely
> > because it is possible to generate modules.builtin more simply.
>
> Sorry about the delayed response, due in part to holidays.  Thank you
> for your on-going review and the pointer to
> https://lore.kernel.org/patchwork/project/lkml/list/?series=423205
>
> I agree your proposed patch simplifies some build code, but this is
> long-standing code.  Also, the build time -- either that would be saved
> by your patch or that would be incurred by a third traversal -- is
> minuscule.
>
> Further, I do not see how to add object-to-module information to your
> proposed scheme.  Can you suggest something?  If not, then it seems the
> proposed code simplification is limiting functionality.


The object-to-module information can be retrieved in a way
similar to what I did in
https://lore.kernel.org/patchwork/project/lkml/list/?series=423205

But even if modules_thick.builtin is produced in a new way,
it would make no difference to the fact that
the build system needs to generate modules_thick.builtin and
.tmp_vmlinux.ranges, and kallsyms must integrate a big
parser for them.

So, I think this patch lacks taste overall.


>
> > As I said, the name of a built-in module is not fixed info.
> > And, this makes kallsyms fat just for less important info.
>
> The name of the builtin module can be ambiguous in some cases, but in
> most cases it is not.  Indeed, the extra information is typically
> useful, and comments from, e.g., Linus and Steve were positive about
> adding that information to kallsyms.  Further, we have even heard
> favorable feedback for adding such built-in-module information to
> available_filter_functions as well.

In my opinion, this should be determined by the balance
between the added value and the ugliness of the code.

(Real) module names are obvious, but as I stated,
built-in module names are somewhat subtle, so I do not
like to extend it too much.

Perhaps, I was the only person who reviewed the code in detail.
After looking at how this feature is integrated,
I do not believe this should go in. Sorry.

Masahiro Yamada


>
> >>> On 12/10/2019 09:48 AM, eugene.loh@oracle.com wrote:
> >>>> From: Eugene Loh <eugene.loh@oracle.com>
> >>>>
> >>>> /proc/kallsyms is very useful for tracers and other tools that need
> >>>> to map kernel symbols to addresses.
> >>>>
> >>>> It would be useful if there were a mapping between kernel symbol and
> >>>> module name that only changed when the kernel source code is changed.
> >>>> This mapping should not vanish simply because a module becomes built
> >>>> into the kernel.



-- 
Best Regards
Masahiro Yamada


* Re: [PATCH v4] kallsyms: add names of built-in modules
  2020-01-20  6:37                           ` Masahiro Yamada
@ 2020-01-24 18:08                             ` Eugene Loh
  0 siblings, 0 replies; 21+ messages in thread
From: Eugene Loh @ 2020-01-24 18:08 UTC (permalink / raw)
  To: Masahiro Yamada, Steven Rostedt
  Cc: Jonathan Corbet, Michal Marek, Jessica Yu,
	Linux Kbuild mailing list, Marc Zyngier, Song Liu,
	Thomas Gleixner, Keller, Jacob E, Kris Van Hees, Nick Alcock

On 01/19/2020 10:37 PM, Masahiro Yamada wrote:

> Hi Eugene,
>
> On Thu, Jan 9, 2020 at 3:32 AM Eugene Loh <eugene.loh@oracle.com> wrote:
>> On 12/18/2019 08:28 PM, Masahiro Yamada wrote:
>>
>>> On Thu, Dec 19, 2019 at 12:29 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>>>> Couple of notes:
>>>> 1) this affects code that doesn't really have a maintainer. I could
>>>> take it in my tree, but I would like to have acks from other
>>>> maintainers. Perhaps Jessica Yu (Module maintainer), and probably one
>>>> from Linus himself.
>>>>
>>>> 2) Do not send new versions of a patch as a reply to the old version. I
>>>> and many other maintainers sort our inbox by threads, and I look at the
>>>> top of the thread for patches. That is, if there's another version of a
>>>> patch that is a reply to a previous version, it is basically off my
>>>> radar, unless I happen to notice it by chance (which I did with this
>>>> email).
>>>>
>>>> You can send your v4 patch again, but please send it as its own thread,
>>>> that way it will be on the radar of other maintainers. Hopefully we can
>>>> get some acks on this as well.
>> Sorry.  I misunderstood some process doc.  But before I resend...
>>
>>> I do not like this patch.
>>>
>>> scripts/Makefile.modbuiltin is really ugly.
>>> It traverses all the directories once again.
>>>
>>> This patch makes it even worse:
>>> Kbuild would traverse the
>>> whole directory tree three times.
>>>
>>> I was thinking of removing scripts/Makefile.modbuiltin
>>> and Kconfig's tristate.conf entirely
>>> because it is possible to generate modules.builtin more simply.
>> Sorry about the delayed response, due in part to holidays.  Thank you
>> for your on-going review and the pointer to
>> https://lore.kernel.org/patchwork/project/lkml/list/?series=423205
>>
>> I agree your proposed patch simplifies some build code, but this is
>> long-standing code.  Also, the build time -- either that would be saved
>> by your patch or that would be incurred by a third traversal -- is
>> minuscule.
>>
>> Further, I do not see how to add object-to-module information to your
>> proposed scheme.  Can you suggest something?  If not, then it seems the
>> proposed code simplification is limiting functionality.
>
> The object-to-module information can be retrieved in a way
> similar to what I did in
> https://lore.kernel.org/patchwork/project/lkml/list/?series=423205

Thanks again for that pointer.  I'm looking at how that approach can be 
generalized to the functionality we need for the object-to-module mapping.

> But even if modules_thick.builtin is produced in a new way,
> it would make no difference to the fact that
> the build system needs to generate modules_thick.builtin and
> .tmp_vmlinux.ranges, and kallsyms must integrate a big
> parser for them.
>
> So, I think this patch lacks taste overall.

Unfortunately we need to get the information from somewhere.  I'll work 
on streamlining the implementation more, and prepare an updated patch.

>>> As I said, the name of a built-in module is not fixed info.
>>> And, this makes kallsyms fat just for less important info.
>> The name of the builtin module can be ambiguous in some cases, but in
>> most cases it is not.  Indeed, the extra information is typically
>> useful, and comments from, e.g., Linus and Steve were positive about
>> adding that information to kallsyms.  Further, we have even heard
>> favorable feedback for adding such built-in-module information to
>> available_filter_functions as well.
> In my opinion, this should be determined by the balance
> between the added value and the ugliness of the code.
>
> (Real) module names are obvious, but as I stated,
> built-in module names are somewhat subtle, so I do not
> like to extend it too much.
>
> Perhaps, I was the only person who reviewed the code in detail.
> After looking at how this feature is integrated,
> I do not believe this should go in. Sorry.

Thanks for the review.  In light of the value of the functionality and the
expressed interest in this feature, I'll work towards an updated patch for
further review.

>>>>> On 12/10/2019 09:48 AM, eugene.loh@oracle.com wrote:
>>>>>> From: Eugene Loh <eugene.loh@oracle.com>
>>>>>>
>>>>>> /proc/kallsyms is very useful for tracers and other tools that need
>>>>>> to map kernel symbols to addresses.
>>>>>>
>>>>>> It would be useful if there were a mapping between kernel symbol and
>>>>>> module name that only changed when the kernel source code is changed.
>>>>>> This mapping should not vanish simply because a module becomes built
>>>>>> into the kernel.


Thread overview: 21+ messages
2019-11-14 22:30 [PATCH] kallsyms: new /proc/kallmodsyms with builtin modules and symbol sizes eugene.loh
2019-11-15 16:47 ` Steven Rostedt
2019-11-15 17:26   ` Linus Torvalds
2019-11-16 17:58     ` Eugene Loh
2019-11-17  0:32       ` Linus Torvalds
2019-11-19 22:42         ` [PATCH v2] kallsyms: add names of built-in modules eugene.loh
2019-11-20  4:59           ` [PATCH v3] " eugene.loh
2019-11-22 10:00             ` Masahiro Yamada
2019-11-22 15:23               ` Nick Alcock
2019-11-22 17:04                 ` Eugene Loh
2019-12-10 17:45               ` Eugene Loh
2019-12-10 17:48                 ` [PATCH v4] " eugene.loh
2019-12-18 23:55                   ` Eugene Loh
2019-12-19  3:29                     ` Steven Rostedt
2019-12-19  4:28                       ` Masahiro Yamada
2019-12-19 10:22                         ` Masahiro Yamada
2020-01-08 18:32                         ` Eugene Loh
2020-01-20  6:37                           ` Masahiro Yamada
2020-01-24 18:08                             ` Eugene Loh
2019-12-19  9:43                       ` Jessica Yu
2019-11-20  0:11         ` [PATCH] kallsyms: new /proc/kallmodsyms with builtin modules and symbol sizes Eugene Loh
