live-patching.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH modules-next v10 00/13] kallsyms: reliable symbol->address lookup with /proc/kallmodsyms
       [not found] <20221205163157.269335-1-nick.alcock@oracle.com>
@ 2023-04-07 23:21 ` Josh Poimboeuf
  2023-04-10 13:08   ` Joe Lawrence
  2023-04-24 19:47   ` Luis Chamberlain
  0 siblings, 2 replies; 4+ messages in thread
From: Josh Poimboeuf @ 2023-04-07 23:21 UTC (permalink / raw)
  To: Nick Alcock
  Cc: mcgrof, masahiroy, linux-modules, linux-trace-kernel,
	linux-kernel, arnd, akpm, eugene.loh, kris.van.hees,
	live-patching, Peter Zijlstra, Steven Rostedt

On Mon, Dec 05, 2022 at 04:31:44PM +0000, Nick Alcock wrote:
> The whole point of symbols is that their names are unique: you can look up a
> symbol and get back a unique address, and vice versa.  Alas, because
> /proc/kallsyms (rightly) reports all symbols, even hidden ones, it does not
> really satisfy this requirement.  Large numbers of symbols are duplicated
> many times (just search for __list_del_entry!), and while usually these are
> just out-of-lined things defined in header files and thus all have the same
> implementation, it does make it needlessly hard to figure out which one is
> which in stack dumps, when tracing, and such things.  Some configuration
> options make things much worse: my test make allyesconfig runs introduced
> thousands of text symbols named _sub_I_65535_1, one per compiler-generated
> object file, and it was fairly easy to make them appear in ftrace output.
> 
> Right now the kernel has no way at all to tell such symbols apart, and nor
> has the user: their address differs and that's all.  Which module did they
> come from?  Which object file?  We don't know.  Figuring out which is which
> when tracing needs a combination of guesswork and luck, and if there are
> thousands of them that's not a pleasant prospect.  In discussions at LPC it
> became clear that this is not just annoying me but Steve Rostedt and others,
> so it's probably desirable to fix this.
> 
> It turns out that the linker, and the kernel build system, can be made to
> give us everything we need to resolve this once and for all.  This series
> provides a new /proc/kallmodsyms which is like /proc/kallsyms except that it
> annotates every (textual) symbol which comes from a built-in kernel module
> with the module's name, in square brackets: if a symbol is used by multiple
> modules, it gets [multiple] [names]; if a symbol is still ambiguous it gets
> a cut-down {object file name}; the combination of symbol, [module] [names]
> and {object file name} is unique (with one minor exception: the arm64 nvhe
> module is pre-linked with ld -r, causing all symbols in it to appear to come
> from the same object file: if it was reworked to use thin archives this
> problem would go away).

Hi Nick,

Sorry for jumping in late on an old patch set.  I just saw the LWN
article about the MODULE_LICENSE() patches and I have some comments
about duplicate symbols and a question about the motivation for this
patch set.

For livepatch we have a solution for disambiguating duplicate local
symbols called "sympos".  It works (for now) but there are some cases
(like LTO) where it falls apart and it may not be the best long term
solution.

The function granularity KASLR (fgkaslr) patches proposed a potentially
better option: use the GNU linker -zunique_symbols flag which renames
all duplicates to have unique names across the entire linked object.

There are other components which also struggle with duplicate symbols:
ftrace, kprobes, BPF, etc.  It would be good to come up with a kallsyms
solution that works for everybody.

Anyway, I was nodding along with the above cover letter until I got to
the third paragraph.

A "built-in kernel module" is not actually a module, as it's built in to
vmlinux.  I suspect the point is that if you rebuild with a different
config, it might become a module.  But many other changes could also
occur with a changed config, including changed inlining decisions and
GCC IPA optimization function renaming, in which case the symbol might
no longer exist with the new config.

Also I'm confused what it means for a symbol to be "used by multiple
modules".  If the same TU or inline symbol is linked into two modules,
it will be loaded twice at two different addresses, and the
implementations could even differ.

It sounds like there are two problems being conflated:

  1) how to uniquely identify symbols in the current kernel

     For this, all we really need is file+sym.

     Or, enable -zunique-symbols in the linker.

  2) how to uniquely identify symbols across multiple kernels/configs

     This seems much trickier, as much can change across kernels and
     configs, including compiler inlining and naming decisions, not to
     mention actual code changes.

The problems are related, but distinct.

#2 seems significantly harder to implement properly.

Would solving #1 give you most of what you need?

Based on the difficulty of #2, it really needs a proper justification.
I didn't see that in either of the patch sets.

Can you share more details about what specific problem needs solved and
why?  And how this would be used?  Examples would be helpful.

The article linked to this brief explanation [1], but that doesn't
clarify why "distinct notation used by users for things in named
modules" would be important.

Is there a reason the user can't just use whatever notation is
appropriate for their specific kernel?  Or, once we have #1, couldn't
tooling do an intermediate translation?

[1] https://lwn.net/ml/linux-kernel/87h6z5wqlk.fsf@esperi.org.uk/

-- 
Josh

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH modules-next v10 00/13] kallsyms: reliable symbol->address lookup with /proc/kallmodsyms
  2023-04-07 23:21 ` [PATCH modules-next v10 00/13] kallsyms: reliable symbol->address lookup with /proc/kallmodsyms Josh Poimboeuf
@ 2023-04-10 13:08   ` Joe Lawrence
  2023-04-24 19:47   ` Luis Chamberlain
  1 sibling, 0 replies; 4+ messages in thread
From: Joe Lawrence @ 2023-04-10 13:08 UTC (permalink / raw)
  To: Josh Poimboeuf, Nick Alcock
  Cc: mcgrof, masahiroy, linux-modules, linux-trace-kernel,
	linux-kernel, arnd, akpm, eugene.loh, kris.van.hees,
	live-patching, Peter Zijlstra, Steven Rostedt,
	Fāng-ruì Sòng

On 4/7/23 19:21, Josh Poimboeuf wrote:
> On Mon, Dec 05, 2022 at 04:31:44PM +0000, Nick Alcock wrote:
>> The whole point of symbols is that their names are unique: you can look up a
>> symbol and get back a unique address, and vice versa.  Alas, because
>> /proc/kallsyms (rightly) reports all symbols, even hidden ones, it does not
>> really satisfy this requirement.  Large numbers of symbols are duplicated
>> many times (just search for __list_del_entry!), and while usually these are
>> just out-of-lined things defined in header files and thus all have the same
>> implementation, it does make it needlessly hard to figure out which one is
>> which in stack dumps, when tracing, and such things.  Some configuration
>> options make things much worse: my test make allyesconfig runs introduced
>> thousands of text symbols named _sub_I_65535_1, one per compiler-generated
>> object file, and it was fairly easy to make them appear in ftrace output.
>>
>> Right now the kernel has no way at all to tell such symbols apart, and nor
>> has the user: their address differs and that's all.  Which module did they
>> come from?  Which object file?  We don't know.  Figuring out which is which
>> when tracing needs a combination of guesswork and luck, and if there are
>> thousands of them that's not a pleasant prospect.  In discussions at LPC it
>> became clear that this is not just annoying me but Steve Rostedt and others,
>> so it's probably desirable to fix this.
>>
>> It turns out that the linker, and the kernel build system, can be made to
>> give us everything we need to resolve this once and for all.  This series
>> provides a new /proc/kallmodsyms which is like /proc/kallsyms except that it
>> annotates every (textual) symbol which comes from a built-in kernel module
>> with the module's name, in square brackets: if a symbol is used by multiple
>> modules, it gets [multiple] [names]; if a symbol is still ambiguous it gets
>> a cut-down {object file name}; the combination of symbol, [module] [names]
>> and {object file name} is unique (with one minor exception: the arm64 nvhe
>> module is pre-linked with ld -r, causing all symbols in it to appear to come
>> from the same object file: if it was reworked to use thin archives this
>> problem would go away).
> 
> Hi Nick,
> 
> Sorry for jumping in late on an old patch set.  I just saw the LWN
> article about the MODULE_LICENSE() patches and I have some comments
> about duplicate symbols and a question about the motivation for this
> patch set.
> 
> For livepatch we have a solution for disambiguating duplicate local
> symbols called "sympos".  It works (for now) but there are some cases
> (like LTO) where it falls apart and it may not be the best long term
> solution.
> 
> The function granularity KASLR (fgkaslr) patches proposed a potentially
> better option: use the GNU linker -zunique_symbols flag which renames
> all duplicates to have unique names across the entire linked object.
> 

And IIRC, that idea was eventually dropped.  Fāng-ruì Sòng posted a few
reasons why -zunique-symbols wouldn't be a great solution [1]

[1]
https://lore.kernel.org/lkml/CAFP8O3K1mkiCGMTEeuSifZtr2piHsKTjP5TOA25nqpv2SrbzYQ@mail.gmail.com/

<file + symbol> was suggested instead, but I'm not 100% if that ever
became the preferred solution.

> There are other components which also struggle with duplicate symbols:
> ftrace, kprobes, BPF, etc.  It would be good to come up with a kallsyms
> solution that works for everybody.
> 
> Anyway, I was nodding along with the above cover letter until I got to
> the third paragraph.
> 
> A "built-in kernel module" is not actually a module, as it's built in to
> vmlinux.  I suspect the point is that if you rebuild with a different
> config, it might become a module.  But many other changes could also
> occur with a changed config, including changed inlining decisions and
> GCC IPA optimization function renaming, in which case the symbol might
> no longer exist with the new config.
> 
> Also I'm confused what it means for a symbol to be "used by multiple
> modules".  If the same TU or inline symbol is linked into two modules,
> it will be loaded twice at two different addresses, and the
> implementations could even differ.
> 
> It sounds like there are two problems being conflated:
> 
>   1) how to uniquely identify symbols in the current kernel
> 
>      For this, all we really need is file+sym.
> 
>      Or, enable -zunique-symbols in the linker.
> 
>   2) how to uniquely identify symbols across multiple kernels/configs
> 
>      This seems much trickier, as much can change across kernels and
>      configs, including compiler inlining and naming decisions, not to
>      mention actual code changes.
> 
> The problems are related, but distinct.
> 
> #2 seems significantly harder to implement properly.
> 
> Would solving #1 give you most of what you need?
> 
> Based on the difficulty of #2, it really needs a proper justification.
> I didn't see that in either of the patch sets.
> 
> Can you share more details about what specific problem needs solved and
> why?  And how this would be used?  Examples would be helpful.
> 
> The article linked to this brief explanation [1], but that doesn't
> clarify why "distinct notation used by users for things in named
> modules" would be important.
> 
> Is there a reason the user can't just use whatever notation is
> appropriate for their specific kernel?  Or, once we have #1, couldn't
> tooling do an intermediate translation?
> 
> [1] https://lwn.net/ml/linux-kernel/87h6z5wqlk.fsf@esperi.org.uk/
> 

-- 
Joe


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH modules-next v10 00/13] kallsyms: reliable symbol->address lookup with /proc/kallmodsyms
  2023-04-07 23:21 ` [PATCH modules-next v10 00/13] kallsyms: reliable symbol->address lookup with /proc/kallmodsyms Josh Poimboeuf
  2023-04-10 13:08   ` Joe Lawrence
@ 2023-04-24 19:47   ` Luis Chamberlain
  2023-04-25  8:27     ` Petr Mladek
  1 sibling, 1 reply; 4+ messages in thread
From: Luis Chamberlain @ 2023-04-24 19:47 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Nick Alcock, masahiroy, linux-modules, linux-trace-kernel,
	linux-kernel, arnd, akpm, eugene.loh, kris.van.hees,
	live-patching, Peter Zijlstra, Steven Rostedt

On Fri, Apr 07, 2023 at 04:21:18PM -0700, Josh Poimboeuf wrote:
> Anyway, I was nodding along with the above cover letter until I got to
> the third paragraph.
> 
> A "built-in kernel module" is not actually a module, as it's built in to
> vmlinux.  I suspect the point is that if you rebuild with a different
> config, it might become a module.  But many other changes could also
> occur with a changed config, including changed inlining decisions and
> GCC IPA optimization function renaming, in which case the symbol might
> no longer exist with the new config.

Yes it does not matter, for his tooling effort it was just to be able
to map a possible module to a symbol so tooling can display this to
disambiguate.

> Also I'm confused what it means for a symbol to be "used by multiple
> modules".  If the same TU or inline symbol is linked into two modules,
> it will be loaded twice at two different addresses, and the
> implementations could even differ.

He just wants to be able to map if a symbol with the same name but
different addresses is due to a built-in or a module declaration of
the same symbol so it can use it.

> It sounds like there are two problems being conflated:
> 
>   1) how to uniquely identify symbols in the current kernel
> 
>      For this, all we really need is file+sym.
> 
>      Or, enable -zunique-symbols in the linker.
> 
>   2) how to uniquely identify symbols across multiple kernels/configs
> 
>      This seems much trickier, as much can change across kernels and
>      configs, including compiler inlining and naming decisions, not to
>      mention actual code changes.
> 
> The problems are related, but distinct.
> 
> #2 seems significantly harder to implement properly.
> 
> Would solving #1 give you most of what you need?

I'm not nick but my reading of his goals is that if you peg a
"possible_module" prefix or postfix or whatever, then yes.

For 2) I think it would be good to see if one could just force Kconfig
tristate to add -DPOSSIBLE_MODULE, that would be an easier approach
than the possible-obj-m thing [0] I had suggested last

[0] https://lore.kernel.org/all/Y/kXDqW+7d71C4wz@bombadil.infradead.org/

  Luis

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH modules-next v10 00/13] kallsyms: reliable symbol->address lookup with /proc/kallmodsyms
  2023-04-24 19:47   ` Luis Chamberlain
@ 2023-04-25  8:27     ` Petr Mladek
  0 siblings, 0 replies; 4+ messages in thread
From: Petr Mladek @ 2023-04-25  8:27 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Josh Poimboeuf, Nick Alcock, masahiroy, linux-modules,
	linux-trace-kernel, linux-kernel, arnd, akpm, eugene.loh,
	kris.van.hees, live-patching, Peter Zijlstra, Steven Rostedt

On Mon 2023-04-24 12:47:39, Luis Chamberlain wrote:
> On Fri, Apr 07, 2023 at 04:21:18PM -0700, Josh Poimboeuf wrote:
> > Anyway, I was nodding along with the above cover letter until I got to
> > the third paragraph.
> > 
> > A "built-in kernel module" is not actually a module, as it's built in to
> > vmlinux.  I suspect the point is that if you rebuild with a different
> > config, it might become a module.  But many other changes could also
> > occur with a changed config, including changed inlining decisions and
> > GCC IPA optimization function renaming, in which case the symbol might
> > no longer exist with the new config.
> 
> Yes it does not matter, for his tooling effort it was just to be able
> to map a possible module to a symbol so tooling can display this to
> disambiguate.

Note that the same symbol name might be used many times even within
one module. The module might be linked from more .o files. And
more .o files might have a local symbol with the same name.

I think that the best solution is to associate the symbol with
the file name and line. These is very useful information in general.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-04-25  8:27 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20221205163157.269335-1-nick.alcock@oracle.com>
2023-04-07 23:21 ` [PATCH modules-next v10 00/13] kallsyms: reliable symbol->address lookup with /proc/kallmodsyms Josh Poimboeuf
2023-04-10 13:08   ` Joe Lawrence
2023-04-24 19:47   ` Luis Chamberlain
2023-04-25  8:27     ` Petr Mladek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).