kernelnewbies.kernelnewbies.org archive mirror
From: Pedro Terra Delboni <terra0009@gmail.com>
To: "Valdis Klētnieks" <valdis.kletnieks@vt.edu>
Cc: kernelnewbies@kernelnewbies.org
Subject: Re: how to collect information regarding function calls in run time?
Date: Tue, 14 May 2019 16:11:51 -0300	[thread overview]
Message-ID: <CAHKDPP_=QNwKLF9rN-sKvG6+NjNgc5-y1tFa+f2K-TeA5r4ocw@mail.gmail.com> (raw)
In-Reply-To: <2216.1557855958@turing-police>

On Tue, May 14, 2019 at 2:46 PM Valdis Klētnieks
<valdis.kletnieks@vt.edu> wrote:
>
> On Tue, 14 May 2019 10:55:40 -0300, Pedro Terra Delboni said:
>
> > Regarding bpftrace: this seemed like the best option, since I could use it
> > to count stack frames at depth 2, which would tell me precisely how many
> > times each specific call site was executed. However, I could not use it:
> > since I have to probe every function, it raised an error related to the
> > open file limit. I tried setting the open file limit to unlimited, but the
> > command I used said that was impossible, and the current limit is already
> > 1048576, so I'm guessing that probing every function isn't a viable
> > solution.
>
> What problem are you trying to solve?
>
> If you're trying to count how often *every* function is called, and the fact
> that one way to do it has an upper limit of a million is a problem, chances are
> that you haven't figured out what the *question* is yet.
>
> Usually, the number of calls isn't that important, the total runtime spent in
> the function is important.  A one-liner inline accessor function that compiles
> down to 2-3 machine opcodes can be called tens of thousands of times a second
> and not be noticed.  A function that takes milliseconds to complete will be
> noticed if it's called only a few dozen times a second.
>
> If you're trying to figure out how the functions fit together, a static call
> graph analysis tool to produce a map of what calls what may be what you need.
>
> Having said that, a kernel built with gcov or ftrace support will give you the
> info you need.
>
> See kernel/gcov/Kconfig and http://heim.ifi.uio.no/~knuto/kernel/4.14/dev-tools/gcov.html
> if you want to go that route.
>
> Resources for ftrace call counts:
>
> http://www.brendangregg.com/blog/2014-07-13/linux-ftrace-function-counting.html
>
> https://wiki.linaro.org/KenWerner/Sandbox/ftrace and see section 'function profiler'.
>
> Be prepared for your kernel to be quite slow, and have to do a *lot* of data
> reduction.
>
> Note that you'll probably need to run for at least several hours, and of course
> the function counts will be *very* dependent on what you do - what gets called
> while I'm doing stuff like writing e-mail is very different from what happens
> during a kernel compile, and both of those are different from the function
> counts that happen when I back up my laptop to an external USB disk.
>
> (Note I've not *tried* any of the above - this laptop is slow enough as it is :)


Thank you for the answer, I'll follow the links you pointed out.
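In case it's useful to anyone following the thread: for the ftrace route, my plan is to reduce the per-CPU function-profiler output to total call counts. A minimal sketch of that data reduction (assuming the usual "Function / Hit / Time / Avg / s^2" column layout of trace_stat/function<cpu>; the exact format may differ per kernel version):

```python
# Sketch: reduce ftrace function-profiler output to per-function call
# counts. Assumes the trace_stat/function<cpu> column layout
# ("Function  Hit  Time  Avg  s^2"); check your kernel's actual format.
from collections import Counter

def parse_function_profile(text):
    """Return a Counter mapping function name -> total hit count."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Data lines look like: "<name> <hits> <time> us <avg> us <s^2> us"
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # skip headers and separator rows
        counts[fields[0]] += int(fields[1])
    return counts

# Synthetic sample standing in for /sys/kernel/debug/tracing/trace_stat/function0
sample = """\
  Function                Hit    Time            Avg             s^2
  --------                ---    ----            ---             ---
  schedule               1234    5678.123 us     4.601 us        0.010 us
  vfs_read                 56    120.500 us      2.151 us        0.002 us
"""
counts = parse_function_profile(sample)
print(counts.most_common(2))  # -> [('schedule', 1234), ('vfs_read', 56)]
```

Summing across all the function<cpu> files would give system-wide counts.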

I agree that the question alone seems like a weird one; when I wrote my
first email I assumed that explaining the motivation would only waste
the reader's time.

The subject I'm working on is Control-Flow Integrity (CFI), which
instruments the code so that each indirect jump (usually a return or an
indirect call) verifies that the address it is jumping to is a valid
one (so a code stub runs on every function call and return).
One of the university's researchers implemented a tool that adds such
instrumentation to the Linux kernel [1].

[1] https://www.blackhat.com/docs/asia-17/materials/asia-17-Moreira-Drop-The-Rop-Fine-Grained-Control-Flow-Integrity-For-The-Linux-Kernel.pdf
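To make the idea concrete, here is a toy sketch of the check (purely conceptual; the real instrumentation from [1] works at the machine-code level on the kernel, not in Python):

```python
# Conceptual sketch of a CFI check: an indirect call is only allowed
# if its target is in a precomputed set of valid destinations. This is
# NOT the actual instrumentation from [1], just an illustration.
def add(a, b): return a + b
def mul(a, b): return a * b

# In a real CFI scheme this set is derived at compile time.
VALID_TARGETS = {add, mul}

def checked_indirect_call(target, *args):
    # Plays the role of the code stub that runs on every indirect
    # call/return in the instrumented kernel.
    if target not in VALID_TARGETS:
        raise RuntimeError("CFI violation: invalid indirect branch target")
    return target(*args)

print(checked_indirect_call(add, 2, 3))  # prints 5
```

The point is that this stub executes on every single call/return, which is where the latency overhead discussed below comes from.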

A more "secure" version of this implementation was shown to increase
latency by quite a lot, so it isn't a viable solution at the moment.
The latency increase comes mostly from the instrumentation that runs
when a function returns.
The reason I want to count call instruction executions is that the
function return tied to the most-executed call instruction is the one
that causes the greatest increase in execution time, so by inlining
that call we would be exchanging this cost for the cache impact of the
code expansion (as the code stub would no longer exist for that call).
The objective is to measure in which cases this exchange is worthwhile
(so we can decide which calls to inline/expand), and also to find how
many expansions would be necessary to make the current solution
viable.
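As a back-of-the-envelope illustration of that trade-off (the cost numbers below are made-up placeholders, not measurements):

```python
# Toy model of the inlining trade-off: inlining a call site removes the
# per-return stub cost but pays a one-off cache/code-size penalty.
# Both cost parameters are hypothetical placeholders.
STUB_COST = 5            # assumed cycles per instrumented return
EXPANSION_COST = 100000  # assumed cycles lost to icache pressure per expansion

def pick_inline_candidates(call_counts):
    """call_counts: {call_site: executions}; return sites worth inlining,
    most-executed first."""
    chosen = []
    for site, hits in sorted(call_counts.items(), key=lambda kv: -kv[1]):
        # Inline only if the stub cycles saved exceed the expansion cost.
        if hits * STUB_COST > EXPANSION_COST:
            chosen.append(site)
    return chosen

profile = {"memcpy@foo": 1_000_000, "kmalloc@bar": 50_000, "rare@baz": 10}
print(pick_inline_candidates(profile))  # -> ['memcpy@foo', 'kmalloc@bar']
```

In practice the expansion cost would have to be estimated per call site from code size and cache behaviour, which is exactly what the measurements are meant to inform.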
I understand that the profiling results will change depending on the
workload. We don't assume that profiling will solve the latency issue
in every case, but if it does for the profiled case, that would
already be an interesting result.

This is only research for now; I hope the results will be interesting
to the community in the future, so any help is appreciated. Please let
me know if I wasn't clear, or if you have any other ideas.

Thanks a lot

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

Thread overview:
2019-04-03 19:25 how to collect information regarding function calls in run time? Pedro Terra Delboni
2019-04-03 20:15 ` Bharath Vedartham
     [not found] ` <CADFy_4FJODA9gT7Enb+eLt-bdJBkkgTmqhhb3AhJhjibgbzD2A@mail.gmail.com>
2019-05-14 13:55   ` Pedro Terra Delboni
2019-05-14 14:05     ` Greg KH
2019-05-14 14:14       ` Pedro Terra Delboni
2019-05-14 17:45     ` Valdis Klētnieks
2019-05-14 19:11       ` Pedro Terra Delboni [this message]
2019-05-17 14:09         ` Valdis Klētnieks
2019-05-17 16:19           ` Pedro Terra Delboni
