linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC/PATCH] scripts/tracing: Add trace_analyze.py tool
@ 2013-01-22  9:46 Ezequiel Garcia
  2013-01-22 13:41 ` Pekka Enberg
  2013-01-23  4:27 ` Minchan Kim
  0 siblings, 2 replies; 8+ messages in thread
From: Ezequiel Garcia @ 2013-01-22  9:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Tim Bird, Ezequiel Garcia, Pekka Enberg,
	Steven Rostedt, Frederic Weisbecker, Ingo Molnar

From: Ezequiel Garcia <elezegarcia@gmail.com>

The purpose of trace_analyze.py tool is to perform static
and dynamic memory analysis using a kmem ftrace
log file and a built kernel tree.

This script and related work has been done on the CEWG/2012 project:
"Kernel dynamic memory allocation tracking and reduction"
(More info here [1])

It produces mainly two kinds of outputs:
 * an account-like output, similar to the one given by Perf, example below.
 * a ring-char output, examples here [2].

$ ./scripts/tracing/trace_analyze.py -k linux -f kmem.log --account-file account.txt
$ ./scripts/tracing/trace_analyze.py -k linux -f kmem.log -c account.txt

This will produce an account file like this:

    current bytes allocated:     669696
    current bytes requested:     618823
    current wasted bytes:         50873
    number of allocs:              7649
    number of frees:               2563
    number of callers:              115

     total    waste      net alloc/free  caller
    ---------------------------------------------
    299200        0   298928  1100/1     alloc_inode+0x4fL
    189824        0   140544  1483/385   __d_alloc+0x22L
     51904        0    47552   811/68    sysfs_new_dirent+0x4eL
    [...]

[1] http://elinux.org/Kernel_dynamic_memory_analysis
[2] http://elinux.org/Kernel_dynamic_memory_analysis#Current_dynamic_footprint

Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
---
 RFC/scripts/tracing/trace_analyze.py | 1249 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 1249 insertions(+), 0 deletions(-)
 create mode 100755 scripts/tracing/trace_analyze.py

diff --git a/scripts/tracing/trace_analyze.py b/scripts/tracing/trace_analyze.py
new file mode 100755
index 0000000..ad49c9a
--- /dev/null
+++ b/scripts/tracing/trace_analyze.py
@@ -0,0 +1,1249 @@
+#!/usr/bin/env python
+
+"""
+Copyright (C) 2012 Ezequiel Garcia <elezegarcia@gmail.com>
+Licensed under the terms of the GNU GPL License version 2
+
+trace_analize.py
+----------------
+
+0. Introduction
+---------------
+
+This script allows to perform some analysis on kernel dynamic memory
+allocations by post-processing ftrace kmem event.
+In addition, it can also report on static footprint on a built kernel tree.
+
+trace_analyze.py typically needs access to:
+1) a built kernel tree and, 2) an ftrace kmem log.
+
+Since reading the kmem event log is a costly operation,
+you can also generate a 'db' file to speed-up subsequent runs of the script.
+
+This script and work related has been done thanks to the CEWG project
+"Kernel dynamic memory allocation tracking and reduction"
+You can find lot more information about this script and on kernel dynamic
+memory tracking here:
+
+    http://elinux.org/Kernel_dynamic_memory_analysis
+
+Disclaimer:
+trace_analyze.py is not stable, so expect some roughness.
+Testing and feedback is more than welcome.
+In fact, even some flames are welcome.
+
+1. Using trace_analyze.py for static analysis
+---------------------------------------------
+
+Usage is fairly simple
+
+    $ ./trace_analyze.py -k /usr/src/linux -r foo.png
+    $ ./trace_analyze.py --kernel /usr/src/linux --rings-file foo.png
+
+This should produce a ringchart png file in the current directory.
+Of course, you can use absolute and relative paths in the path parameter
+
+    $ ./trace_analyze.py -k ../../torvalds -r foo.png
+
+If you're interested in a specific subsystem you can use a parameter to specify
+the directory tree branch to take as root
+
+    $ ./trace_analyze -k linux --start-branch fs/ext2 -r ext2.png
+    $ ./trace_analyze -k linux -b drivers -r drivers.png
+    $ ./trace_analyze -k linux -b mm -r mm.png
+
+Each of this commands will produce a ringchart png file in the
+curent directory, named as specified.
+
+What's under the hood?
+The script will perform a directory walk, internally creating a tree matching
+the provided kernel tree. On each object file found (like fs/inode.o) it will
+perform a 'readelf --syms' to get a list of symbols contained in it. Nothing fancy.
+
+2. Using trace_analyze.py for dynamic analysis
+----------------------------------------------
+
+2.1. Producing a kmem trace log file
+
+In case you don't know or don't remember how to use ftrace to
+produce kmem events, here's a little remainder.
+For more information, please refer to the canonical
+trace documentation at the linux tree:
+
+- Documentation/trace/ftrace.txt
+- Documentation/trace/tracepoint-analysis.txt
+- and everything else inside Documentation/trace/
+
+The purpose of trace_analyze script is to perform dynamic memory analysis.
+For this to work you need feed it with a kmem trace log file
+(of course, you also need to give hime a built kernel tree).
+
+Such log must be produced on the running target kernel,
+but you can post-process it off-box.
+For instance, you boot your kernel with kmem parameters
+to enable ftrace kmem events:
+(it's recommended to enable all events, despite not running a NUMA machine).
+
+    trace_event="kmem:kmalloc,kmem:kmalloc_node,kmem:kfree,kmem:kmem_cache_alloc,kmem:kmem_cache_alloc_node,kmem:kmem_cache_free"
+
+This parameter will have linux to start tracing as soon as possible.
+Of course some early traces will be lost, see below.
+
+(on your target kernel)
+
+    # To stop tracing
+    $ echo "0" > /sys/kernel/debug/tracing/tracing_on
+    # Dump
+    $ cat /sys/kernel/debug/tracing/trace > kmem.log
+
+Now you need to get this file so you can post-process
+it using trace_analyze.py.
+In my case, I use qemu with a file backing serial device,
+so I simply do:
+
+(on your target kernel)
+
+    $ cat /sys/kernel/debug/tracing/trace > /dev/ttyS0
+
+And I get the log on qemu's backing file.
+
+Now you have everything you need to start the analysis.
+
+2.2. Slab accounting file output
+
+To obtain a memory accounting file you need to use
+--acount-file (-c) parameter, like this:
+
+    $ ./trace_analyze.py -k linux -f kmem.log --account-file account.txt
+    $ ./trace_analyze.py -k linux -f kmem.log -c account.txt
+
+This will produce an account file like this:
+
+    current bytes allocated:     669696
+    current bytes requested:     618823
+    current wasted bytes:         50873
+    number of allocs:              7649
+    number of frees:               2563
+    number of callers:              115
+
+     total    waste      net alloc/free  caller
+    ---------------------------------------------
+    299200        0   298928  1100/1     alloc_inode+0x4fL
+    189824        0   140544  1483/385   __d_alloc+0x22L
+     51904        0    47552   811/68    sysfs_new_dirent+0x4eL
+     16384     8088    16384     1/0     __seq_open_private+0x24L
+     15936     1328    15936    83/0     device_create_vargs+0x42L
+     14720    10898    14016   460/22    sysfs_new_dirent+0x29L
+
+2.3. Controlling account output
+
+You can ask the script to read only kmalloc events
+(notice the option name is *--malloc*):
+
+    $ ./trace_analyze.py -k linux -f kmem.log -c account.txt --malloc
+
+Or you can ask the script to read only kmem_cache events:
+
+    $ ./trace_analyze.py -k linux -f kmem.log -c account.txt --cache
+
+If you want to order the account file you can use --order-by (-o):
+
+    $ ./trace_analyze.py -k linux -f kmem.log -c account.txt --order-by=waste
+    $ ./trace_analyze.py -k linux -f kmem.log -c account.txt --malloc -o waste
+The possible options for order-by parameter are:
+
+* total_dynamic: Added allocations size
+* current_dynamic: Currently allocated size
+* alloc_count: Number of allocations
+* free_count: Number of frees
+* waste: Currently wasted size
+
+You can pick a directory to get an account file showing
+only the allocations from that directory.
+This is done with the --start-branch (-b) option,
+just like we've done for the static analysis:
+
+    $ ./trace_analyze.py -k linux -f kmem.log -c account.txt -b drivers/base/
+
+All of these options can be combined.
+For instance, if you want to get kmalloc events only,
+coming from fs/ directory and ordered by current dynamic footprint:
+
+    $ ./trace_analyze.py -k linux -f kmem.log -b fs -c account.txt -o current_dynamic --malloc
+
+2.4. Producing a pretty ringchart for dynamic allocations
+
+As already explained in the static analysis section, it's possible to produce
+a ringchart to get **the big picture** of dynamic allocations.
+You will need to have *matplotlib* installed, which should be as easy as:
+
+    $ {your_pkg_manager} install matplotlib
+
+The script usage is very simple,
+just pass the parameter --rings-file (-r) along with a filename
+
+    $ ./trace_analyze.py -k linux -f kmem.log --rings-file=dynamic.png
+
+This command will produce a png file named as specified.
+The plot will show current dynamic allocations by default.
+You can control the used attrbute used for the ringchart
+plot using --rings-attr (-a) parameter.
+
+The available options are:
+
+- current: static + current dynamic size
+- static: static size
+- waste: wasted size
+- current_dynamic: current dynamic size
+- total_dyamic: added dynamic size
+
+For instance, you may want a ringchart for wasted bytes
+
+    $ ./trace_analyze.py -k linux -f kmem.log -r -a waste
+
+You can use --start-branch (-b) parameter to plot allocations made from just one directory.
+For instance, if you want to get wasted bytes for ext4 filesystem:
+
+    $ ./trace_analyze.py -k ../torvalds -f kmem.log \
+      -r ext4_waste.png -a waste -b fs/ext4
+
+Or, if you want to see static footprint of arch-dependent mm code:
+
+    $ ./trace_analyze.py -k ../torvalds -f kmem.log \
+      -r x86_static.png -a static -b arch/x86/mm
+
+Also, you can filter kmalloc or kmem_cache traces
+using either --malloc, or --cache:
+
+    $ ./trace_analyze.py -k linux/ -f boot_kmem.log -r kmallocs.png --malloc
+
+2.5. Pitfall: wrongly reported allocation (and how to fix it)
+
+There are a number of functions (kstrdup, kmemdup, krealloc, etc) that do
+some kind of allocation on behalf of its caller.
+
+Of course, we don't want to get trace reports from these functions,
+but rather from its caller. To acomplish this, we must use a variant
+of kmalloc, called kmalloc_track_caller, which does exactly that.
+
+Let's see an example. As of today kvasprintf() implementation looks
+like this
+
+    (see lib/kasprintf.c:14)
+    char *kvasprintf(gfp_t gfp, const char *fmt, va_list ap)
+    {
+	   /* code removed */
+	   p = kmalloc(len+1, gfp);
+
+And trace_analyze produces the account file
+
+    total    waste      net alloc/free  caller
+    ---------------------------------------------
+    2161     1184     2161   148/0     kvasprintf
+
+The source of this 148 allocations may be a single caller,
+or it may be multiple callers. We just can't know.
+However, if we replace kmalloc with kmalloc_track_caller,
+we're going to find that out.
+
+    char *kvasprintf(gfp_t gfp, const char *fmt, va_list ap)
+    {
+           /* code removed */
+           p = kmalloc_track_caller(len+1, gfp);
+
+After running the re-built kernel, and comparing both current
+and previous account files, we find this is the real caller:
+
+    total    waste      net alloc/free  caller
+    ---------------------------------------------
+    2161     1184     2161   148/0     kobject_set_name_vargs
+
+So, we've accurately tracked this allocation down to the kobject code.
+
+3. Using a DB file to speed-up multiple runs
+--------------------------------------------
+
+You may find yourself analyzing a large kmem log file.
+Probably, you want to run the script
+several times to get different kinds of results.
+
+The script is not very clever and will re-read the
+long kmem file on each run.
+To alleviate this problem you can have trace_analyze.py
+create a so-called DB file,and use this file instead
+of the kmem log file on subsequent runs.
+
+This is done using the --save-db and --db-file parameters.
+Like this:
+
+    $ ./trace_analyze.py -k ../torvalds/ -f kmem.log --save-db db
+
+Notice you should create the DB file without any filters,
+like --malloc or --start-branch, in order to save the full kmem event log.
+
+Once you have the **db** file created, you would use it on each run
+
+    $ ./trace_analyze.py -k ../torvalds/ --db-file db \
+      -r rings.png -c account.txt
+
+Hopefully, this would prevent you from cursing trace_analyze for being so slow.
+
+"""
+
+import sys
+import string
+import re
+import subprocess
+import math
+import pickle
+import os
+from optparse import OptionParser
+
+# Skip this directories when walking kernel build
+BLACKLIST = ("scripts", "tools")
+
+class Ptr:
+    def __init__(self, fun, ptr, alloc, req):
+        self.fun = fun
+        self.ptr = ptr
+        self.alloc = alloc
+        self.req = req
+
+
+class Callsite:
+    def __init__(self):
+        self.__alloc = 0
+        self.__req = 0
+        self.__alloc_count = 0
+        self.__free_count = 0
+        self.ptrs = []
+
+    def total_dynamic(self):
+        return self.__alloc
+
+    def alloc_count(self):
+        return self.__alloc_count
+
+    def free_count(self):
+        return self.__free_count
+
+    def current_dynamic(self):
+        alloc = 0
+        for ptr in self.ptrs:
+            alloc += ptr.alloc
+        return alloc
+
+    def current_req(self):
+        req = 0
+        for ptr in self.ptrs:
+            req += ptr.req
+        return req
+
+    def waste(self):
+        return self.current_dynamic() - self.current_req()
+
+    def do_alloc(self, alloc, req, ptr):
+        self.__alloc += alloc
+        self.__req += req
+        self.__alloc_count += 1
+        self.ptrs.append(ptr)
+
+    def do_free(self, ptr):
+        self.__free_count += 1
+        self.ptrs.remove(ptr)
+
+
+# Based on addr2sym.py
+class SymbolMap:
+    def __init__(self, filemap):
+        self.fmap = {}
+        self.flist = []
+        self.cache = {}
+
+        try:
+            f = open(filemap)
+        except:
+            print "[ERROR] Cannot read symbol map file {}".format(filemap)
+            sys.exit(1)
+
+        for line in f.readlines():
+            (addr_str, symtype, name) = string.split(line, None, 3)
+            self.fmap[addr_str] = name
+            addr = eval("0x" + addr_str + "L")
+            self.flist.append((addr, name))
+
+        f.close()
+
+    def lookup(self, addr_str):
+
+        # return a tuple (string, offset) for a given address
+        if addr_str in self.fmap:
+            return (self.fmap[addr_str],0)
+
+        # convert address from string to number
+        addr = eval("0x" + addr_str + "L")
+        if addr in self.cache:
+            return self.cache[addr]
+
+        # if address is outside range of addresses in the
+        # map file, just return the address without converting it
+        if addr < self.flist[0][0] or addr > self.flist[-1][0]:
+            return (addr_str,0)
+
+        # no exact match found, now do binary search for closest function
+        # do a binary search in funclist for the function
+        # use a collapsing range to find the closest addr
+        lower = 0
+        upper = len(self.flist)-1
+        while (lower != upper-1):
+            guess_index = lower + (upper-lower)/2
+            guess_addr = self.flist[guess_index][0]
+            if addr < guess_addr:
+                upper = guess_index
+            if addr >= guess_addr:
+                lower = guess_index
+
+        offset = hex(addr-self.flist[lower][0])
+        name = self.flist[lower][1]
+        if name.startswith("."):
+            name = name[1:]
+        self.cache[addr] = (name, offset)
+        return (name, offset)
+
+
+class EventDB:
+    def __init__(self):
+        self.f = {}
+        self.p = {}
+        self.num_allocs = 0
+        self.total_dynamic = 0
+        self.total_req = 0
+        self.num_frees = 0
+        self.num_lost_frees = 0
+
+    def slurp(self, path, buildpath, do_malloc, do_cache):
+        print "Reading symbol map at {}".format(buildpath)
+        sym = SymbolMap(buildpath + "/System.map")
+
+        try:
+            logfile = open(path)
+        except:
+            print "[ERROR] Cannot read log file {}".format(path)
+            sys.exit(1)
+
+        kmalloc_re = r".*kmalloc.*call_site=([a-f0-9]+).*ptr=([a-f0-9]+).*bytes_req=([0-9]+)\s*bytes_alloc=([0-9]+)"
+        kfree_re = r".*kfree.*call_site=[a-f0-9+]+.*ptr=([a-f0-9]+)"
+        cache_alloc_re = r".*cache_alloc.*call_site=([a-f0-9]+).*ptr=([a-f0-9]+).*bytes_req=([0-9]+)\s*bytes_alloc=([0-9]+)"
+        cache_free_re = r".*cache_free.*call_site=[a-f0-9+]+.*ptr=([a-f0-9]+)"
+        both_alloc_re = r".*k.*alloc.*call_site=([a-f0-9]+).*ptr=([a-f0-9]+).*bytes_req=([0-9]+)\s*bytes_alloc=([0-9]+)"
+        both_free_re = r".*k.*free.*call_site=[a-f0-9+]+.*ptr=([a-f0-9]+)"
+
+        if do_malloc is True and do_cache is None:
+            print "Slurping event log, kmalloc events only"
+            alloc_re = kmalloc_re
+            free_re = kfree_re
+        elif do_malloc is None and do_cache is True:
+            print "Slurping event log, kmem_cache events only"
+            alloc_re = cache_alloc_re
+            free_re = cache_free_re
+        else:
+            print "Slurping event log"
+            alloc_re = both_alloc_re
+            free_re = both_free_re
+
+        for line in logfile:
+            m = re.match(alloc_re, line)
+            if m:
+                (fun, offset) = sym.lookup(m.group(1))
+                self.add_malloc("{}+{}".format(fun, offset),
+                                  m.group(2),
+                                  int(m.group(3)),
+                                  int(m.group(4)), line)
+
+            m = re.match(free_re, line)
+            if m:
+                self.add_free(m.group(1))
+
+    def get_bytes(self):
+        alloc = 0
+        req = 0
+        for fun, callsite in self.f.items():
+            alloc += callsite.current_dynamic()
+            req += callsite.current_req()
+        return (alloc, req)
+
+    def add_malloc(self, fun, ptr, req, alloc, line):
+        self.num_allocs += 1
+        self.total_dynamic += alloc
+        self.total_req += req
+
+        ptr_obj = Ptr(fun, ptr, alloc, req)
+
+        if ptr in self.p:
+            print("[WARNING] Duplicate pointer! {}".format(line))
+
+        self.p[ptr] = ptr_obj
+
+        if not fun in self.f:
+            self.f[fun] = Callsite()
+
+        self.f[fun].do_alloc(alloc, req, ptr_obj)
+
+    def add_free(self, ptr):
+        self.num_frees += 1
+
+        if not ptr in self.p:
+            self.num_lost_frees += 1
+            return
+
+        ptr_obj = self.p[ptr]
+
+        self.f[ptr_obj.fun].do_free(ptr_obj)
+
+        # Remove it from pointers dictionary
+        del self.p[ptr]
+
+    def print_callers(self, filepath, filter_tree=None):
+
+        if filter_tree is None:
+            filter_symbol = lambda f: True
+            get_symbol_dir = lambda f: ""
+        else:
+            filter_symbol = filter_tree.symbol_is_here
+            get_symbol_dir = filter_tree.get_symbol_dir
+
+        syms = [(f,c) for f,c in self.f.items() if filter_symbol(f)]
+
+        f = open(filepath, 'w')
+
+        for name, c in syms:
+
+            symdir = get_symbol_dir(name)
+            f.write("{:<60} {:<8} {:<8} {:<8}\n".format(name,
+                                           c.current_dynamic(),
+                                           c.waste(),
+                                           symdir))
+
+        f.close()
+
+    def print_account(self, filepath, order_by, filter_tree=None):
+
+        current_dynamic = 0
+        current_req = 0
+        alloc_count = 0
+        free_count = 0
+
+        if filter_tree is None:
+            filter_symbol = lambda f: True
+        else:
+            filter_symbol = filter_tree.symbol_is_here
+
+        syms = [(f,c) for f,c in self.f.items() if filter_symbol(f)]
+
+        f = open(filepath, 'w')
+
+        for fun, callsite in syms:
+            current_dynamic += callsite.current_dynamic()
+            current_req += callsite.current_req()
+            alloc_count += callsite.alloc_count()
+            free_count += callsite.free_count()
+
+        f.write("current bytes allocated: {:>10}\n".format(current_dynamic))
+        f.write("current bytes requested: {:>10}\n".format(current_req))
+        f.write("current wasted bytes:    {:>10}\n".format((current_dynamic -
+                                                         current_req)))
+        f.write("number of allocs:        {:>10}\n".format(alloc_count))
+        f.write("number of frees:         {:>10}\n".format(free_count))
+        f.write("number of callers:       {:>10}\n".format(len(syms)))
+        f.write("\n")
+        f.write("   total    waste      net alloc/free  caller\n")
+        f.write("---------------------------------------------\n")
+
+        for fun, callsite in sorted(syms,
+                                    key=lambda item: getattr(item[1],
+                                                             order_by)(),
+                                    reverse=True):
+
+            f.write("%8d %8d %8d %5d/%-5d %s\n" % (callsite.total_dynamic(),
+                                               callsite.waste(),
+                                               callsite.current_dynamic(),
+                                               callsite.alloc_count(),
+                                               callsite.free_count(),
+                                               fun))
+
+        f.close()
+
+
+class MemTreeNodeSize:
+    def __init__(self, node):
+        self.__static = 0
+        self.__total_dynamic = 0
+        self.__current_dynamic = 0
+        self.__waste = 0
+
+        # First for my symbols
+        for sym, size in node.data.items():
+            self.__static += size
+        for sym, size in node.text.items():
+            self.__static += size
+        for sym, call in node.funcs.items():
+            self.__total_dynamic += call.total_dynamic()
+            self.__current_dynamic += call.current_dynamic()
+            self.__waste += call.current_dynamic() - call.current_req()
+
+        # Now, for my children's symbols.
+        # Or, instead, we could first add all my children's
+        # symbols here and then get the node size.
+        for name, child in node.childs.items():
+            self.__total_dynamic += child.size().total_dynamic()
+            self.__current_dynamic += child.size().current_dynamic()
+            self.__static += child.size().static()
+            self.__waste += child.size().waste()
+
+    def current(self):
+        return self.__static + self.__current_dynamic
+
+    def waste(self):
+        return self.__waste
+
+    def static(self):
+        return self.__static
+
+    def current_dynamic(self):
+        return self.__current_dynamic
+
+    def total_dynamic(self):
+        return self.__total_dynamic
+
+
+class MemTreeNode:
+    def __init__(self, name="", parent=None, db=None):
+        self.name = name
+        self.parent = parent
+        self.childs = {}
+        self.funcs = {}
+        self.data = {}
+        self.text = {}
+        self.node_size = None
+        self.fill = getattr(self, "fill_per_file")
+
+        # If db is None, use parent db
+        if db is None:
+            if parent is not None:
+                self.db = parent.db
+        else:
+            self.db = db
+
+    def get_symbol_dir(self, symbol):
+        if symbol in self.funcs:
+            return self.full_name()
+        else:
+            for name, child in self.childs.items():
+                symdir = child.get_symbol_dir(symbol)
+                if symdir is not None:
+                    return symdir
+        return None
+
+    def symbol_is_here(self, symbol):
+        if symbol in self.funcs:
+            return True
+        else:
+            for name, child in self.childs.items():
+                if child.symbol_is_here(symbol):
+                    return True
+        return False
+
+    def full_name(self):
+        l = [self.name,]
+        parent = self.parent
+        while parent:
+            if parent.name != "":
+                l.append(parent.name)
+            parent = parent.parent
+
+        return "/".join(reversed(l))
+
+    def size(self):
+        if self.node_size is None:
+            self.node_size = MemTreeNodeSize(self)
+        return self.node_size
+
+    def __collapse(self):
+        # Collapse one-child empty nodes
+        for name, child in self.childs.items():
+            if len(child.childs) > 2:
+                child.__collapse()
+
+            if len(child.childs) == 1 and not child.funcs and not child.data:
+                # Remove from child
+                (k, v) = child.childs.items()[0]
+                del child.childs[k]
+
+                # Add here
+                self.childs[k] = v
+                v.parent = self
+
+    def __strip(self):
+        # Remove empty nodes
+        for name, child in self.childs.items():
+            if child.childs:
+                child.__strip()
+            if not child.funcs and not child.data and not child.childs:
+                del self.childs[name]
+
+    def __get_root(self):
+        if len(self.childs) == 1:
+            child = self.childs.itervalues().next()
+            # This is a pedantic test, the first node with
+            # multiple childs is the root we're searching
+            if not child.name.endswith(".o"):
+                return child.__get_root()
+
+        return self
+
+    # Obtain a clean tree.
+    # We do it this way because collapse() and strip() must be called
+    # in an ordered fashion.
+    def get_clean(self):
+        self.__collapse()
+        self.__strip()
+        return self.__get_root()
+
+    def find_first_branch(self, which):
+        if self.name == which:
+            return self
+
+        for name, node in self.childs.items():
+            if which == name:
+                return node
+
+        for name, node in self.childs.items():
+            return node.find_first_branch(which)
+
+        print("[WARNING] Can't find first branch '{}'".format(which))
+        return None
+
+    # This are for debug purposes, move along
+    def treelike(self, level=0, attr="current_dynamic"):
+        str = ""
+        str += "{}\n".format(self.name)
+        for name, node in self.childs.items():
+            child_str = node.treelike(level+1, attr)
+            if child_str:
+                str += "{}{}".format("  "*(level+1), child_str)
+        return str
+
+    def treelike2(self, level=0, attr="current_dynamic"):
+        str = ""
+
+        attr_val = getattr(self.size(), attr)()
+
+        if self.name and attr_val != 0:
+            str += "{} - {}={}\n".format(self.name, attr, attr_val)
+
+        for name, node in self.childs.items():
+            child_str = node.treelike2(level+1, attr)
+            if child_str:
+                str += "{}{}".format("  "*(level+1), child_str)
+        return str
+
+    def fill_per_file(self, path):
+
+        filepath = "{}{}/{}".format(MemTreeNode.abs_slash, self.full_name(), path)
+
+        if path not in self.childs:
+            self.childs[path] = MemTreeNode(path, self)
+
+        child = self.childs[path]
+
+        output = []
+        try:
+            p1 = subprocess.Popen(["readelf", "--wide", "-s", filepath], stdout=subprocess.PIPE)
+            output = p1.communicate()[0].split("\n")
+        except:
+            pass
+
+        for line in output:
+            if line == '':
+                continue
+
+            m = re.match(r".*\s([0-9]+)\sFUNC.*\s+([a-zA-Z0-9_\.]+)\b", line)
+            if m:
+                if m.group(2) in child.text:
+                    print "Duplicate text entry! {}".format(m.group(2))
+                child.text[m.group(2)] = int(m.group(1))
+
+                # Search every callsite in db matching this name
+                for name, callsite in child.db.f.iteritems():
+                    if name.startswith(m.group(2)):
+                        child.funcs[name] = callsite
+
+            m = re.match(r".*\s([0-9]+)\sOBJECT.*\s+([a-zA-Z0-9_\.]+)\b", line)
+            if m:
+                if m.group(2) in child.data:
+                    print "[WARNING] Duplicate data entry! {}".format(m.group(2))
+                child.data[m.group(2)] = int(m.group(1))
+
+    # This is deprecated, fill_per_file should be used instead.
+    # I keep it here just to have the code handy.
+    def fill_per_dir(self, path):
+
+        if self.funcs or self.data:
+            print "[WARNING] Oooops, already filled"
+
+        filepath = "." + self.full_name() + "/built-in.o"
+
+        output = []
+        try:
+            p1 = subprocess.Popen(["readelf", "--wide", "-s", filepath], stdout=subprocess.PIPE)
+            output = p1.communicate()[0].split("\n")
+        except:
+            pass
+
+        for line in output:
+            if line == '':
+                continue
+            m = re.match(r".*FUNC.*\b([a-zA-Z0-9_]+)\b", line)
+            if m:
+                if m.group(1) in self.funcs:
+                    print "[WARNING] Duplicate entry! {}".format(m.group(1))
+
+                if m.group(1) in self.db.f:
+                    self.funcs[m.group(1)] = self.db.f[m.group(1)]
+
+            m = re.match(r".*([0-9]+)\sOBJECT.*\b([a-zA-Z0-9_]+)\b", line)
+            if m:
+                self.data[m.group(2)] = int(m.group(1))
+
+    # path is should be an object file, like fs/ext2/inode.o
+    def add_child(self, path):
+        # adding a child invalidates node_size object
+        self.node_size = None
+
+        parts = path.split('/', 1)
+        if len(parts) == 1:
+            self.fill(path)
+            pass
+        else:
+            node, others = parts
+            if node not in self.childs:
+                self.childs[node] = MemTreeNode(node, self)
+            self.childs[node].add_child(others)
+
+    def add_path(self, path):
+        for root, dirs, files in os.walk(path):
+
+            blacklisted = False
+            for bdir in BLACKLIST:
+                if root.startswith("{}/{}".format(path, bdir)):
+                    blacklisted = True
+
+            if blacklisted:
+                continue
+
+            for filepath in [os.path.join(root,f) for f in files]:
+                if filepath.endswith("built-in.o"):
+                    continue
+                if filepath.endswith("vmlinux.o"):
+                    continue
+                if filepath.endswith(".o"):
+                    # We need to check if this object file,
+                    # has a corresponding source file
+                    filesrc = "{}.c".format(os.path.splitext(filepath)[0])
+                    if os.path.exists(filesrc):
+                        self.add_child(filepath)
+
+
+
+##########################################################################
+##
+## Main
+##
+##########################################################################
+
+def main():
+
+    parser = OptionParser()
+    parser.add_option("-k", "--kernel",
+                      dest="buildpath",
+                      default="",
+                      help="path to built kernel tree")
+
+    parser.add_option("-f", "--file",
+                      dest="file",
+                      default="",
+                      help="trace log file to analyze")
+
+    parser.add_option("--db-file",
+                      dest="db_file",
+                      default="",
+                      help="use db_file as DB instead of creating one")
+
+    parser.add_option("--save-db",
+                      dest="save_db_file",
+                      default="",
+                      help="save a db_file to use as DB")
+
+    parser.add_option("-b", "--start-branch",
+                      dest="start_branch",
+                      default="",
+                      help="first directory name to use as ringchart root")
+
+    parser.add_option("-r", "--rings-file",
+                      dest="rings_file",
+                      default="",
+                      help="plot ringchart information")
+
+    parser.add_option("-i", "--rings-show",
+                      dest="rings_show",
+                      action="store_true",
+                      help="show interactive ringchart")
+
+    parser.add_option("-a", "--rings-attr",
+                      dest="rings_attr",
+                      default="current_dynamic",
+                      help="attribute to visualize [static, current, \
+                                    current_dynamic, total_dynamic, waste]")
+
+    parser.add_option("--malloc",
+                      dest="do_malloc",
+                      action="store_true",
+                      help="trace kmalloc/kfree only")
+
+    parser.add_option("--cache",
+                      dest="do_cache",
+                      action="store_true",
+                      help="trace kmem_cache_alloc/kmem_cache_free only")
+
+    parser.add_option("-c", "--account-file",
+                      dest="account_file",
+                      default="",
+                      help="show output matching slab_account output")
+
+    parser.add_option("-l", "--callers-file",
+                      dest="callers_file",
+                      default="",
+                      help="show callers file suitable for ringchart generation")
+
+    parser.add_option("-o", "--order-by",
+                      dest="order_by",
+                      default="current_dynamic",
+                      help="attribute to order account \
+                            [current_dynamic, total_dynamic, alloc_count, free_count, waste]")
+
+
+    (opts, args) = parser.parse_args()
+
+    # Kernel build path is a mandatory parameter.
+    # We need to look at compiled objects and also for System.map.
+    if len(opts.db_file) == 0 and len(opts.buildpath) == 0:
+        print "Please set a kernel build path or a DB file!"
+        parser.print_help()
+        return
+
+    # Check valid options
+    if len(opts.order_by) > 0:
+        if opts.order_by not in dir(Callsite):
+            print "Hey! {} is not a valid --order-by option".format(opts.order_by)
+            parser.print_help()
+            return
+
+    if len(opts.rings_attr) > 0:
+        if opts.rings_attr not in dir(MemTreeNodeSize):
+            print "Hey! {} is not a valid --rings-attr option".format(opts.rings_attr)
+            parser.print_help()
+            return
+
+    # Clean user provided kernel path from dirty slashes
+    buildpath = opts.buildpath.rstrip("/")
+
+    # If we don't have a trace log file,
+    # and we don't have a DB file
+    # then we'll fallback to static report mode.
+    if len(opts.db_file) == 0 and len(opts.file) == 0:
+        print "No trace log file or DB file specified: will report on static size only"
+        opts.rings_attr = "static"
+        opts.do_malloc = False
+        opts.do_cache = False
+        opts.account_file = ""
+        opts.just_static = True
+        # Set some default
+        if len(opts.rings_file) == 0:
+            opts.rings_file = "rings_static.png"
+    else:
+        opts.just_static = False
+
+    if opts.rings_show is None:
+        opts.rings_show = False
+
+    rootDB = EventDB()
+    # Get root database, if need to
+    if not opts.just_static:
+        if len(opts.db_file) != 0:
+            print "Using db file '{}'".format(opts.db_file)
+            f = open(opts.db_file)
+            buildpath = pickle.load(f)
+            rootDB = pickle.load(f)
+            f.close()
+        else:
+            rootDB.slurp(opts.file, buildpath, opts.do_malloc, opts.do_cache)
+
+            if len (opts.save_db_file) != 0:
+                print "Saving db file at '{}'".format(opts.save_db_file)
+                f = open(opts.save_db_file, 'w')
+                pickle.dump(buildpath,f)
+                pickle.dump(rootDB, f)
+                f.close()
+
+    if len(opts.callers_file) == 0 and \
+       len(opts.account_file) == 0 and \
+       len(opts.rings_file) == 0:
+            sys.exit(0)
+
+    root_path = "{}/{}".format(buildpath, opts.start_branch).rstrip("/")
+
+    print "Creating tree from compiled symbols at '{}'".format(root_path)
+
+    # We need to specify if user provided buildpath is absolute
+    MemTreeNode.abs_slash = buildpath.startswith("/") and "/" or ""
+
+    tree = MemTreeNode(db = rootDB)
+    tree.add_path(root_path)
+
+    print "Cleaning tree"
+    tree = tree.get_clean()
+
+    # DEBUG--ONLY. Should we add an option for this?
+    #print(tree.treelike2(attr = opts.rings_attr))
+    if len(opts.callers_file) != 0:
+        print "Creating callers file at '{}'".format(opts.callers_file)
+        rootDB.print_callers(opts.callers_file,
+                             tree)
+
+
+    if len(opts.account_file) != 0:
+        print "Creating account file at '{}'".format(opts.account_file)
+        rootDB.print_account(opts.account_file,
+                             opts.order_by,
+                             tree)
+
+    if len(opts.rings_file) != 0:
+        if tree is None:
+            print "Sorry, there is nothing to plot for branch '{}'".format(opts.start_branch)
+        else:
+            print "Creating ringchart for attribute '{}'".format(opts.rings_attr)
+            visualize_mem_tree(tree, opts.rings_attr, opts.rings_file, opts.rings_show)
+
+
+##########################################################################
+##
+## Visualization stuff
+##
+##########################################################################
+
+
+CENTER_X = 1.0
+CENTER_Y = 1.0
+WIDTH = 0.2
+tango_colors = ['#ef2929',
+        '#ad7fa8',
+        '#729fcf',
+        '#8ae234',
+        '#e9b96e',
+        '#fcaf3e',]
+
+
+def human_bytes(bytes, precision=1):
+    """Return a humanized string representation of a number of bytes.
+
+    Assumes `from __future__ import division`.
+
+    >>> humanize_bytes(1)
+    '1 byte'
+    >>> humanize_bytes(1024)
+    '1.0 kB'
+    >>> humanize_bytes(1024*123)
+    '123.0 kB'
+    >>> humanize_bytes(1024*12342)
+    '12.1 MB'
+    >>> humanize_bytes(1024*12342,2)
+    '12.05 MB'
+    >>> humanize_bytes(1024*1234,2)
+    '1.21 MB'
+    >>> humanize_bytes(1024*1234*1111,2)
+    '1.31 GB'
+    >>> humanize_bytes(1024*1234*1111,1)
+    '1.3 GB'
+    """
+    abbrevs = (
+        (1<<50L, 'PB'),
+        (1<<40L, 'TB'),
+        (1<<30L, 'GB'),
+        (1<<20L, 'MB'),
+        (1<<10L, 'kB'),
+        (1, 'bytes')
+    )
+    if bytes == 1:
+        return '1 byte'
+    for factor, suffix in abbrevs:
+        if bytes >= factor:
+            break
+    return '{0:.{1}f} {2}'.format(float(bytes)/factor, precision, suffix)
+
+
+class Section:
+    def __init__(self, node, size, total_size, total_angle, start_angle):
+        self.node = node
+        self.size = size
+        self.start_angle = start_angle
+        self.angle = size * total_angle / total_size
+
+
+def ring_color(start_angle, level):
+    from matplotlib.colors import colorConverter
+
+    # f:      [1 - 0.26]
+    # rel:    [0 - 198]
+    # icolor: [0 - 5]
+
+    if level == 1:
+        return colorConverter.to_rgb('#808080')
+
+    f = 1 - (((level-1) * 0.3) / 8)
+    rel = start_angle / 180. * 99
+    icolor = int(rel / (100./3))
+    next_icolor = (icolor + 1) % 6
+
+    # Interpolate (?)
+    color = colorConverter.to_rgb(tango_colors[icolor])
+    next_color = colorConverter.to_rgb(tango_colors[next_icolor])
+    p = (rel - icolor * 100./3) / (100./3)
+
+    color = [f * (c - p * (c - n)) for c, n in zip(color, next_color)]
+
+    return color
+
+
+def create_child_rings(tree, level=2, level_angle=360, start_angle=0, rings=[],
+         radius=WIDTH, center=(CENTER_X, CENTER_Y), size_attr="static"):
+
+    from matplotlib.patches import Wedge
+
+    child_size = 0
+    max_size = getattr(tree.size(), size_attr)()
+
+    if len(tree.childs) == 0:
+        return rings
+
+    if max_size == 0:
+        for name, node in tree.childs.items():
+            max_size += getattr(node.size(), size_attr)()
+    if max_size == 0:
+        return rings
+
+    s_angle = start_angle
+    sections = {}
+
+    # Create child wedges
+    for name, node in tree.childs.items():
+
+        size = getattr(node.size(), size_attr)()
+        s = Section(node, size, max_size, level_angle, s_angle)
+        sections[name] = s
+
+        create_child_rings(node, level+1, s.angle, s_angle, rings, radius, center, size_attr)
+        s_angle += s.angle
+        child_size += size
+
+    # Just a check
+    if child_size > max_size:
+        print "[{}] Ooops, child size is greater than max size".format(name)
+
+    for name, section in sections.items():
+
+        # Create tuple: (wedge, name)
+        name = "{} {}".format(name, human_bytes(section.size))
+        tup = ( Wedge(center,
+            level * radius,
+            section.start_angle,
+            section.start_angle + section.angle,
+            width=radius,
+            facecolor=ring_color(section.start_angle, level)),
+            name)
+
+        rings.append(tup)
+
+    return rings
+
+
+def visualize_mem_tree(tree, size_attr, filename, show):
+    import pylab
+
+    RING_MIN_WIDTH = 1
+    TEXT_MIN_WIDTH = 5
+
+    rings = create_child_rings(tree, size_attr=size_attr)
+
+    fig = pylab.figure()
+    ax = fig.add_subplot(111)
+    annotations = []
+    labels = []
+
+    text = "{} {}".format(tree.name,
+                          human_bytes(getattr(tree.size(), size_attr)()))
+    ann = ax.annotate(text,
+                      size=12,
+                      bbox=dict(boxstyle="round", fc="w", ec="0.5", alpha=0.8),
+                      xy=(CENTER_X, CENTER_Y), xycoords='data',
+                      xytext=(CENTER_X, CENTER_Y), textcoords='data')
+    annotations.append(ann)
+
+    for p in rings:
+        wedge = p[0]
+
+        # Skip if too small
+        if (wedge.theta2 - wedge.theta1) < RING_MIN_WIDTH:
+            continue
+
+        # Add wedge
+        ax.add_patch(wedge)
+
+        # Skip text if too small
+        if (wedge.theta2 - wedge.theta1) < TEXT_MIN_WIDTH:
+            continue
+
+        theta = math.radians((wedge.theta1 + wedge.theta2) / 2.)
+        x0 = wedge.center[0] + (wedge.r - wedge.width / 2.) * math.cos(theta)
+        y0 = wedge.center[1] + (wedge.r - wedge.width / 2.) * math.sin(theta)
+        x = wedge.center[0] + (0.1 + wedge.r * 1.5 - wedge.width / 2.) * math.cos(theta)
+        y = wedge.center[1] + (0.1 + wedge.r * 1.5 - wedge.width / 2.) * math.sin(theta)
+
+        ax.plot(x0, y0, ".", color="black")
+
+        text = p[1]
+        ann = ax.annotate(text,
+                    size=12,
+                    bbox=dict(boxstyle="round", fc="w", ec="0.5", alpha=0.8),
+                    xy=(x0, y0), xycoords='data',
+                    xytext=(x, y), textcoords='data',
+                    arrowprops=dict(arrowstyle="-", connectionstyle="angle3, angleA=0, angleB=90"),)
+        annotations.append(ann)
+
+    (alloc, req) = tree.db.get_bytes()
+
+    pylab.axis('off')
+
+    if len(filename) != 0:
+        print("Plotting to file '{}'".format(filename))
+        pylab.savefig("{}".format(filename),
+                      bbox_extra_artists=annotations,
+                      bbox_inches='tight', dpi=300)
+    if show:
+        print("Plotting interactive")
+        pylab.show()
+
+
+##########################################################################
+
+if __name__ == "__main__":
+    main()
-- 
1.7.8.6


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC/PATCH] scripts/tracing: Add trace_analyze.py tool
  2013-01-22  9:46 [RFC/PATCH] scripts/tracing: Add trace_analyze.py tool Ezequiel Garcia
@ 2013-01-22 13:41 ` Pekka Enberg
  2013-01-22 16:16   ` Ezequiel Garcia
  2013-01-23  4:27 ` Minchan Kim
  1 sibling, 1 reply; 8+ messages in thread
From: Pekka Enberg @ 2013-01-22 13:41 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: linux-kernel, linux-mm, Tim Bird, Ezequiel Garcia,
	Steven Rostedt, Frederic Weisbecker, Ingo Molnar,
	Arnaldo Carvalho de Melo

(Adding acme to CC.)

On Tue, Jan 22, 2013 at 11:46 AM, Ezequiel Garcia
<ezequiel.garcia@free-electrons.com> wrote:
> From: Ezequiel Garcia <elezegarcia@gmail.com>
>
> The purpose of trace_analyze.py tool is to perform static
> and dynamic memory analysis using a kmem ftrace
> log file and a built kernel tree.
>
> This script and related work has been done on the CEWG/2012 project:
> "Kernel dynamic memory allocation tracking and reduction"
> (More info here [1])
>
> It produces mainly two kinds of outputs:
>  * an account-like output, similar to the one given by Perf, example below.
>  * a ring-char output, examples here [2].
>
> $ ./scripts/tracing/trace_analyze.py -k linux -f kmem.log --account-file account.txt
> $ ./scripts/tracing/trace_analyze.py -k linux -f kmem.log -c account.txt
>
> This will produce an account file like this:
>
>     current bytes allocated:     669696
>     current bytes requested:     618823
>     current wasted bytes:         50873
>     number of allocs:              7649
>     number of frees:               2563
>     number of callers:              115
>
>      total    waste      net alloc/free  caller
>     ---------------------------------------------
>     299200        0   298928  1100/1     alloc_inode+0x4fL
>     189824        0   140544  1483/385   __d_alloc+0x22L
>      51904        0    47552   811/68    sysfs_new_dirent+0x4eL
>     [...]
>
> [1] http://elinux.org/Kernel_dynamic_memory_analysis
> [2] http://elinux.org/Kernel_dynamic_memory_analysis#Current_dynamic_footprint
>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>

Looks really useful! Dunno if this makes most sense as a separate
script or as an extension perf.

                        Pekka

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC/PATCH] scripts/tracing: Add trace_analyze.py tool
  2013-01-22 13:41 ` Pekka Enberg
@ 2013-01-22 16:16   ` Ezequiel Garcia
  0 siblings, 0 replies; 8+ messages in thread
From: Ezequiel Garcia @ 2013-01-22 16:16 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Ezequiel Garcia, linux-kernel, linux-mm, Tim Bird,
	Steven Rostedt, Frederic Weisbecker, Ingo Molnar,
	Arnaldo Carvalho de Melo

Hi Pekka,

On Tue, Jan 22, 2013 at 10:41 AM, Pekka Enberg <penberg@kernel.org> wrote:
> (Adding acme to CC.)
>
> On Tue, Jan 22, 2013 at 11:46 AM, Ezequiel Garcia
> <ezequiel.garcia@free-electrons.com> wrote:
>> From: Ezequiel Garcia <elezegarcia@gmail.com>
>>
>> The purpose of trace_analyze.py tool is to perform static
>> and dynamic memory analysis using a kmem ftrace
>> log file and a built kernel tree.
>>
>> This script and related work has been done on the CEWG/2012 project:
>> "Kernel dynamic memory allocation tracking and reduction"
>> (More info here [1])
>>
>> It produces mainly two kinds of outputs:
>>  * an account-like output, similar to the one given by Perf, example below.
>>  * a ring-char output, examples here [2].
>>
>> $ ./scripts/tracing/trace_analyze.py -k linux -f kmem.log --account-file account.txt
>> $ ./scripts/tracing/trace_analyze.py -k linux -f kmem.log -c account.txt
>>
>> This will produce an account file like this:
>>
>>     current bytes allocated:     669696
>>     current bytes requested:     618823
>>     current wasted bytes:         50873
>>     number of allocs:              7649
>>     number of frees:               2563
>>     number of callers:              115
>>
>>      total    waste      net alloc/free  caller
>>     ---------------------------------------------
>>     299200        0   298928  1100/1     alloc_inode+0x4fL
>>     189824        0   140544  1483/385   __d_alloc+0x22L
>>      51904        0    47552   811/68    sysfs_new_dirent+0x4eL
>>     [...]
>>
>> [1] http://elinux.org/Kernel_dynamic_memory_analysis
>> [2] http://elinux.org/Kernel_dynamic_memory_analysis#Current_dynamic_footprint
>>
>> Cc: Pekka Enberg <penberg@kernel.org>
>> Cc: Steven Rostedt <rostedt@goodmis.org>
>> Cc: Frederic Weisbecker <fweisbec@gmail.com>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
>
> Looks really useful! Dunno if this makes most sense as a separate
> script or as an extension perf.
>

I'm glad you think so.
Regarding the perf extension, I would have to think about that.
I guess you mean convert this script to use the python binding?

Will it still be able to work off-box? (a typical embedded scenario)

-- 
    Ezequiel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC/PATCH] scripts/tracing: Add trace_analyze.py tool
  2013-01-22  9:46 [RFC/PATCH] scripts/tracing: Add trace_analyze.py tool Ezequiel Garcia
  2013-01-22 13:41 ` Pekka Enberg
@ 2013-01-23  4:27 ` Minchan Kim
  2013-01-23 21:37   ` Ezequiel Garcia
  1 sibling, 1 reply; 8+ messages in thread
From: Minchan Kim @ 2013-01-23  4:27 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: linux-kernel, linux-mm, Tim Bird, Ezequiel Garcia, Pekka Enberg,
	Steven Rostedt, Frederic Weisbecker, Ingo Molnar

Hi Ezequiel,

On Tue, Jan 22, 2013 at 06:46:58AM -0300, Ezequiel Garcia wrote:
> From: Ezequiel Garcia <elezegarcia@gmail.com>
> 
> The purpose of trace_analyze.py tool is to perform static
> and dynamic memory analysis using a kmem ftrace
> log file and a built kernel tree.
> 
> This script and related work has been done on the CEWG/2012 project:
> "Kernel dynamic memory allocation tracking and reduction"
> (More info here [1])
> 
> It produces mainly two kinds of outputs:
>  * an account-like output, similar to the one given by Perf, example below.
>  * a ring-char output, examples here [2].
> 
> $ ./scripts/tracing/trace_analyze.py -k linux -f kmem.log --account-file account.txt
> $ ./scripts/tracing/trace_analyze.py -k linux -f kmem.log -c account.txt
> 
> This will produce an account file like this:
> 
>     current bytes allocated:     669696
>     current bytes requested:     618823
>     current wasted bytes:         50873
>     number of allocs:              7649
>     number of frees:               2563
>     number of callers:              115
> 
>      total    waste      net alloc/free  caller
>     ---------------------------------------------
>     299200        0   298928  1100/1     alloc_inode+0x4fL
>     189824        0   140544  1483/385   __d_alloc+0x22L
>      51904        0    47552   811/68    sysfs_new_dirent+0x4eL
>     [...]
> 
> [1] http://elinux.org/Kernel_dynamic_memory_analysis
> [2] http://elinux.org/Kernel_dynamic_memory_analysis#Current_dynamic_footprint

First of all, Thanks for nice work! It could be very useful for
embedded side.

Questions.

1. Can we detect different call path but same function?
   I mean 

        A       C
         \     /
          B   D
           \ /
            E
            |
         kmalloc

In this case, E could be called by A or C. I would like to know the call path.
It could point out exact culprit of memory hogger.

2. Does it support alloc_pages family?
   kmem event trace already supports it. If it supports, maybe we can replace
   CONFIG_PAGE_OWNER hack.

Thanks!

> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
> ---
>  RFC/scripts/tracing/trace_analyze.py | 1249 ++++++++++++++++++++++++++++++++++++++
>  1 files changed, 1249 insertions(+), 0 deletions(-)
>  create mode 100755 scripts/tracing/trace_analyze.py
> 
> diff --git a/scripts/tracing/trace_analyze.py b/scripts/tracing/trace_analyze.py
> new file mode 100755
> index 0000000..ad49c9a
> --- /dev/null
> +++ b/scripts/tracing/trace_analyze.py
> @@ -0,0 +1,1249 @@
> +#!/usr/bin/env python
> +
> +"""
> +Copyright (C) 2012 Ezequiel Garcia <elezegarcia@gmail.com>
> +Licensed under the terms of the GNU GPL License version 2
> +
> +trace_analize.py
> +----------------
> +
> +0. Introduction
> +---------------
> +
> +This script allows to perform some analysis on kernel dynamic memory
> +allocations by post-processing ftrace kmem event.
> +In addition, it can also report on static footprint on a built kernel tree.
> +
> +trace_analyze.py typically needs access to:
> +1) a built kernel tree and, 2) an ftrace kmem log.
> +
> +Since reading the kmem event log is a costly operation,
> +you can also generate a 'db' file to speed-up subsequent runs of the script.
> +
> +This script and work related has been done thanks to the CEWG project
> +"Kernel dynamic memory allocation tracking and reduction"
> +You can find lot more information about this script and on kernel dynamic
> +memory tracking here:
> +
> +    http://elinux.org/Kernel_dynamic_memory_analysis
> +
> +Disclaimer:
> +trace_analyze.py is not stable, so expect some roughness.
> +Testing and feedback is more than welcome.
> +In fact, even some flames are welcome.
> +
> +1. Using trace_analyze.py for static analysis
> +---------------------------------------------
> +
> +Usage is fairly simple
> +
> +    $ ./trace_analyze.py -k /usr/src/linux -r foo.png
> +    $ ./trace_analyze.py --kernel /usr/src/linux --rings-file foo.png
> +
> +This should produce a ringchart png file in the current directory.
> +Of course, you can use absolute and relative paths in the path parameter
> +
> +    $ ./trace_analyze.py -k ../../torvalds -r foo.png
> +
> +If you're interested in a specific subsystem you can use a parameter to specify
> +the directory tree branch to take as root
> +
> +    $ ./trace_analyze -k linux --start-branch fs/ext2 -r ext2.png
> +    $ ./trace_analyze -k linux -b drivers -r drivers.png
> +    $ ./trace_analyze -k linux -b mm -r mm.png
> +
> +Each of this commands will produce a ringchart png file in the
> +curent directory, named as specified.
> +
> +What's under the hood?
> +The script will perform a directory walk, internally creating a tree matching
> +the provided kernel tree. On each object file found (like fs/inode.o) it will
> +perform a 'readelf --syms' to get a list of symbols contained in it. Nothing fancy.
> +
> +2. Using trace_analyze.py for dynamic analysis
> +----------------------------------------------
> +
> +2.1. Producing a kmem trace log file
> +
> +In case you don't know or don't remember how to use ftrace to
> +produce kmem events, here's a little remainder.
> +For more information, please refer to the canonical
> +trace documentation at the linux tree:
> +
> +- Documentation/trace/ftrace.txt
> +- Documentation/trace/tracepoint-analysis.txt
> +- and everything else inside Documentation/trace/
> +
> +The purpose of trace_analyze script is to perform dynamic memory analysis.
> +For this to work you need feed it with a kmem trace log file
> +(of course, you also need to give hime a built kernel tree).
> +
> +Such log must be produced on the running target kernel,
> +but you can post-process it off-box.
> +For instance, you boot your kernel with kmem parameters
> +to enable ftrace kmem events:
> +(it's recommended to enable all events, despite not running a NUMA machine).
> +
> +    trace_event="kmem:kmalloc,kmem:kmalloc_node,kmem:kfree,kmem:kmem_cache_alloc,kmem:kmem_cache_alloc_node,kmem:kmem_cache_free"
> +
> +This parameter will have linux to start tracing as soon as possible.
> +Of course some early traces will be lost, see below.
> +
> +(on your target kernel)
> +
> +    # To stop tracing
> +    $ echo "0" > /sys/kernel/debug/tracing/tracing_on
> +    # Dump
> +    $ cat /sys/kernel/debug/tracing/trace > kmem.log
> +
> +Now you need to get this file so you can post-process
> +it using trace_analyze.py.
> +In my case, I use qemu with a file backing serial device,
> +so I simply do:
> +
> +(on your target kernel)
> +
> +    $ cat /sys/kernel/debug/tracing/trace > /dev/ttyS0
> +
> +And I get the log on qemu's backing file.
> +
> +Now you have everything you need to start the analysis.
> +
> +2.2. Slab accounting file output
> +
> +To obtain a memory accounting file you need to use
> +--acount-file (-c) parameter, like this:
> +
> +    $ ./trace_analyze.py -k linux -f kmem.log --account-file account.txt
> +    $ ./trace_analyze.py -k linux -f kmem.log -c account.txt
> +
> +This will produce an account file like this:
> +
> +    current bytes allocated:     669696
> +    current bytes requested:     618823
> +    current wasted bytes:         50873
> +    number of allocs:              7649
> +    number of frees:               2563
> +    number of callers:              115
> +
> +     total    waste      net alloc/free  caller
> +    ---------------------------------------------
> +    299200        0   298928  1100/1     alloc_inode+0x4fL
> +    189824        0   140544  1483/385   __d_alloc+0x22L
> +     51904        0    47552   811/68    sysfs_new_dirent+0x4eL
> +     16384     8088    16384     1/0     __seq_open_private+0x24L
> +     15936     1328    15936    83/0     device_create_vargs+0x42L
> +     14720    10898    14016   460/22    sysfs_new_dirent+0x29L
> +
> +2.3. Controlling account output
> +
> +You can ask the script to read only kmalloc events
> +(notice the option name is *--malloc*):
> +
> +    $ ./trace_analyze.py -k linux -f kmem.log -c account.txt --malloc
> +
> +Or you can ask the script to read only kmem_cache events:
> +
> +    $ ./trace_analyze.py -k linux -f kmem.log -c account.txt --cache
> +
> +If you want to order the account file you can use --order-by (-o):
> +
> +    $ ./trace_analyze.py -k linux -f kmem.log -c account.txt --order-by=waste
> +    $ ./trace_analyze.py -k linux -f kmem.log -c account.txt --malloc -o waste
> +The possible options for order-by parameter are:
> +
> +* total_dynamic: Added allocations size
> +* current_dynamic: Currently allocated size
> +* alloc_count: Number of allocations
> +* free_count: Number of frees
> +* waste: Currently wasted size
> +
> +You can pick a directory to get an account file showing
> +only the allocations from that directory.
> +This is done with the --start-branch (-b) option,
> +just like we've done for the static analysis:
> +
> +    $ ./trace_analyze.py -k linux -f kmem.log -c account.txt -b drivers/base/
> +
> +All of these options can be combined.
> +For instance, if you want to get kmalloc events only,
> +coming from fs/ directory and ordered by current dynamic footprint:
> +
> +    $ ./trace_analyze.py -k linux -f kmem.log -b fs -c account.txt -o current_dynamic --malloc
> +
> +2.4. Producing a pretty ringchart for dynamic allocations
> +
> +As already explained in the static analysis section, it's possible to produce
> +a ringchart to get **the big picture** of dynamic allocations.
> +You will need to have *matplotlib* installed, which should be as easy as:
> +
> +    $ {your_pkg_manager} install matplotlib
> +
> +The script usage is very simple,
> +just pass the parameter --rings-file (-r) along with a filename
> +
> +    $ ./trace_analyze.py -k linux -f kmem.log --rings-file=dynamic.png
> +
> +This command will produce a png file named as specified.
> +The plot will show current dynamic allocations by default.
> +You can control the used attrbute used for the ringchart
> +plot using --rings-attr (-a) parameter.
> +
> +The available options are:
> +
> +- current: static + current dynamic size
> +- static: static size
> +- waste: wasted size
> +- current_dynamic: current dynamic size
> +- total_dyamic: added dynamic size
> +
> +For instance, you may want a ringchart for wasted bytes
> +
> +    $ ./trace_analyze.py -k linux -f kmem.log -r -a waste
> +
> +You can use --start-branch (-b) parameter to plot allocations made from just one directory.
> +For instance, if you want to get wasted bytes for ext4 filesystem:
> +
> +    $ ./trace_analyze.py -k ../torvalds -f kmem.log \
> +      -r ext4_waste.png -a waste -b fs/ext4
> +
> +Or, if you want to see static footprint of arch-dependent mm code:
> +
> +    $ ./trace_analyze.py -k ../torvalds -f kmem.log \
> +      -r x86_static.png -a static -b arch/x86/mm
> +
> +Also, you can filter kmalloc or kmem_cache traces
> +using either --malloc, or --cache:
> +
> +    $ ./trace_analyze.py -k linux/ -f boot_kmem.log -r kmallocs.png --malloc
> +
> +2.5. Pitfall: wrongly reported allocation (and how to fix it)
> +
> +There are a number of functions (kstrdup, kmemdup, krealloc, etc) that do
> +some kind of allocation on behalf of its caller.
> +
> +Of course, we don't want to get trace reports from these functions,
> +but rather from its caller. To acomplish this, we must use a variant
> +of kmalloc, called kmalloc_track_caller, which does exactly that.
> +
> +Let's see an example. As of today kvasprintf() implementation looks
> +like this
> +
> +    (see lib/kasprintf.c:14)
> +    char *kvasprintf(gfp_t gfp, const char *fmt, va_list ap)
> +    {
> +	   /* code removed */
> +	   p = kmalloc(len+1, gfp);
> +
> +And trace_analyze produces the account file
> +
> +    total    waste      net alloc/free  caller
> +    ---------------------------------------------
> +    2161     1184     2161   148/0     kvasprintf
> +
> +The source of this 148 allocations may be a single caller,
> +or it may be multiple callers. We just can't know.
> +However, if we replace kmalloc with kmalloc_track_caller,
> +we're going to find that out.
> +
> +    char *kvasprintf(gfp_t gfp, const char *fmt, va_list ap)
> +    {
> +           /* code removed */
> +           p = kmalloc_track_caller(len+1, gfp);
> +
> +After running the re-built kernel, and comparing both current
> +and previous account files, we find this is the real caller:
> +
> +    total    waste      net alloc/free  caller
> +    ---------------------------------------------
> +    2161     1184     2161   148/0     kobject_set_name_vargs
> +
> +So, we've accurately tracked this allocation down to the kobject code.
> +
> +3. Using a DB file to speed-up multiple runs
> +--------------------------------------------
> +
> +You may find yourself analyzing a large kmem log file.
> +Probably, you want to run the script
> +several times to get different kinds of results.
> +
> +The script is not very clever and will re-read the
> +long kmem file on each run.
> +To alleviate this problem you can have trace_analyze.py
> +create a so-called DB file,and use this file instead
> +of the kmem log file on subsequent runs.
> +
> +This is done using the --save-db and --db-file parameters.
> +Like this:
> +
> +    $ ./trace_analyze.py -k ../torvalds/ -f kmem.log --save-db db
> +
> +Notice you should create the DB file without any filters,
> +like --malloc or --start-branch, in order to save the full kmem event log.
> +
> +Once you have the **db** file created, you would use it on each run
> +
> +    $ ./trace_analyze.py -k ../torvalds/ --db-file db \
> +      -r rings.png -c account.txt
> +
> +Hopefully, this would prevent you from cursing trace_analyze for being so slow.
> +
> +"""
> +
> +import sys
> +import string
> +import re
> +import subprocess
> +import math
> +import pickle
> +import os
> +from optparse import OptionParser
> +
> +# Skip this directories when walking kernel build
> +BLACKLIST = ("scripts", "tools")
> +
> +class Ptr:
> +    def __init__(self, fun, ptr, alloc, req):
> +        self.fun = fun
> +        self.ptr = ptr
> +        self.alloc = alloc
> +        self.req = req
> +
> +
> +class Callsite:
> +    def __init__(self):
> +        self.__alloc = 0
> +        self.__req = 0
> +        self.__alloc_count = 0
> +        self.__free_count = 0
> +        self.ptrs = []
> +
> +    def total_dynamic(self):
> +        return self.__alloc
> +
> +    def alloc_count(self):
> +        return self.__alloc_count
> +
> +    def free_count(self):
> +        return self.__free_count
> +
> +    def current_dynamic(self):
> +        alloc = 0
> +        for ptr in self.ptrs:
> +            alloc += ptr.alloc
> +        return alloc
> +
> +    def current_req(self):
> +        req = 0
> +        for ptr in self.ptrs:
> +            req += ptr.req
> +        return req
> +
> +    def waste(self):
> +        return self.current_dynamic() - self.current_req()
> +
> +    def do_alloc(self, alloc, req, ptr):
> +        self.__alloc += alloc
> +        self.__req += req
> +        self.__alloc_count += 1
> +        self.ptrs.append(ptr)
> +
> +    def do_free(self, ptr):
> +        self.__free_count += 1
> +        self.ptrs.remove(ptr)
> +
> +
> +# Based on addr2sym.py
> +class SymbolMap:
> +    def __init__(self, filemap):
> +        self.fmap = {}
> +        self.flist = []
> +        self.cache = {}
> +
> +        try:
> +            f = open(filemap)
> +        except:
> +            print "[ERROR] Cannot read symbol map file {}".format(filemap)
> +            sys.exit(1)
> +
> +        for line in f.readlines():
> +            (addr_str, symtype, name) = string.split(line, None, 3)
> +            self.fmap[addr_str] = name
> +            addr = eval("0x" + addr_str + "L")
> +            self.flist.append((addr, name))
> +
> +        f.close()
> +
> +    def lookup(self, addr_str):
> +
> +        # return a tuple (string, offset) for a given address
> +        if addr_str in self.fmap:
> +            return (self.fmap[addr_str],0)
> +
> +        # convert address from string to number
> +        addr = eval("0x" + addr_str + "L")
> +        if addr in self.cache:
> +            return self.cache[addr]
> +
> +        # if address is outside range of addresses in the
> +        # map file, just return the address without converting it
> +        if addr < self.flist[0][0] or addr > self.flist[-1][0]:
> +            return (addr_str,0)
> +
> +        # no exact match found, now do binary search for closest function
> +        # do a binary search in funclist for the function
> +        # use a collapsing range to find the closest addr
> +        lower = 0
> +        upper = len(self.flist)-1
> +        while (lower != upper-1):
> +            guess_index = lower + (upper-lower)/2
> +            guess_addr = self.flist[guess_index][0]
> +            if addr < guess_addr:
> +                upper = guess_index
> +            if addr >= guess_addr:
> +                lower = guess_index
> +
> +        offset = hex(addr-self.flist[lower][0])
> +        name = self.flist[lower][1]
> +        if name.startswith("."):
> +            name = name[1:]
> +        self.cache[addr] = (name, offset)
> +        return (name, offset)
> +
> +
> +class EventDB:
> +    def __init__(self):
> +        self.f = {}
> +        self.p = {}
> +        self.num_allocs = 0
> +        self.total_dynamic = 0
> +        self.total_req = 0
> +        self.num_frees = 0
> +        self.num_lost_frees = 0
> +
> +    def slurp(self, path, buildpath, do_malloc, do_cache):
> +        print "Reading symbol map at {}".format(buildpath)
> +        sym = SymbolMap(buildpath + "/System.map")
> +
> +        try:
> +            logfile = open(path)
> +        except:
> +            print "[ERROR] Cannot read log file {}".format(path)
> +            sys.exit(1)
> +
> +        kmalloc_re = r".*kmalloc.*call_site=([a-f0-9]+).*ptr=([a-f0-9]+).*bytes_req=([0-9]+)\s*bytes_alloc=([0-9]+)"
> +        kfree_re = r".*kfree.*call_site=[a-f0-9+]+.*ptr=([a-f0-9]+)"
> +        cache_alloc_re = r".*cache_alloc.*call_site=([a-f0-9]+).*ptr=([a-f0-9]+).*bytes_req=([0-9]+)\s*bytes_alloc=([0-9]+)"
> +        cache_free_re = r".*cache_free.*call_site=[a-f0-9+]+.*ptr=([a-f0-9]+)"
> +        both_alloc_re = r".*k.*alloc.*call_site=([a-f0-9]+).*ptr=([a-f0-9]+).*bytes_req=([0-9]+)\s*bytes_alloc=([0-9]+)"
> +        both_free_re = r".*k.*free.*call_site=[a-f0-9+]+.*ptr=([a-f0-9]+)"
> +
> +        if do_malloc is True and do_cache is None:
> +            print "Slurping event log, kmalloc events only"
> +            alloc_re = kmalloc_re
> +            free_re = kfree_re
> +        elif do_malloc is None and do_cache is True:
> +            print "Slurping event log, kmem_cache events only"
> +            alloc_re = cache_alloc_re
> +            free_re = cache_free_re
> +        else:
> +            print "Slurping event log"
> +            alloc_re = both_alloc_re
> +            free_re = both_free_re
> +
> +        for line in logfile:
> +            m = re.match(alloc_re, line)
> +            if m:
> +                (fun, offset) = sym.lookup(m.group(1))
> +                self.add_malloc("{}+{}".format(fun, offset),
> +                                  m.group(2),
> +                                  int(m.group(3)),
> +                                  int(m.group(4)), line)
> +
> +            m = re.match(free_re, line)
> +            if m:
> +                self.add_free(m.group(1))
> +
> +    def get_bytes(self):
> +        alloc = 0
> +        req = 0
> +        for fun, callsite in self.f.items():
> +            alloc += callsite.current_dynamic()
> +            req += callsite.current_req()
> +        return (alloc, req)
> +
> +    def add_malloc(self, fun, ptr, req, alloc, line):
> +        self.num_allocs += 1
> +        self.total_dynamic += alloc
> +        self.total_req += req
> +
> +        ptr_obj = Ptr(fun, ptr, alloc, req)
> +
> +        if ptr in self.p:
> +            print("[WARNING] Duplicate pointer! {}".format(line))
> +
> +        self.p[ptr] = ptr_obj
> +
> +        if not fun in self.f:
> +            self.f[fun] = Callsite()
> +
> +        self.f[fun].do_alloc(alloc, req, ptr_obj)
> +
> +    def add_free(self, ptr):
> +        self.num_frees += 1
> +
> +        if not ptr in self.p:
> +            self.num_lost_frees += 1
> +            return
> +
> +        ptr_obj = self.p[ptr]
> +
> +        self.f[ptr_obj.fun].do_free(ptr_obj)
> +
> +        # Remove it from pointers dictionary
> +        del self.p[ptr]
> +
> +    def print_callers(self, filepath, filter_tree=None):
> +
> +        if filter_tree is None:
> +            filter_symbol = lambda f: True
> +            get_symbol_dir = lambda f: ""
> +        else:
> +            filter_symbol = filter_tree.symbol_is_here
> +            get_symbol_dir = filter_tree.get_symbol_dir
> +
> +        syms = [(f,c) for f,c in self.f.items() if filter_symbol(f)]
> +
> +        f = open(filepath, 'w')
> +
> +        for name, c in syms:
> +
> +            symdir = get_symbol_dir(name)
> +            f.write("{:<60} {:<8} {:<8} {:<8}\n".format(name,
> +                                           c.current_dynamic(),
> +                                           c.waste(),
> +                                           symdir))
> +
> +        f.close()
> +
> +    def print_account(self, filepath, order_by, filter_tree=None):
> +
> +        current_dynamic = 0
> +        current_req = 0
> +        alloc_count = 0
> +        free_count = 0
> +
> +        if filter_tree is None:
> +            filter_symbol = lambda f: True
> +        else:
> +            filter_symbol = filter_tree.symbol_is_here
> +
> +        syms = [(f,c) for f,c in self.f.items() if filter_symbol(f)]
> +
> +        f = open(filepath, 'w')
> +
> +        for fun, callsite in syms:
> +            current_dynamic += callsite.current_dynamic()
> +            current_req += callsite.current_req()
> +            alloc_count += callsite.alloc_count()
> +            free_count += callsite.free_count()
> +
> +        f.write("current bytes allocated: {:>10}\n".format(current_dynamic))
> +        f.write("current bytes requested: {:>10}\n".format(current_req))
> +        f.write("current wasted bytes:    {:>10}\n".format((current_dynamic -
> +                                                         current_req)))
> +        f.write("number of allocs:        {:>10}\n".format(alloc_count))
> +        f.write("number of frees:         {:>10}\n".format(free_count))
> +        f.write("number of callers:       {:>10}\n".format(len(syms)))
> +        f.write("\n")
> +        f.write("   total    waste      net alloc/free  caller\n")
> +        f.write("---------------------------------------------\n")
> +
> +        for fun, callsite in sorted(syms,
> +                                    key=lambda item: getattr(item[1],
> +                                                             order_by)(),
> +                                    reverse=True):
> +
> +            f.write("%8d %8d %8d %5d/%-5d %s\n" % (callsite.total_dynamic(),
> +                                               callsite.waste(),
> +                                               callsite.current_dynamic(),
> +                                               callsite.alloc_count(),
> +                                               callsite.free_count(),
> +                                               fun))
> +
> +        f.close()
> +
> +
> +class MemTreeNodeSize:
> +    def __init__(self, node):
> +        self.__static = 0
> +        self.__total_dynamic = 0
> +        self.__current_dynamic = 0
> +        self.__waste = 0
> +
> +        # First for my symbols
> +        for sym, size in node.data.items():
> +            self.__static += size
> +        for sym, size in node.text.items():
> +            self.__static += size
> +        for sym, call in node.funcs.items():
> +            self.__total_dynamic += call.total_dynamic()
> +            self.__current_dynamic += call.current_dynamic()
> +            self.__waste += call.current_dynamic() - call.current_req()
> +
> +        # Now, for my children's symbols.
> +        # Or, instead, we could first add all my children's
> +        # symbols here and then get the node size.
> +        for name, child in node.childs.items():
> +            self.__total_dynamic += child.size().total_dynamic()
> +            self.__current_dynamic += child.size().current_dynamic()
> +            self.__static += child.size().static()
> +            self.__waste += child.size().waste()
> +
> +    def current(self):
> +        return self.__static + self.__current_dynamic
> +
> +    def waste(self):
> +        return self.__waste
> +
> +    def static(self):
> +        return self.__static
> +
> +    def current_dynamic(self):
> +        return self.__current_dynamic
> +
> +    def total_dynamic(self):
> +        return self.__total_dynamic
> +
> +
> +class MemTreeNode:
> +    def __init__(self, name="", parent=None, db=None):
> +        self.name = name
> +        self.parent = parent
> +        self.childs = {}
> +        self.funcs = {}
> +        self.data = {}
> +        self.text = {}
> +        self.node_size = None
> +        self.fill = getattr(self, "fill_per_file")
> +
> +        # If db is None, use parent db
> +        if db is None:
> +            if parent is not None:
> +                self.db = parent.db
> +        else:
> +            self.db = db
> +
> +    def get_symbol_dir(self, symbol):
> +        if symbol in self.funcs:
> +            return self.full_name()
> +        else:
> +            for name, child in self.childs.items():
> +                symdir = child.get_symbol_dir(symbol)
> +                if symdir is not None:
> +                    return symdir
> +        return None
> +
> +    def symbol_is_here(self, symbol):
> +        if symbol in self.funcs:
> +            return True
> +        else:
> +            for name, child in self.childs.items():
> +                if child.symbol_is_here(symbol):
> +                    return True
> +        return False
> +
> +    def full_name(self):
> +        l = [self.name,]
> +        parent = self.parent
> +        while parent:
> +            if parent.name != "":
> +                l.append(parent.name)
> +            parent = parent.parent
> +
> +        return "/".join(reversed(l))
> +
> +    def size(self):
> +        if self.node_size is None:
> +            self.node_size = MemTreeNodeSize(self)
> +        return self.node_size
> +
> +    def __collapse(self):
> +        # Collapse one-child empty nodes
> +        for name, child in self.childs.items():
> +            if len(child.childs) > 2:
> +                child.__collapse()
> +
> +            if len(child.childs) == 1 and not child.funcs and not child.data:
> +                # Remove from child
> +                (k, v) = child.childs.items()[0]
> +                del child.childs[k]
> +
> +                # Add here
> +                self.childs[k] = v
> +                v.parent = self
> +
> +    def __strip(self):
> +        # Remove empty nodes
> +        for name, child in self.childs.items():
> +            if child.childs:
> +                child.__strip()
> +            if not child.funcs and not child.data and not child.childs:
> +                del self.childs[name]
> +
> +    def __get_root(self):
> +        if len(self.childs) == 1:
> +            child = self.childs.itervalues().next()
> +            # This is a pedantic test, the first node with
> +            # multiple childs is the root we're searching
> +            if not child.name.endswith(".o"):
> +                return child.__get_root()
> +
> +        return self
> +
> +    # Obtain a clean tree.
> +    # We do it this way because collapse() and strip() must be called
> +    # in an ordered fashion.
> +    def get_clean(self):
> +        self.__collapse()
> +        self.__strip()
> +        return self.__get_root()
> +
> +    def find_first_branch(self, which):
> +        if self.name == which:
> +            return self
> +
> +        for name, node in self.childs.items():
> +            if which == name:
> +                return node
> +
> +        for name, node in self.childs.items():
> +            return node.find_first_branch(which)
> +
> +        print("[WARNING] Can't find first branch '{}'".format(which))
> +        return None
> +
> +    # This are for debug purposes, move along
> +    def treelike(self, level=0, attr="current_dynamic"):
> +        str = ""
> +        str += "{}\n".format(self.name)
> +        for name, node in self.childs.items():
> +            child_str = node.treelike(level+1, attr)
> +            if child_str:
> +                str += "{}{}".format("  "*(level+1), child_str)
> +        return str
> +
> +    def treelike2(self, level=0, attr="current_dynamic"):
> +        str = ""
> +
> +        attr_val = getattr(self.size(), attr)()
> +
> +        if self.name and attr_val != 0:
> +            str += "{} - {}={}\n".format(self.name, attr, attr_val)
> +
> +        for name, node in self.childs.items():
> +            child_str = node.treelike2(level+1, attr)
> +            if child_str:
> +                str += "{}{}".format("  "*(level+1), child_str)
> +        return str
> +
> +    def fill_per_file(self, path):
> +
> +        filepath = "{}{}/{}".format(MemTreeNode.abs_slash, self.full_name(), path)
> +
> +        if path not in self.childs:
> +            self.childs[path] = MemTreeNode(path, self)
> +
> +        child = self.childs[path]
> +
> +        output = []
> +        try:
> +            p1 = subprocess.Popen(["readelf", "--wide", "-s", filepath], stdout=subprocess.PIPE)
> +            output = p1.communicate()[0].split("\n")
> +        except:
> +            pass
> +
> +        for line in output:
> +            if line == '':
> +                continue
> +
> +            m = re.match(r".*\s([0-9]+)\sFUNC.*\s+([a-zA-Z0-9_\.]+)\b", line)
> +            if m:
> +                if m.group(2) in child.text:
> +                    print "Duplicate text entry! {}".format(m.group(2))
> +                child.text[m.group(2)] = int(m.group(1))
> +
> +                # Search every callsite in db matching this name
> +                for name, callsite in child.db.f.iteritems():
> +                    if name.startswith(m.group(2)):
> +                        child.funcs[name] = callsite
> +
> +            m = re.match(r".*\s([0-9]+)\sOBJECT.*\s+([a-zA-Z0-9_\.]+)\b", line)
> +            if m:
> +                if m.group(2) in child.data:
> +                    print "[WARNING] Duplicate data entry! {}".format(m.group(2))
> +                child.data[m.group(2)] = int(m.group(1))
> +
> +    # This is deprecated, fill_per_file should be used instead.
> +    # I keep it here just to have the code handy.
> +    def fill_per_dir(self, path):
> +
> +        if self.funcs or self.data:
> +            print "[WARNING] Oooops, already filled"
> +
> +        filepath = "." + self.full_name() + "/built-in.o"
> +
> +        output = []
> +        try:
> +            p1 = subprocess.Popen(["readelf", "--wide", "-s", filepath], stdout=subprocess.PIPE)
> +            output = p1.communicate()[0].split("\n")
> +        except:
> +            pass
> +
> +        for line in output:
> +            if line == '':
> +                continue
> +            m = re.match(r".*FUNC.*\b([a-zA-Z0-9_]+)\b", line)
> +            if m:
> +                if m.group(1) in self.funcs:
> +                    print "[WARNING] Duplicate entry! {}".format(m.group(1))
> +
> +                if m.group(1) in self.db.f:
> +                    self.funcs[m.group(1)] = self.db.f[m.group(1)]
> +
> +            m = re.match(r".*([0-9]+)\sOBJECT.*\b([a-zA-Z0-9_]+)\b", line)
> +            if m:
> +                self.data[m.group(2)] = int(m.group(1))
> +
> +    # path is should be an object file, like fs/ext2/inode.o
> +    def add_child(self, path):
> +        # adding a child invalidates node_size object
> +        self.node_size = None
> +
> +        parts = path.split('/', 1)
> +        if len(parts) == 1:
> +            self.fill(path)
> +            pass
> +        else:
> +            node, others = parts
> +            if node not in self.childs:
> +                self.childs[node] = MemTreeNode(node, self)
> +            self.childs[node].add_child(others)
> +
> +    def add_path(self, path):
> +        for root, dirs, files in os.walk(path):
> +
> +            blacklisted = False
> +            for bdir in BLACKLIST:
> +                if root.startswith("{}/{}".format(path, bdir)):
> +                    blacklisted = True
> +
> +            if blacklisted:
> +                continue
> +
> +            for filepath in [os.path.join(root,f) for f in files]:
> +                if filepath.endswith("built-in.o"):
> +                    continue
> +                if filepath.endswith("vmlinux.o"):
> +                    continue
> +                if filepath.endswith(".o"):
> +                    # We need to check if this object file,
> +                    # has a corresponding source file
> +                    filesrc = "{}.c".format(os.path.splitext(filepath)[0])
> +                    if os.path.exists(filesrc):
> +                        self.add_child(filepath)
> +
> +
> +
> +##########################################################################
> +##
> +## Main
> +##
> +##########################################################################
> +
> +def main():
> +
> +    parser = OptionParser()
> +    parser.add_option("-k", "--kernel",
> +                      dest="buildpath",
> +                      default="",
> +                      help="path to built kernel tree")
> +
> +    parser.add_option("-f", "--file",
> +                      dest="file",
> +                      default="",
> +                      help="trace log file to analyze")
> +
> +    parser.add_option("--db-file",
> +                      dest="db_file",
> +                      default="",
> +                      help="use db_file as DB instead of creating one")
> +
> +    parser.add_option("--save-db",
> +                      dest="save_db_file",
> +                      default="",
> +                      help="save a db_file to use as DB")
> +
> +    parser.add_option("-b", "--start-branch",
> +                      dest="start_branch",
> +                      default="",
> +                      help="first directory name to use as ringchart root")
> +
> +    parser.add_option("-r", "--rings-file",
> +                      dest="rings_file",
> +                      default="",
> +                      help="plot ringchart information")
> +
> +    parser.add_option("-i", "--rings-show",
> +                      dest="rings_show",
> +                      action="store_true",
> +                      help="show interactive ringchart")
> +
> +    parser.add_option("-a", "--rings-attr",
> +                      dest="rings_attr",
> +                      default="current_dynamic",
> +                      help="attribute to visualize [static, current, \
> +                                    current_dynamic, total_dynamic, waste]")
> +
> +    parser.add_option("--malloc",
> +                      dest="do_malloc",
> +                      action="store_true",
> +                      help="trace kmalloc/kfree only")
> +
> +    parser.add_option("--cache",
> +                      dest="do_cache",
> +                      action="store_true",
> +                      help="trace kmem_cache_alloc/kmem_cache_free only")
> +
> +    parser.add_option("-c", "--account-file",
> +                      dest="account_file",
> +                      default="",
> +                      help="show output matching slab_account output")
> +
> +    parser.add_option("-l", "--callers-file",
> +                      dest="callers_file",
> +                      default="",
> +                      help="show callers file suitable for ringchart generation")
> +
> +    parser.add_option("-o", "--order-by",
> +                      dest="order_by",
> +                      default="current_dynamic",
> +                      help="attribute to order account \
> +                            [current_dynamic, total_dynamic, alloc_count, free_count, waste]")
> +
> +
> +    (opts, args) = parser.parse_args()
> +
> +    # Kernel build path is a mandatory parameter.
> +    # We need to look at compiled objects and also for System.map.
> +    if len(opts.db_file) == 0 and len(opts.buildpath) == 0:
> +        print "Please set a kernel build path or a DB file!"
> +        parser.print_help()
> +        return
> +
> +    # Check valid options
> +    if len(opts.order_by) > 0:
> +        if opts.order_by not in dir(Callsite):
> +            print "Hey! {} is not a valid --order-by option".format(opts.order_by)
> +            parser.print_help()
> +            return
> +
> +    if len(opts.rings_attr) > 0:
> +        if opts.rings_attr not in dir(MemTreeNodeSize):
> +            print "Hey! {} is not a valid --rings-attr option".format(opts.rings_attr)
> +            parser.print_help()
> +            return
> +
> +    # Clean user provided kernel path from dirty slashes
> +    buildpath = opts.buildpath.rstrip("/")
> +
> +    # If we don't have a trace log file,
> +    # and we don't have a DB file
> +    # then we'll fallback to static report mode.
> +    if len(opts.db_file) == 0 and len(opts.file) == 0:
> +        print "No trace log file or DB file specified: will report on static size only"
> +        opts.rings_attr = "static"
> +        opts.do_malloc = False
> +        opts.do_cache = False
> +        opts.account_file = ""
> +        opts.just_static = True
> +        # Set some default
> +        if len(opts.rings_file) == 0:
> +            opts.rings_file = "rings_static.png"
> +    else:
> +        opts.just_static = False
> +
> +    if opts.rings_show is None:
> +        opts.rings_show = False
> +
> +    rootDB = EventDB()
> +    # Get root database, if need to
> +    if not opts.just_static:
> +        if len(opts.db_file) != 0:
> +            print "Using db file '{}'".format(opts.db_file)
> +            f = open(opts.db_file)
> +            buildpath = pickle.load(f)
> +            rootDB = pickle.load(f)
> +            f.close()
> +        else:
> +            rootDB.slurp(opts.file, buildpath, opts.do_malloc, opts.do_cache)
> +
> +            if len (opts.save_db_file) != 0:
> +                print "Saving db file at '{}'".format(opts.save_db_file)
> +                f = open(opts.save_db_file, 'w')
> +                pickle.dump(buildpath,f)
> +                pickle.dump(rootDB, f)
> +                f.close()
> +
> +    if len(opts.callers_file) == 0 and \
> +       len(opts.account_file) == 0 and \
> +       len(opts.rings_file) == 0:
> +            sys.exit(0)
> +
> +    root_path = "{}/{}".format(buildpath, opts.start_branch).rstrip("/")
> +
> +    print "Creating tree from compiled symbols at '{}'".format(root_path)
> +
> +    # We need to specify if user provided buildpath is absolute
> +    MemTreeNode.abs_slash = buildpath.startswith("/") and "/" or ""
> +
> +    tree = MemTreeNode(db = rootDB)
> +    tree.add_path(root_path)
> +
> +    print "Cleaning tree"
> +    tree = tree.get_clean()
> +
> +    # DEBUG--ONLY. Should we add an option for this?
> +    #print(tree.treelike2(attr = opts.rings_attr))
> +    if len(opts.callers_file) != 0:
> +        print "Creating callers file at '{}'".format(opts.callers_file)
> +        rootDB.print_callers(opts.callers_file,
> +                             tree)
> +
> +
> +    if len(opts.account_file) != 0:
> +        print "Creating account file at '{}'".format(opts.account_file)
> +        rootDB.print_account(opts.account_file,
> +                             opts.order_by,
> +                             tree)
> +
> +    if len(opts.rings_file) != 0:
> +        if tree is None:
> +            print "Sorry, there is nothing to plot for branch '{}'".format(opts.start_branch)
> +        else:
> +            print "Creating ringchart for attribute '{}'".format(opts.rings_attr)
> +            visualize_mem_tree(tree, opts.rings_attr, opts.rings_file, opts.rings_show)
> +
> +
> +##########################################################################
> +##
> +## Visualization stuff
> +##
> +##########################################################################
> +
> +
> +CENTER_X = 1.0
> +CENTER_Y = 1.0
> +WIDTH = 0.2
> +tango_colors = ['#ef2929',
> +        '#ad7fa8',
> +        '#729fcf',
> +        '#8ae234',
> +        '#e9b96e',
> +        '#fcaf3e',]
> +
> +
> +def human_bytes(bytes, precision=1):
> +    """Return a humanized string representation of a number of bytes.
> +
> +    Assumes `from __future__ import division`.
> +
> +    >>> humanize_bytes(1)
> +    '1 byte'
> +    >>> humanize_bytes(1024)
> +    '1.0 kB'
> +    >>> humanize_bytes(1024*123)
> +    '123.0 kB'
> +    >>> humanize_bytes(1024*12342)
> +    '12.1 MB'
> +    >>> humanize_bytes(1024*12342,2)
> +    '12.05 MB'
> +    >>> humanize_bytes(1024*1234,2)
> +    '1.21 MB'
> +    >>> humanize_bytes(1024*1234*1111,2)
> +    '1.31 GB'
> +    >>> humanize_bytes(1024*1234*1111,1)
> +    '1.3 GB'
> +    """
> +    abbrevs = (
> +        (1<<50L, 'PB'),
> +        (1<<40L, 'TB'),
> +        (1<<30L, 'GB'),
> +        (1<<20L, 'MB'),
> +        (1<<10L, 'kB'),
> +        (1, 'bytes')
> +    )
> +    if bytes == 1:
> +        return '1 byte'
> +    for factor, suffix in abbrevs:
> +        if bytes >= factor:
> +            break
> +    return '{0:.{1}f} {2}'.format(float(bytes)/factor, precision, suffix)
> +
> +
> +class Section:
> +    def __init__(self, node, size, total_size, total_angle, start_angle):
> +        self.node = node
> +        self.size = size
> +        self.start_angle = start_angle
> +        self.angle = size * total_angle / total_size
> +
> +
> +def ring_color(start_angle, level):
> +    from matplotlib.colors import colorConverter
> +
> +    # f:      [1 - 0.26]
> +    # rel:    [0 - 198]
> +    # icolor: [0 - 5]
> +
> +    if level == 1:
> +        return colorConverter.to_rgb('#808080')
> +
> +    f = 1 - (((level-1) * 0.3) / 8)
> +    rel = start_angle / 180. * 99
> +    icolor = int(rel / (100./3))
> +    next_icolor = (icolor + 1) % 6
> +
> +    # Interpolate (?)
> +    color = colorConverter.to_rgb(tango_colors[icolor])
> +    next_color = colorConverter.to_rgb(tango_colors[next_icolor])
> +    p = (rel - icolor * 100./3) / (100./3)
> +
> +    color = [f * (c - p * (c - n)) for c, n in zip(color, next_color)]
> +
> +    return color
> +
> +
> +def create_child_rings(tree, level=2, level_angle=360, start_angle=0, rings=[],
> +         radius=WIDTH, center=(CENTER_X, CENTER_Y), size_attr="static"):
> +
> +    from matplotlib.patches import Wedge
> +
> +    child_size = 0
> +    max_size = getattr(tree.size(), size_attr)()
> +
> +    if len(tree.childs) == 0:
> +        return rings
> +
> +    if max_size == 0:
> +        for name, node in tree.childs.items():
> +            max_size += getattr(node.size(), size_attr)()
> +    if max_size == 0:
> +        return rings
> +
> +    s_angle = start_angle
> +    sections = {}
> +
> +    # Create child wedges
> +    for name, node in tree.childs.items():
> +
> +        size = getattr(node.size(), size_attr)()
> +        s = Section(node, size, max_size, level_angle, s_angle)
> +        sections[name] = s
> +
> +        create_child_rings(node, level+1, s.angle, s_angle, rings, radius, center, size_attr)
> +        s_angle += s.angle
> +        child_size += size
> +
> +    # Just a check
> +    if child_size > max_size:
> +        print "[{}] Ooops, child size is greater than max size".format(name)
> +
> +    for name, section in sections.items():
> +
> +        # Create tuple: (wedge, name)
> +        name = "{} {}".format(name, human_bytes(section.size))
> +        tup = ( Wedge(center,
> +            level * radius,
> +            section.start_angle,
> +            section.start_angle + section.angle,
> +            width=radius,
> +            facecolor=ring_color(section.start_angle, level)),
> +            name)
> +
> +        rings.append(tup)
> +
> +    return rings
> +
> +
> +def visualize_mem_tree(tree, size_attr, filename, show):
> +    import pylab
> +
> +    RING_MIN_WIDTH = 1
> +    TEXT_MIN_WIDTH = 5
> +
> +    rings = create_child_rings(tree, size_attr=size_attr)
> +
> +    fig = pylab.figure()
> +    ax = fig.add_subplot(111)
> +    annotations = []
> +    labels = []
> +
> +    text = "{} {}".format(tree.name,
> +                          human_bytes(getattr(tree.size(), size_attr)()))
> +    ann = ax.annotate(text,
> +                      size=12,
> +                      bbox=dict(boxstyle="round", fc="w", ec="0.5", alpha=0.8),
> +                      xy=(CENTER_X, CENTER_Y), xycoords='data',
> +                      xytext=(CENTER_X, CENTER_Y), textcoords='data')
> +    annotations.append(ann)
> +
> +    for p in rings:
> +        wedge = p[0]
> +
> +        # Skip if too small
> +        if (wedge.theta2 - wedge.theta1) < RING_MIN_WIDTH:
> +            continue
> +
> +        # Add wedge
> +        ax.add_patch(wedge)
> +
> +        # Skip text if too small
> +        if (wedge.theta2 - wedge.theta1) < TEXT_MIN_WIDTH:
> +            continue
> +
> +        theta = math.radians((wedge.theta1 + wedge.theta2) / 2.)
> +        x0 = wedge.center[0] + (wedge.r - wedge.width / 2.) * math.cos(theta)
> +        y0 = wedge.center[1] + (wedge.r - wedge.width / 2.) * math.sin(theta)
> +        x = wedge.center[0] + (0.1 + wedge.r * 1.5 - wedge.width / 2.) * math.cos(theta)
> +        y = wedge.center[1] + (0.1 + wedge.r * 1.5 - wedge.width / 2.) * math.sin(theta)
> +
> +        ax.plot(x0, y0, ".", color="black")
> +
> +        text = p[1]
> +        ann = ax.annotate(text,
> +                    size=12,
> +                    bbox=dict(boxstyle="round", fc="w", ec="0.5", alpha=0.8),
> +                    xy=(x0, y0), xycoords='data',
> +                    xytext=(x, y), textcoords='data',
> +                    arrowprops=dict(arrowstyle="-", connectionstyle="angle3, angleA=0, angleB=90"),)
> +        annotations.append(ann)
> +
> +    (alloc, req) = tree.db.get_bytes()
> +
> +    pylab.axis('off')
> +
> +    if len(filename) != 0:
> +        print("Plotting to file '{}'".format(filename))
> +        pylab.savefig("{}".format(filename),
> +                      bbox_extra_artists=annotations,
> +                      bbox_inches='tight', dpi=300)
> +    if show:
> +        print("Plotting interactive")
> +        pylab.show()
> +
> +
> +##########################################################################
> +
> +if __name__ == "__main__":
> +    main()
> -- 
> 1.7.8.6
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC/PATCH] scripts/tracing: Add trace_analyze.py tool
  2013-01-23  4:27 ` Minchan Kim
@ 2013-01-23 21:37   ` Ezequiel Garcia
  2013-01-24  5:50     ` Minchan Kim
  0 siblings, 1 reply; 8+ messages in thread
From: Ezequiel Garcia @ 2013-01-23 21:37 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Ezequiel Garcia, linux-kernel, linux-mm, Tim Bird, Pekka Enberg,
	Steven Rostedt, Frederic Weisbecker, Ingo Molnar

Hi Minchan,

On Wed, Jan 23, 2013 at 1:27 AM, Minchan Kim <minchan@kernel.org> wrote:
> Hi Ezequiel,
>
> On Tue, Jan 22, 2013 at 06:46:58AM -0300, Ezequiel Garcia wrote:
>> From: Ezequiel Garcia <elezegarcia@gmail.com>
>>
>> The purpose of trace_analyze.py tool is to perform static
>> and dynamic memory analysis using a kmem ftrace
>> log file and a built kernel tree.
>>
>> This script and related work has been done on the CEWG/2012 project:
>> "Kernel dynamic memory allocation tracking and reduction"
>> (More info here [1])
>>
>> It produces mainly two kinds of outputs:
>>  * an account-like output, similar to the one given by Perf, example below.
>>  * a ring-char output, examples here [2].
>>
>> $ ./scripts/tracing/trace_analyze.py -k linux -f kmem.log --account-file account.txt
>> $ ./scripts/tracing/trace_analyze.py -k linux -f kmem.log -c account.txt
>>
>> This will produce an account file like this:
>>
>>     current bytes allocated:     669696
>>     current bytes requested:     618823
>>     current wasted bytes:         50873
>>     number of allocs:              7649
>>     number of frees:               2563
>>     number of callers:              115
>>
>>      total    waste      net alloc/free  caller
>>     ---------------------------------------------
>>     299200        0   298928  1100/1     alloc_inode+0x4fL
>>     189824        0   140544  1483/385   __d_alloc+0x22L
>>      51904        0    47552   811/68    sysfs_new_dirent+0x4eL
>>     [...]
>>
>> [1] http://elinux.org/Kernel_dynamic_memory_analysis
>> [2] http://elinux.org/Kernel_dynamic_memory_analysis#Current_dynamic_footprint
>
> First of all, Thanks for nice work! It could be very useful for
> embedded side.
>
> Questions.
>
> 1. Can we detect different call path but same function?
>    I mean
>
>         A       C
>          \     /
>           B   D
>            \ /
>             E
>             |
>          kmalloc
>
> In this case, E could be called by A or C. I would like to know the call path.
> It could point out exact culprit of memory hogger.
>

I'm sorry, I'm not following you:
How can I know which caller in the call path is the 'real' responsible
for the allocation?

The only way I can think of achieving something like this is by using
kmalloc_track_caller() instead of kmalloc().
This is done in cases where an allocer is known to alloc memory on
behalf of its caller.

> 2. Does it support alloc_pages family?
>    kmem event trace already supports it. If it supports, maybe we can replace
>    CONFIG_PAGE_OWNER hack.
>

Mmm.. no, it doesn't support alloc_pages and friends, for we found
no reason to do it.
However, it sounds like a nice idea, on a first thought.

I'll review CONFIG_PAGE_OWNER patches and see if I can come up with something.

Meantime, and given this is just a script submission, is there anything
preventing to merge this? We can move it to perf, and/or add it
features, etc. later,
on top of this. Does this make sense?

-- 
    Ezequiel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC/PATCH] scripts/tracing: Add trace_analyze.py tool
  2013-01-23 21:37   ` Ezequiel Garcia
@ 2013-01-24  5:50     ` Minchan Kim
  2013-01-24 17:16       ` Ezequiel Garcia
  0 siblings, 1 reply; 8+ messages in thread
From: Minchan Kim @ 2013-01-24  5:50 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Ezequiel Garcia, linux-kernel, linux-mm, Tim Bird, Pekka Enberg,
	Steven Rostedt, Frederic Weisbecker, Ingo Molnar

On Wed, Jan 23, 2013 at 06:37:56PM -0300, Ezequiel Garcia wrote:
> Hi Minchan,
> 
> On Wed, Jan 23, 2013 at 1:27 AM, Minchan Kim <minchan@kernel.org> wrote:
> > Hi Ezequiel,
> >
> > On Tue, Jan 22, 2013 at 06:46:58AM -0300, Ezequiel Garcia wrote:
> >> From: Ezequiel Garcia <elezegarcia@gmail.com>
> >>
> >> The purpose of trace_analyze.py tool is to perform static
> >> and dynamic memory analysis using a kmem ftrace
> >> log file and a built kernel tree.
> >>
> >> This script and related work has been done on the CEWG/2012 project:
> >> "Kernel dynamic memory allocation tracking and reduction"
> >> (More info here [1])
> >>
> >> It produces mainly two kinds of outputs:
> >>  * an account-like output, similar to the one given by Perf, example below.
> >>  * a ring-char output, examples here [2].
> >>
> >> $ ./scripts/tracing/trace_analyze.py -k linux -f kmem.log --account-file account.txt
> >> $ ./scripts/tracing/trace_analyze.py -k linux -f kmem.log -c account.txt
> >>
> >> This will produce an account file like this:
> >>
> >>     current bytes allocated:     669696
> >>     current bytes requested:     618823
> >>     current wasted bytes:         50873
> >>     number of allocs:              7649
> >>     number of frees:               2563
> >>     number of callers:              115
> >>
> >>      total    waste      net alloc/free  caller
> >>     ---------------------------------------------
> >>     299200        0   298928  1100/1     alloc_inode+0x4fL
> >>     189824        0   140544  1483/385   __d_alloc+0x22L
> >>      51904        0    47552   811/68    sysfs_new_dirent+0x4eL
> >>     [...]
> >>
> >> [1] http://elinux.org/Kernel_dynamic_memory_analysis
> >> [2] http://elinux.org/Kernel_dynamic_memory_analysis#Current_dynamic_footprint
> >
> > First of all, Thanks for nice work! It could be very useful for
> > embedded side.
> >
> > Questions.
> >
> > 1. Can we detect different call path but same function?
> >    I mean
> >
> >         A       C
> >          \     /
> >           B   D
> >            \ /
> >             E
> >             |
> >          kmalloc
> >
> > In this case, E could be called by A or C. I would like to know the call path.
> > It could point out exact culprit of memory hogger.
> >
> 
> I'm sorry, I'm not following you:
> How can I know which caller in the call path is the 'real' responsible
> for the allocation?
> 
> The only way I can think of achieving something like this is by using
> kmalloc_track_caller() instead of kmalloc().
> This is done in cases where an allocer is known to alloc memory on
> behalf of its caller.

I mean following as.

It's a example from page_owner about alloc_pages.
I'm not sure it's good example but it could give my intent.

358 times:
Page allocated via order 1, mask 0x2852d0
 [<ffffffff811654f5>] new_slab+0x2d5/0x370
 [<ffffffff815705a8>] __slab_alloc+0x2bb/0x41c
 [<ffffffff811682ac>] kmem_cache_alloc+0x18c/0x1a0
 [<ffffffff8118ac07>] __d_alloc+0x27/0x180
 [<ffffffff8118b038>] d_alloc+0x28/0x80
 [<ffffffff8117d313>] lookup_dcache+0xa3/0xd0
 [<ffffffff8117d363>] __lookup_hash+0x23/0x50
 [<ffffffff8157076a>] lookup_slow+0x49/0xad

..
..

1 times:
Page allocated via order 1, mask 0x2852d0
 [<ffffffff811654f5>] new_slab+0x2d5/0x370
 [<ffffffff815705a8>] __slab_alloc+0x2bb/0x41c
 [<ffffffff811682ac>] kmem_cache_alloc+0x18c/0x1a0
 [<ffffffff8118ac07>] __d_alloc+0x27/0x180
 [<ffffffff8118b038>] d_alloc+0x28/0x80
 [<ffffffff8117d313>] lookup_dcache+0xa3/0xd0
 [<ffffffff8117d363>] __lookup_hash+0x23/0x50
 [<ffffffff81181126>] lookup_one_len+0xd6/0x130

>From above example, alloc_pages could be called from several path
The one path is lookup_slow and another is lookup_one_len so
I can investigate who asks lookup_slow frequently.

> 
> > 2. Does it support alloc_pages family?
> >    kmem event trace already supports it. If it supports, maybe we can replace
> >    CONFIG_PAGE_OWNER hack.
> >
> 
> Mmm.. no, it doesn't support alloc_pages and friends, for we found
> no reason to do it.
> However, it sounds like a nice idea, on a first thought.
> 
> I'll review CONFIG_PAGE_OWNER patches and see if I can come up with something.

Thanks!

> 
> Meantime, and given this is just a script submission, is there anything
> preventing to merge this? We can move it to perf, and/or add it
> features, etc. later,
> on top of this. Does this make sense?
> 
> -- 
>     Ezequiel
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC/PATCH] scripts/tracing: Add trace_analyze.py tool
  2013-01-24  5:50     ` Minchan Kim
@ 2013-01-24 17:16       ` Ezequiel Garcia
  2013-01-24 23:24         ` Minchan Kim
  0 siblings, 1 reply; 8+ messages in thread
From: Ezequiel Garcia @ 2013-01-24 17:16 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Ezequiel Garcia, linux-kernel, linux-mm, Tim Bird, Pekka Enberg,
	Steven Rostedt, Frederic Weisbecker, Ingo Molnar

On Thu, Jan 24, 2013 at 2:50 AM, Minchan Kim <minchan@kernel.org> wrote:
> On Wed, Jan 23, 2013 at 06:37:56PM -0300, Ezequiel Garcia wrote:
>
>>
>> > 2. Does it support alloc_pages family?
>> >    kmem event trace already supports it. If it supports, maybe we can replace
>> >    CONFIG_PAGE_OWNER hack.
>> >
>>
>> Mmm.. no, it doesn't support alloc_pages and friends, for we found
>> no reason to do it.
>> However, it sounds like a nice idea, on a first thought.
>>
>> I'll review CONFIG_PAGE_OWNER patches and see if I can come up with something.
>
> Thanks!
>

I'm searching CONFIG_PAGE_OWNER patches, but I could only find this one
for v2.6.13:

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.13-rc3/2.6.13-rc3-mm1/broken-out/page-owner-tracking-leak-detector.patch

Is there a more recent one?

-- 
    Ezequiel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC/PATCH] scripts/tracing: Add trace_analyze.py tool
  2013-01-24 17:16       ` Ezequiel Garcia
@ 2013-01-24 23:24         ` Minchan Kim
  0 siblings, 0 replies; 8+ messages in thread
From: Minchan Kim @ 2013-01-24 23:24 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Ezequiel Garcia, linux-kernel, linux-mm, Tim Bird, Pekka Enberg,
	Steven Rostedt, Frederic Weisbecker, Ingo Molnar

On Thu, Jan 24, 2013 at 02:16:35PM -0300, Ezequiel Garcia wrote:
> On Thu, Jan 24, 2013 at 2:50 AM, Minchan Kim <minchan@kernel.org> wrote:
> > On Wed, Jan 23, 2013 at 06:37:56PM -0300, Ezequiel Garcia wrote:
> >
> >>
> >> > 2. Does it support alloc_pages family?
> >> >    kmem event trace already supports it. If it supports, maybe we can replace
> >> >    CONFIG_PAGE_OWNER hack.
> >> >
> >>
> >> Mmm.. no, it doesn't support alloc_pages and friends, for we found
> >> no reason to do it.
> >> However, it sounds like a nice idea, on a first thought.
> >>
> >> I'll review CONFIG_PAGE_OWNER patches and see if I can come up with something.
> >
> > Thanks!
> >
> 
> I'm searching CONFIG_PAGE_OWNER patches, but I could only find this one
> for v2.6.13:
> 
> http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.13-rc3/2.6.13-rc3-mm1/broken-out/page-owner-tracking-leak-detector.patch
> 
> Is there a more recent one?

Recently, update version is merged into mmotm. Please, see below.
http://git.cmpxchg.org/?p=linux-mmotm.git;a=blob;f=mm/pageowner.c;h=2238bfe282a934ee78ede1856776c577dfb2e630;hb=1e0902949ce18822bf21b0fd96ed7a7c3ac3dee5

> 
> -- 
>     Ezequiel
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2013-01-24 23:25 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-22  9:46 [RFC/PATCH] scripts/tracing: Add trace_analyze.py tool Ezequiel Garcia
2013-01-22 13:41 ` Pekka Enberg
2013-01-22 16:16   ` Ezequiel Garcia
2013-01-23  4:27 ` Minchan Kim
2013-01-23 21:37   ` Ezequiel Garcia
2013-01-24  5:50     ` Minchan Kim
2013-01-24 17:16       ` Ezequiel Garcia
2013-01-24 23:24         ` Minchan Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).