* [RFC] scripts: Include postprocessing script for memory allocation tracing
@ 2016-09-11 22:24 ` Janani Ravichandran
  0 siblings, 0 replies; 31+ messages in thread
From: Janani Ravichandran @ 2016-09-11 22:24 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: riel, akpm, vdavydov, mhocko, vbabka, mgorman, rostedt

The objective of this patch is to help users observe latencies in memory
allocation.
The function graph tracer is great for seeing how long functions take to
execute, and tracepoints often help understand situations better
alongside it. However, while it is possible to set a threshold for the
function graph tracer so that only functions exceeding it appear in the
output (by echoing the threshold value into tracing_thresh in the
tracing directory), there is no way to filter out tracepoints that are
not associated with functions whose latencies exceed the threshold.

When the threshold is set high, it is possible that a lot of information
that is of little interest to the user is printed from the tracepoints.
Limiting this information can help reduce disk I/O significantly.
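
For illustration, the kind of filtering the threshold enables can be
sketched in a few lines of Python (the record format here is
hypothetical, not the actual trace_pipe syntax):

```python
# Sketch: keep only trace records whose duration exceeds a threshold.
# Records are hypothetical (function, duration_us) pairs.

def filter_by_threshold(records, threshold_us):
    """Return only the records whose duration exceeds threshold_us."""
    return [(name, dur) for name, dur in records if dur > threshold_us]

if __name__ == '__main__':
    records = [('shrink_slab', 12.0), ('compact_zone', 250.0),
               ('try_to_free_pages', 90.0)]
    print(filter_by_threshold(records, 134.0))
```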

This patch deals with latencies in memory allocation and more
specifically, direct reclaim and compaction.

setup_alloc_trace.sh is a bash script which handles the initial setup
of the function graph tracer, specifies which functions to include in
the output and enables some tracepoints of interest. On exit, it clears
all the values it set.

The functions traced currently are __alloc_pages_nodemask,
try_to_free_pages, mem_cgroup_soft_limit_reclaim, shrink_node,
shrink_node_memcg, shrink_slab, shrink_active_list,
shrink_inactive_list, compact_zone and try_to_compact_pages.

The tracepoints enabled are mm_shrink_slab_start,
mm_shrink_slab_end, mm_vmscan_direct_reclaim_begin,
mm_vmscan_direct_reclaim_end, mm_vmscan_lru_shrink_inactive,
mm_compaction_begin, mm_compaction_end and
mm_compaction_try_to_compact_pages.

More functions can be traced as desired by making changes to
setup_alloc_trace.sh accordingly.

allocation_postprocess.py is a script which reads from trace_pipe. It
does the following to filter out info from tracepoints that may not
be important:

1. Displays mm_vmscan_direct_reclaim_begin and
mm_vmscan_direct_reclaim_end only when try_to_free_pages has
exceeded the threshold.
2. Displays mm_compaction_begin and mm_compaction_end only when
compact_zone has exceeded the threshold.
3. Displays mm_compaction_try_to_compact_pages only when
try_to_compact_pages has exceeded the threshold.
4. Displays mm_shrink_slab_start and mm_shrink_slab_end only when
the time elapsed between them exceeds the threshold.
5. Displays mm_vmscan_lru_shrink_inactive only when shrink_inactive_list
has exceeded the threshold.
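
The begin/end pairing behind rule 4 can be sketched as follows (a
simplified sketch; the real script parses trace_pipe output and keys
its state per process):

```python
# Sketch: pair begin/end tracepoints per process and compare the
# elapsed time against a threshold given in microseconds.
from collections import defaultdict

SECS_TO_US = 1000000

begin_times = defaultdict(dict)  # process -> {event: timestamp in seconds}

def on_begin(process, event, timestamp):
    begin_times[process][event] = timestamp

def on_end(process, event, timestamp, threshold_us):
    """Return (exceeded, elapsed_seconds) for the matching begin event."""
    begin = begin_times[process].pop(event, None)
    if begin is None:
        # No matching begin event was seen for this process.
        return (False, 0.0)
    elapsed = timestamp - begin
    return (elapsed * SECS_TO_US > threshold_us, elapsed)

if __name__ == '__main__':
    on_begin('kswapd0-42', 'mm_shrink_slab_start', 100.000010)
    print(on_end('kswapd0-42', 'mm_shrink_slab_start', 100.000300, 134.0))
```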

When CTRL+C is pressed, the script shows the times taken by the
shrinkers. However, it is currently not possible to differentiate
among the superblock shrinkers.
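
The per-shrinker accounting that produces this summary is roughly the
following (a simplified sketch of the script's defaultdict bookkeeping,
tracking milliseconds directly):

```python
# Sketch: accumulate total and worst-case latency per shrinker.
from collections import defaultdict

totals = defaultdict(float)   # shrinker name -> total time in ms
maxima = defaultdict(float)   # shrinker name -> worst single call in ms

def record(shrinker, latency_ms):
    totals[shrinker] += latency_ms
    if latency_ms > maxima[shrinker]:
        maxima[shrinker] = latency_ms

def report():
    for name in totals:
        print('%s : total time = %f ms, max latency = %f ms'
              % (name, totals[name], maxima[name]))

if __name__ == '__main__':
    for lat in (0.278, 0.120, 0.095):
        record('i915_gem_shrinker_scan', lat)
    report()
```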

Sample output:
^Ci915_gem_shrinker_scan : total time = 8.731000 ms, max latency = 0.278000 ms
ext4_es_scan : total time = 0.970000 ms, max latency = 0.129000 ms
scan_shadow_nodes : total time = 1.150000 ms, max latency = 0.175000 ms
super_cache_scan : total time = 8.455000 ms, max latency = 0.466000 ms
deferred_split_scan : total time = 25.767000 ms, max latency = 25.485000 ms

Usage:

# ./setup_alloc_trace.sh -t 134 -s /sys/kernel/debug/tracing > file.dat

Where -t represents the threshold (134 us in this case) and -s the path
to the tracing directory. The default, used when no path is specified,
is /sys/kernel/debug/tracing. The threshold, on the other hand, must
always be set.

Ideas/comments/suggestions are welcome, especially on adherence to
the Python coding style followed by the Linux community and on the
choice of functions traced.

Thanks.

Signed-off-by: Janani Ravichandran <janani.rvchndrn@gmail.com>
---
 scripts/tracing/allocation_postprocess.py | 267 ++++++++++++++++++++++++++++++
 scripts/tracing/setup_alloc_trace.sh      |  88 ++++++++++
 2 files changed, 355 insertions(+)
 create mode 100755 scripts/tracing/allocation_postprocess.py
 create mode 100755 scripts/tracing/setup_alloc_trace.sh

diff --git a/scripts/tracing/allocation_postprocess.py b/scripts/tracing/allocation_postprocess.py
new file mode 100755
index 0000000..2f65457
--- /dev/null
+++ b/scripts/tracing/allocation_postprocess.py
@@ -0,0 +1,267 @@
+#!/usr/bin/env python
+# python 2.7
+
+"""
+This script uses function graph and some of the existing
+tracepoints to help people observe how long page allocations and some
+functions in the direct reclaim and compaction paths take.
+The functions and tracepoints enabled can be seen in setup_alloc_trace.sh.
+It reads from trace_pipe of the tracing directory and prints
+only those tracepoints and functions which are associated
+with latencies greater than the threshold specified.
+When CTRL+C is pressed, the times spent in the various shrinkers are displayed.
+The setup of trace is done in setup_alloc_trace.sh, from where this script is
+invoked.
+"""
+
+import argparse
+import re
+import sys
+import signal
+
+from collections import defaultdict
+
+# Constants for tracepoints
+
+DIRECT_RECLAIM_BEGIN        = 1
+DIRECT_RECLAIM_END          = 2
+SHRINK_SLAB_START           = 3
+SHRINK_INACTIVE_LIST        = 5
+TRY_TO_COMPACT              = 6
+COMPACTION_BEGIN            = 7
+COMPACTION_END              = 8
+
+SECS_TO_US                  = 1000000
+US_TO_MS                    = 1000
+
+# Parse command line arguments
+parser = argparse.ArgumentParser(description='Parser for latency analyzer')
+
+parser.add_argument('-s', '--source', action='store',
+                    default='/sys/kernel/debug/tracing',
+                    dest='source_path',
+                    help='Specify source file to read trace output')
+parser.add_argument('-t', '--threshold', action='store', default=0.0,
+                    dest='threshold', type=float)
+args = parser.parse_args()
+
+source_path = ''.join((args.source_path,'/trace_pipe'))
+threshold = args.threshold
+
+# Regexes
+line_pattern = re.compile(r'\s*(\d+\.\d+)\s+\|\s+\d*\)*\s+([\w-]+)\s+\|\s+.*\s+(\d*\.*\d*)\s+[us]{0,2}\s+\|\s+(.*)')
+tracepoint_pattern = re.compile(r'\/\*\s*([\w]*):\s*(.*)\s*\*\/')
+shrinker_pattern = re.compile(r'\s*([\w]*)\+(.*)\s*')
+function_end_pattern = re.compile(r'.*\/\*\s*([\w]*)\s*\*\/')
+
+# The dictionary which holds tracepoint information for all processes
+all_information = defaultdict(dict)
+
+# The dictionary which holds shrinker latencies
+shrinker_latencies = defaultdict(float)
+shrinker_max_latencies = defaultdict(float)
+
+def print_shrinker_latencies(signum, frame):
+    """ This function prints the time spent in each shrinker, when CTRL+C is 
+    pressed.
+    """
+
+    signal.signal(signal.SIGINT, original_sigint)
+    for key, value in shrinker_latencies.iteritems():
+        print '%s : total time = %f ms, max latency = %f ms' %(key,
+                                        value*US_TO_MS,
+                                        shrinker_max_latencies[key]*US_TO_MS)
+    sys.exit(0)
+
+original_sigint = signal.getsignal(signal.SIGINT)
+signal.signal(signal.SIGINT, print_shrinker_latencies)
+
+
+def set_begin_info(process, EVENT, timestamp, info):
+    """ This function sets information associated with mm_shrink_slab_start.
+    It sets the entire tracepoint info, along with the timestamp, which will
+    be used to calculate latencies when corresponding mm_shrink_slab_end
+    tracepoints are encountered.
+    """
+
+    per_process_dict = all_information[process]
+    begin_info = {}
+    begin_info["data"] = info
+    begin_info["time"] = timestamp
+    per_process_dict[EVENT] = begin_info
+
+
+def set_trace_info(process, EVENT, info):
+    """ This function sets trace information associated with specific events.
+    """
+
+    per_process_dict = all_information[process]
+    per_process_dict[EVENT] = info
+
+
+def store_max_latency(shrinker_name, latency):
+    """ This function stores the maximum latency encountered in a shrinker. """
+
+    max_latency = shrinker_max_latencies[shrinker_name]
+    if latency > max_latency:
+        shrinker_max_latencies[shrinker_name] = latency
+
+
+def find_latency(process, BEGIN_EVENT, timestamp):
+    """ This function calculates shrinker latencies."""
+
+    per_process_dict = all_information.get(process, None)
+    if per_process_dict:
+        begin_info = per_process_dict.get(BEGIN_EVENT, None)
+        if begin_info:
+            begin_data = begin_info.get("data", None)
+            begin_time = begin_info.get("time", None)
+            if begin_time:
+                time_elapsed = float(timestamp) - float(begin_time)
+                if time_elapsed*SECS_TO_US > threshold:
+                    return (True, begin_data, time_elapsed)
+                return (False, begin_data, time_elapsed)
+    return (False, None, 0.0)
+
+
+def print_line(line_info):
+    print line_info,
+
+
+def print_tracepoint(process, EVENT, info):
+    if info:
+        print info,
+    else:
+        per_process_dict = all_information.get(process, {})
+        TP_info = per_process_dict.get(EVENT, None)
+        if TP_info:
+            print TP_info,
+        per_process_dict.pop(EVENT, None)
+
+with open(source_path) as f:
+    for line in f:
+        line_match = re.match(line_pattern, line)
+        if line_match:
+            timestamp = line_match.group(1)
+            process_info = line_match.group(2)
+            function_match = re.match(function_end_pattern, line_match.group(4))
+            tracepoint_match = re.match(tracepoint_pattern, line_match.group(4))
+            if tracepoint_match:
+                TP_whole = line_match.group(4)
+                TP_name = tracepoint_match.group(1)
+                TP_info = tracepoint_match.group(2)
+
+
+                def call_set_trace_info(EVENT):
+                    set_trace_info(process_info, EVENT, line)
+
+
+                def direct_reclaim_b():
+                    call_set_trace_info(DIRECT_RECLAIM_BEGIN)
+
+
+                def direct_reclaim_e():
+                    call_set_trace_info(DIRECT_RECLAIM_END)
+
+
+                def shrink_inactive_list():
+                    call_set_trace_info(SHRINK_INACTIVE_LIST)
+
+
+                def shrink_slab_b():
+                    set_begin_info(process_info, SHRINK_SLAB_START, timestamp,
+                                    line)
+
+
+                def shrink_slab_e():
+                    delay_status, begin_data, time_elapsed = find_latency(
+                                                                process_info,
+                                                                SHRINK_SLAB_START,
+                                                                timestamp)
+                    shrinker_match = re.match(shrinker_pattern, TP_info)
+                    if shrinker_match:
+                        shrinker_name = shrinker_match.group(1)
+                        shrinker_latencies[shrinker_name] += time_elapsed
+                        store_max_latency(shrinker_name, time_elapsed)
+
+                    if delay_status:
+                        print_tracepoint(process_info,
+                                         SHRINK_SLAB_START,
+                                         begin_data)
+                        print_tracepoint(process_info,
+                                         None,
+                                         line)
+
+
+                def try_to_compact():
+                    call_set_trace_info(TRY_TO_COMPACT)
+
+
+                def compact_b():
+                    call_set_trace_info(COMPACTION_BEGIN)
+
+
+                def compact_e():
+                    call_set_trace_info(COMPACTION_END)
+
+
+                trace_match = {'mm_vmscan_direct_reclaim_begin' : direct_reclaim_b,
+                               'mm_vmscan_direct_reclaim_end'   : direct_reclaim_e,
+                               'mm_shrink_slab_start'           : shrink_slab_b,
+                               'mm_shrink_slab_end'             : shrink_slab_e,
+                               'mm_vmscan_lru_shrink_inactive'  :
+                                                              shrink_inactive_list,
+                               'mm_compaction_try_to_compact_pages':
+                                                              try_to_compact,
+                               'mm_compaction_begin'            : compact_b,
+                               'mm_compaction_end'              : compact_e}
+
+                if TP_name in trace_match:
+                    trace_match[TP_name]()
+                else:
+                    pass
+
+            else:
+                function_match = re.match(function_end_pattern,
+                                          line_match.group(4))
+                if function_match:
+                    function_name = function_match.group(1)
+
+
+                    def alloc_pages():
+                        print_line(line)
+                        all_information.pop(process_info, None)
+
+
+                    def try_to_free_pages():
+                        print_tracepoint(process_info, DIRECT_RECLAIM_BEGIN, None)
+                        print_tracepoint(process_info, DIRECT_RECLAIM_END, None)
+                        print_line(line)
+
+
+                    def shrink_inactive_list():
+                        print_tracepoint(process_info, SHRINK_INACTIVE_LIST, None)
+                        print_line(line)
+
+
+                    def try_to_compact():
+                        print_tracepoint(process_info, TRY_TO_COMPACT, None)
+                        print_line(line)
+
+
+                    def compact_zone():
+                        print_tracepoint(process_info, COMPACTION_BEGIN, None)
+                        print_tracepoint(process_info, COMPACTION_END, None)
+                        print_line(line)
+
+
+                    f_match = {'__alloc_pages_nodemask' : alloc_pages,
+                               'try_to_free_pages'      : try_to_free_pages,
+                               'shrink_inactive_list'   : shrink_inactive_list,
+                               'try_to_compact_pages'   : try_to_compact,
+                               'compact_zone'           : compact_zone}
+
+                    if function_name in f_match:
+                        f_match[function_name]()
+                    else:
+                        print_line(line)
diff --git a/scripts/tracing/setup_alloc_trace.sh b/scripts/tracing/setup_alloc_trace.sh
new file mode 100755
index 0000000..9a558b0
--- /dev/null
+++ b/scripts/tracing/setup_alloc_trace.sh
@@ -0,0 +1,88 @@
+#! /bin/bash
+
+# This script does all the basic setup necessary for allocation_postprocess.py
+# and then invokes the script. All the setup that is done at the beginning
+# is cleared on exit.
+
+# Usage: # ./setup_alloc_trace.sh -t THRESHOLD_IN_US -s
+# path/to/tracing/directory > path/to/output/file.
+
+while getopts :t:s: name
+do
+	case $name in
+		t)threshold=$OPTARG;;
+		s)trace_dir=$OPTARG;;
+		*) echo "Usage: ./setup_alloc_trace.sh -t THRESHOLD_IN_US -s path/to/tracing/directory"
+		esac
+done
+
+if [[ -z $threshold ]]
+then
+	echo "Must specify threshold."
+	exit 1
+fi
+
+if [[ -z $trace_dir ]]
+then
+	trace_dir="/sys/kernel/debug/tracing"
+fi
+
+pwd=`pwd`
+cd $trace_dir
+echo 0 > tracing_on
+
+echo function_graph > current_tracer
+echo funcgraph-proc > trace_options
+echo funcgraph-abstime > trace_options
+
+# set filter functions
+echo __alloc_pages_nodemask > set_ftrace_filter
+echo try_to_free_pages >> set_ftrace_filter
+echo mem_cgroup_soft_limit_reclaim >> set_ftrace_filter
+echo shrink_node >> set_ftrace_filter
+echo shrink_node_memcg >> set_ftrace_filter
+echo shrink_slab >> set_ftrace_filter
+echo shrink_active_list >> set_ftrace_filter
+echo shrink_inactive_list >> set_ftrace_filter
+echo compact_zone >> set_ftrace_filter
+echo try_to_compact_pages >> set_ftrace_filter
+
+echo $threshold > tracing_thresh
+
+# set tracepoints
+echo 1 > events/vmscan/mm_shrink_slab_start/enable
+echo 1 > events/vmscan/mm_shrink_slab_end/enable
+echo 1 > events/vmscan/mm_vmscan_direct_reclaim_begin/enable
+echo 1 > events/vmscan/mm_vmscan_direct_reclaim_end/enable
+echo 1 > events/vmscan/mm_vmscan_lru_shrink_inactive/enable
+echo 1 > events/compaction/mm_compaction_begin/enable
+echo 1 > events/compaction/mm_compaction_end/enable
+echo 1 > events/compaction/mm_compaction_try_to_compact_pages/enable
+echo 1 > tracing_on
+
+cd $pwd
+
+./allocation_postprocess.py -t $threshold -s $trace_dir
+
+function cleanup {
+	cd $trace_dir
+	echo 0 > tracing_on
+	echo nop > current_tracer
+	echo > set_ftrace_filter
+	echo 0 > tracing_thresh
+
+	echo 0 > events/vmscan/mm_shrink_slab_start/enable
+	echo 0 > events/vmscan/mm_shrink_slab_end/enable
+	echo 0 > events/vmscan/mm_vmscan_direct_reclaim_begin/enable
+	echo 0 > events/vmscan/mm_vmscan_direct_reclaim_end/enable
+	echo 0 > events/vmscan/mm_vmscan_lru_shrink_inactive/enable
+	echo 0 > events/compaction/mm_compaction_begin/enable
+	echo 0 > events/compaction/mm_compaction_end/enable
+	echo 0 > events/compaction/mm_compaction_try_to_compact_pages/enable
+
+	exit $?
+}
+
+trap cleanup SIGINT
+trap cleanup SIGTERM
+trap cleanup EXIT
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [RFC] scripts: Include postprocessing script for memory allocation tracing
@ 2016-09-11 22:24 ` Janani Ravichandran
  0 siblings, 0 replies; 31+ messages in thread
From: Janani Ravichandran @ 2016-09-11 22:24 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: riel, akpm, vdavydov, mhocko, vbabka, mgorman, rostedt

The objective of this patch is to help users observe latencies in memory
allocation.
The function graph tracer is great for seeing how long functions took to
execute. And often, tracepoints, along with the tracer help understand
situations better. However, while it is possible to set a threshold for
function graph and have only the functions that exceed a certain
threshold appear in the output (by echoing the threshold value in
tracing_thresh in the tracing directory), there is no method to filter
out tracepoints that are not associated with functions whose latencies
exceed the threshold.

When the threshold is set high, it is possible that a lot of information
that is of little interest to the user is printed from the tracepoints.
Limiting this information can help reduce disk I/O significantly.

This patch deals with latencies in memory allocation and more
specifically, direct reclaim and compaction.

setup_alloc_trace.sh is a bash script which handles the initial the
setup of
function graph, specifies which functions to include in the output and
enables some tracepoints of interest. Upon exit, it clears all the
values set.

The functions traced currently are __alloc_pages_nodemask,
try_to_free_pages, mem_cgroup_soft_limit_reclaim, shrink_node,
shrink_node_memcg, shrink_slab, shrink_active_list,
shrink_inactive_list, compact_zone and try_to_compact_pages.

The tracepoints enabled are mm_shrink_slab_start,
mm_slab_slab_end, mm_vmscan_direct_reclaim_begin,
mm_vmscan_direct_reclaim_end, mm_vmscan_lru_shrink_inactive,
mm_compaction_begin, mm_compation_end,
mm_compaction_try_to_compact_pages.

More functions can be traced as desired by making changes to
setup_alloc_trace.sh accordingly.

allocation_postprocess.py is a script which reads from trace_pipe. It
does the following to filter out info from tracepoints that may not
be important:

1. Displays mm_vmscan_direct_reclaim_begin and
mm_vmscan_direct_reclaim_end only when try_to_free_pages has
exceeded the threshold.
2. Displays mm_compaction_begin and mm_compaction_end only when
compact_zone has exceeded the threshold.
3. Displays mm_compaction_try_to_compat_pages only when
try_to_compact_pages has exceeded the threshold.
4. Displays mm_shrink_slab_start and mm_shrink_slab_end only when
the time elapsed between them exceeds the threshold.
5. Displays mm_vmscan_lru_shrink_inactive only when shrink_inactive_list
has exceeded the threshold.

When CTRL+C is pressed, the script shows the times taken by the
shrinkers. However, currently it is not possible to differentiate among
the
superblock shrinkers.

Sample output:
^Ci915_gem_shrinker_scan : total time = 8.731000 ms, max latency =
0.278000 ms
ext4_es_scan : total time = 0.970000 ms, max latency = 0.129000 ms
scan_shadow_nodes : total time = 1.150000 ms, max latency = 0.175000 ms
super_cache_scan : total time = 8.455000 ms, max latency = 0.466000 ms
deferred_split_scan : total time = 25.767000 ms, max latency = 25.485000
ms

Usage:

# ./setup_alloc_trace.sh -t 134 -s /sys/kernel/debug/tracing > file.dat

Where -t represents threshold (134 us in this case) and -s represents
the path to the tracing diretory. The default is
/sys/kernel/debug/tracing, which is used when no path is specified.
The threshold on the other hand, must be set.

Ideas/ comments/ suggestions are welcome, escpecially on adherence to
the python coding style followed by the Linux community and the
functions enabled to be traced.

Thanks.

Signed-off-by: Janani Ravichandran <janani.rvchndrn@gmail.com>
---
 scripts/tracing/allocation_postprocess.py | 267 ++++++++++++++++++++++++++++++
 scripts/tracing/setup_alloc_trace.sh      |  88 ++++++++++
 2 files changed, 355 insertions(+)
 create mode 100755 scripts/tracing/allocation_postprocess.py
 create mode 100755 scripts/tracing/setup_alloc_trace.sh

diff --git a/scripts/tracing/allocation_postprocess.py b/scripts/tracing/allocation_postprocess.py
new file mode 100755
index 0000000..2f65457
--- /dev/null
+++ b/scripts/tracing/allocation_postprocess.py
@@ -0,0 +1,267 @@
+#!/usr/bin/env python
+# python 2.7
+
+"""
+This script uses function graph and some of the existing
+tracepoints to help people observe how long page allocations and some
+functions in the direct reclaim and compaction paths take.
+The functions and tracepoints enabled can be seen in setup_alloc_trace.sh.
+It reads from trace_pipe of the tracing directory and prints
+only those tracepoints and functions which are associated
+with latencies greater than the threshold specified.
+When CTRL+C is pressed, the times spent in the various shrinkers are displayed.
+The setup of trace is done in setup_alloc_trace.sh, from where this script is
+invoked.
+"""
+
+import argparse
+import re
+import sys
+import signal
+
+from collections import defaultdict
+
+# Constants for tracepoints
+
+DIRECT_RECLAIM_BEGIN        = 1
+DIRECT_RECLAIM_END          = 2
+SHRINK_SLAB_START           = 3
+SHRINK_INACTIVE_LIST        = 5
+TRY_TO_COMPACT              = 6
+COMPACTION_BEGIN            = 7
+COMPACTION_END              = 8
+
+SECS_TO_US                  = 1000000
+US_TO_MS                    = 1000
+
+# Parse command line arguments
+parser = argparse.ArgumentParser(description='Parser for latency analyzer')
+
+parser.add_argument('-s', '--source', action='store',
+                    default='/sys/kernel/debug/tracing',
+                    dest='source_path',
+                    help='Specify source file to read trace output')
+parser.add_argument('-t', '--threshold', action='store', default=0.0,
+                    dest='threshold', type=float)
+args = parser.parse_args()
+
+source_path = ''.join((args.source_path,'/trace_pipe'))
+threshold = args.threshold
+
+# Regexes
+line_pattern = re.compile(r'\s*(\d+\.\d+)\s+\|\s+\d*\)*\s+([\w-]+)\s+\|\s+.*\s+(\d*\.*\d*)\s+[us]{0,2}\s+\|\s+(.*)')
+tracepoint_pattern = re.compile(r'\/\*\s*([\w]*):\s*(.*)\s*\*\/')
+shrinker_pattern = re.compile(r'\s*([\w]*)\+(.*)\s*')
+function_end_pattern = re.compile(r'.*\/\*\s*([\w]*)\s*\*\/')
+
+# The dictionary which holds tracepoint information for all processes
+all_information = defaultdict(dict)
+
+# The dictionary which holds shrinker latencies
+shrinker_latencies = defaultdict(float)
+shrinker_max_latencies = defaultdict(float)
+
+def print_shrinker_latencies(signum, frame):
+    """ This function prints the time spent in each shrinker, when CTRL+C is 
+    pressed.
+    """
+
+    signal.signal(signal.SIGINT, original_sigint)
+    for key, value in shrinker_latencies.iteritems():
+        print '%s : total time = %f ms, max latency = %f ms' %(key,
+                                        value*US_TO_MS,
+                                        shrinker_max_latencies[key]*US_TO_MS)
+    sys.exit(0)
+
+original_sigint = signal.getsignal(signal.SIGINT)
+signal.signal(signal.SIGINT, print_shrinker_latencies)
+
+
+def set_begin_info(process, EVENT, timestamp, info):
+    """ This function sets information associated with mm_shrink_slab_start.
+    It sets the entire tracepoint info, along with the timestamp, which will
+    be used to calculate latencies when corresponding mm_ shrink_slab_end
+    tracepoints are encountered.
+    """
+
+    per_process_dict = all_information[process]
+    begin_info = {}
+    begin_info["data"] = info
+    begin_info["time"] = timestamp
+    per_process_dict[EVENT] = begin_info
+
+
+def set_trace_info(process, EVENT, info):
+    """ This function sets trace information associated with specific events.
+    """
+
+    per_process_dict = all_information[process]
+    per_process_dict[EVENT] = info
+
+
+def store_max_latency(shrinker_name, latency):
+    """ This function stores the maximum latency encountered in a shrinker. """
+
+    max_latency = shrinker_max_latencies[shrinker_name]
+    if latency > max_latency:
+        shrinker_max_latencies[shrinker_name] = latency
+
+
+def find_latency(process, BEGIN_EVENT, timestamp):
+    """ This function calculates shrinker latencies."""
+
+    per_process_dict = all_information.get(process, None)
+    if per_process_dict:
+        begin_info = per_process_dict.get(BEGIN_EVENT, None)
+        if begin_info:
+            begin_data = begin_info.get("data", None)
+            begin_time = begin_info.get("time", None)
+            if begin_time:
+                time_elapsed = float(timestamp) - float(begin_time)
+                if time_elapsed*SECS_TO_US > threshold:
+                    return (True, begin_data, time_elapsed)
+                return (False, begin_data, time_elapsed)
+    return (False, None, 0.0)
+
+
+def print_line(line_info):
+    print line_info,
+
+
+def print_tracepoint(process, EVENT, info):
+    if info:
+        print info,
+    else:
+        per_process_dict = all_information.get(process, None)
+        TP_info = per_process_dict.get(EVENT, None)
+        if TP_info:
+            print TP_info,
+        per_process_dict.pop(EVENT, None)
+
+with open(source_path) as f:
+    for line in f:
+        line_match = re.match(line_pattern, line)
+        if line_match:
+            timestamp = line_match.group(1)
+            process_info = line_match.group(2)
+            function_match = re.match(function_end_pattern, line_match.group(4))
+            tracepoint_match = re.match(tracepoint_pattern, line_match.group(4))
+            if tracepoint_match:
+                TP_whole = line_match.group(4)
+                TP_name = tracepoint_match.group(1)
+                TP_info = tracepoint_match.group(2)
+
+
+                def call_set_trace_info(EVENT):
+                    set_trace_info(process_info, EVENT, line)
+
+
+                def direct_reclaim_b():
+                    call_set_trace_info(DIRECT_RECLAIM_BEGIN)
+
+
+                def direct_reclaim_e():
+                    call_set_trace_info(DIRECT_RECLAIM_END)
+
+
+                def shrink_inactive_list():
+                    call_set_trace_info(SHRINK_INACTIVE_LIST)
+
+
+                def shrink_slab_b():
+                    set_begin_info(process_info, SHRINK_SLAB_START, timestamp,
+                                    line)
+
+
+                def shrink_slab_e():
+                    delay_status, begin_data, time_elapsed = find_latency(
+                                                                process_info,
+                                                                SHRINK_SLAB_START,
+                                                                timestamp)
+                    shrinker_match = re.match(shrinker_pattern, TP_info)
+                    if shrinker_match:
+                        shrinker_name = shrinker_match.group(1)
+                        shrinker_latencies[shrinker_name] += time_elapsed
+                        store_max_latency(shrinker_name, time_elapsed)
+
+                    if delay_status:
+                        print_tracepoint(process_info,
+                                         SHRINK_SLAB_START,
+                                         begin_data)
+                        print_tracepoint(process_info,
+                                         None,
+                                         line)
+
+
+                def try_to_compact():
+                    call_set_trace_info(TRY_TO_COMPACT)
+
+
+                def compact_b():
+                    call_set_trace_info(COMPACTION_BEGIN)
+
+
+                def compact_e():
+                    call_set_trace_info(COMPACTION_END)
+
+
+                trace_match = {'mm_vmscan_direct_reclaim_begin' : direct_reclaim_b,
+                               'mm_vmscan_direct_reclaim_end'   : direct_reclaim_e,
+                               'mm_shrink_slab_start'           : shrink_slab_b,
+                               'mm_shrink_slab_end'             : shrink_slab_e,
+                               'mm_vmscan_lru_shrink_inactive'  :
+                                                              shrink_inactive_list,
+                               'mm_compaction_try_to_compact_pages':
+                                                              try_to_compact,
+                               'mm_compaction_begin'            : compact_b,
+                               'mm_compaction_end'              : compact_e}
+
+                if TP_name in trace_match:
+                    trace_match[TP_name]()
+                else:
+                    pass
+
+            else:
+                function_match = re.match(function_end_pattern,
+                                          line_match.group(4))
+                if function_match:
+                    function_name = function_match.group(1)
+
+
+                    def alloc_pages():
+                        print_line(line)
+                        all_information.pop(process_info, None)
+
+
+                    def try_to_free_pages():
+                        print_tracepoint(process_info, DIRECT_RECLAIM_BEGIN, None)
+                        print_tracepoint(process_info, DIRECT_RECLAIM_END, None)
+                        print_line(line)
+
+
+                    def shrink_inactive_list():
+                        print_tracepoint(process_info, SHRINK_INACTIVE_LIST, None)
+                        print_line(line)
+
+
+                    def try_to_compact():
+                        print_tracepoint(process_info, TRY_TO_COMPACT, None)
+                        print_line(line)
+
+
+                    def compact_zone():
+                        print_tracepoint(process_info, COMPACTION_BEGIN, None)
+                        print_tracepoint(process_info, COMPACTION_END, None)
+                        print_line(line)
+
+
+                    f_match = {'__alloc_pages_nodemask' : alloc_pages,
+                               'try_to_free_pages'      : try_to_free_pages,
+                               'shrink_inactive_list'   : shrink_inactive_list,
+                               'try_to_compact'         : try_to_compact,
+                               'compact_zone'           : compact_zone}
+
+                    if function_name in f_match:
+                        f_match[function_name]()
+                    else:
+                        print_line(line)
diff --git a/scripts/tracing/setup_alloc_trace.sh b/scripts/tracing/setup_alloc_trace.sh
new file mode 100755
index 0000000..9a558b0
--- /dev/null
+++ b/scripts/tracing/setup_alloc_trace.sh
@@ -0,0 +1,88 @@
+#! /bin/bash
+
+# This script does all the basic setup necessary for allocation_postprocess.py
+# and then invokes the script. All the setup that is done at the beginning
+# is cleared on exit.
+
+# Usage: # ./setup_alloc_trace.sh -t THRESHOLD_IN_US -s
+# path/to/tracing/directory > path/to/output/file.
+
+while getopts :t:s: name
+do
+	case $name in
+		t) threshold=$OPTARG;;
+		s) trace_dir=$OPTARG;;
+		*) echo "Usage: ./setup_alloc_trace.sh -t THRESHOLD_IN_US -s path/to/tracing/directory"; exit 1;;
+	esac
+done
+
+if [[ -z $threshold ]]
+then
+	echo "Must specify threshold."
+	exit 1
+fi
+
+if [[ -z $trace_dir ]]
+then
+	trace_dir="/sys/kernel/debug/tracing"
+fi
+
+pwd=$(pwd)
+cd "$trace_dir"
+echo 0 > tracing_on
+
+echo function_graph > current_tracer
+echo funcgraph-proc > trace_options
+echo funcgraph-abstime > trace_options
+
+# set filter functions
+echo __alloc_pages_nodemask > set_ftrace_filter
+echo try_to_free_pages >> set_ftrace_filter
+echo mem_cgroup_soft_limit_reclaim >> set_ftrace_filter
+echo shrink_node >> set_ftrace_filter
+echo shrink_node_memcg >> set_ftrace_filter
+echo shrink_slab >> set_ftrace_filter
+echo shrink_active_list >> set_ftrace_filter
+echo shrink_inactive_list >> set_ftrace_filter
+echo compact_zone >> set_ftrace_filter
+echo try_to_compact_pages >> set_ftrace_filter
+
+echo $threshold > tracing_thresh
+
+# set tracepoints
+echo 1 > events/vmscan/mm_shrink_slab_start/enable
+echo 1 > events/vmscan/mm_shrink_slab_end/enable
+echo 1 > events/vmscan/mm_vmscan_direct_reclaim_begin/enable
+echo 1 > events/vmscan/mm_vmscan_direct_reclaim_end/enable
+echo 1 > events/vmscan/mm_vmscan_lru_shrink_inactive/enable
+echo 1 > events/compaction/mm_compaction_begin/enable
+echo 1 > events/compaction/mm_compaction_end/enable
+echo 1 > events/compaction/mm_compaction_try_to_compact_pages/enable
+echo 1 > tracing_on
+
+cd "$pwd"
+
+# Install the traps before starting the long-running postprocessing
+# script so that an interrupt during tracing still restores the setup.
+function cleanup {
+	cd "$trace_dir"
+	echo 0 > tracing_on
+	echo nop > current_tracer
+	echo > set_ftrace_filter
+	echo 0 > tracing_thresh
+
+	echo 0 > events/vmscan/mm_shrink_slab_start/enable
+	echo 0 > events/vmscan/mm_shrink_slab_end/enable
+	echo 0 > events/vmscan/mm_vmscan_direct_reclaim_begin/enable
+	echo 0 > events/vmscan/mm_vmscan_direct_reclaim_end/enable
+	echo 0 > events/vmscan/mm_vmscan_lru_shrink_inactive/enable
+	echo 0 > events/compaction/mm_compaction_begin/enable
+	echo 0 > events/compaction/mm_compaction_end/enable
+	echo 0 > events/compaction/mm_compaction_try_to_compact_pages/enable
+
+	exit
+}
+
+trap cleanup SIGINT SIGTERM EXIT
+
+./allocation_postprocess.py -t "$threshold" -s "$trace_dir"
-- 
2.7.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [RFC] scripts: Include postprocessing script for memory allocation tracing
  2016-09-11 22:24 ` Janani Ravichandran
@ 2016-09-12 12:16   ` Michal Hocko
  -1 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2016-09-12 12:16 UTC (permalink / raw)
  To: Janani Ravichandran
  Cc: linux-mm, linux-kernel, riel, akpm, vdavydov, vbabka, mgorman, rostedt

Hi,
I am sorry I didn't follow up on the previous submission. I find this
_really_ helpful. It is great that you could build on top of existing
tracepoints but one thing is not entirely clear to me. Without a begin
marker in __alloc_pages_nodemask we cannot really tell how long the
whole allocation took, which would be extremely useful. Or do you use
any graph tracer tricks to deduce that? There seems to be a note about
this in your changelog but I cannot find the details there. And FWIW I
would be open to adding a tracepoint like that. It would make our life
so much easier...

On Sun 11-09-16 18:24:12, Janani Ravichandran wrote:
[...]
> allocation_postprocess.py is a script which reads from trace_pipe. It
> does the following to filter out info from tracepoints that may not
> be important:
> 
> 1. Displays mm_vmscan_direct_reclaim_begin and
> mm_vmscan_direct_reclaim_end only when try_to_free_pages has
> exceeded the threshold.
> 2. Displays mm_compaction_begin and mm_compaction_end only when
> compact_zone has exceeded the threshold.
> 3. Displays mm_compaction_try_to_compact_pages only when
> try_to_compact_pages has exceeded the threshold.
> 4. Displays mm_shrink_slab_start and mm_shrink_slab_end only when
> the time elapsed between them exceeds the threshold.
> 5. Displays mm_vmscan_lru_shrink_inactive only when shrink_inactive_list
> has exceeded the threshold.
> 
> When CTRL+C is pressed, the script shows the times taken by the
> shrinkers. However, currently it is not possible to differentiate
> among the superblock shrinkers.
> 
> Sample output:
> ^Ci915_gem_shrinker_scan : total time = 8.731000 ms, max latency = 0.278000 ms
> ext4_es_scan : total time = 0.970000 ms, max latency = 0.129000 ms
> scan_shadow_nodes : total time = 1.150000 ms, max latency = 0.175000 ms
> super_cache_scan : total time = 8.455000 ms, max latency = 0.466000 ms
> deferred_split_scan : total time = 25.767000 ms, max latency = 25.485000 ms

Would it be possible to group those per context? I mean per-allocation
or per-process drill-down values rather than mixing all those values
together. For example, if I see that a process is stalling due to
direct reclaim, I would love to see the worst case allocation stall
and the most probable source of that stall. Mixing in kswapd traces
would be misleading here.
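Such a per-context grouping could be sketched in the postprocessing script by keying the shrinker statistics on (pid, shrinker) instead of the shrinker name alone. The helper below is a hypothetical illustration, not part of the posted patch:

```python
from collections import defaultdict

# Hypothetical sketch: accumulate shrinker latencies per (pid, shrinker)
# so that one process's direct-reclaim stalls are not mixed with kswapd's.
stats = defaultdict(lambda: {'total_ms': 0.0, 'max_ms': 0.0})

def record(pid, shrinker, latency_ms):
    """Add one observed shrinker invocation to the per-process totals."""
    entry = stats[(pid, shrinker)]
    entry['total_ms'] += latency_ms
    entry['max_ms'] = max(entry['max_ms'], latency_ms)

# Two invocations attributed to the same (hypothetical) PID:
record(1042, 'super_cache_scan', 0.466)
record(1042, 'super_cache_scan', 0.120)
```

On Ctrl+C the summary loop would then iterate over `stats` and print one line per (pid, shrinker) pair instead of one line per shrinker.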

Thanks!
-- 
Michal Hocko
SUSE Labs


* Re: [RFC] scripts: Include postprocessing script for memory allocation tracing
  2016-09-12 12:16   ` Michal Hocko
@ 2016-09-13 18:04     ` Janani Ravichandran
  -1 siblings, 0 replies; 31+ messages in thread
From: Janani Ravichandran @ 2016-09-13 18:04 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Janani Ravichandran, linux-mm, linux-kernel, riel, akpm,
	vdavydov, vbabka, mgorman, rostedt


> On Sep 12, 2016, at 8:16 AM, Michal Hocko <mhocko@kernel.org> wrote:
> 
> Hi,

Hello Michal,

> I am sorry I didn't follow up on the previous submission.

That’s alright :)

> I find this
> _really_ helpful. It is great that you could build on top of existing
> tracepoints but one thing is not entirely clear to me. Without a begin
> marker in __alloc_pages_nodemask we cannot really tell how long the
> whole allocation took, which would be extremely useful. Or do you use
> any graph tracer tricks to deduce that?

I’m using the function graph tracer to see how long __alloc_pages_nodemask()
took.


> There is a note in your
> changelog but I cannot seem to find that in the changelog. And FWIW I
> would be open to adding a tracepoint like that. It would make our life
> so much easier…

The line
echo __alloc_pages_nodemask > set_ftrace_filter in setup_alloc_trace.sh
sets __alloc_pages_nodemask as a function graph filter and this should help
us observe how long the function took.
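As an illustration of how the duration is recovered on the postprocessing side, the closing line that function_graph prints for a filtered function can be parsed roughly as follows. The sample line and exact column layout are illustrative only, since they depend on which trace options are enabled:

```python
import re

# Illustrative pattern for a function_graph closing line such as
#   " 1)  + 273.683 us   |  } /* __alloc_pages_nodemask */"
# The duration and function name are what the postprocessing cares about.
end_pattern = re.compile(r'([\d.]+)\s*us\s*\|\s*}\s*/\*\s*(\S+)\s*\*/')

def parse_end_line(line):
    """Return (function_name, duration_us), or None if the line does
    not look like a function_graph exit line."""
    m = end_pattern.search(line)
    if m is None:
        return None
    return m.group(2), float(m.group(1))

sample = " 1)  + 273.683 us   |  } /* __alloc_pages_nodemask */"
```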

> 
> On Sun 11-09-16 18:24:12, Janani Ravichandran wrote:
> [...]
>> allocation_postprocess.py is a script which reads from trace_pipe. It
>> does the following to filter out info from tracepoints that may not
>> be important:
>> 
>> 1. Displays mm_vmscan_direct_reclaim_begin and
>> mm_vmscan_direct_reclaim_end only when try_to_free_pages has
>> exceeded the threshold.
>> 2. Displays mm_compaction_begin and mm_compaction_end only when
>> compact_zone has exceeded the threshold.
>> 3. Displays mm_compaction_try_to_compat_pages only when
>> try_to_compact_pages has exceeded the threshold.
>> 4. Displays mm_shrink_slab_start and mm_shrink_slab_end only when
>> the time elapsed between them exceeds the threshold.
>> 5. Displays mm_vmscan_lru_shrink_inactive only when shrink_inactive_list
>> has exceeded the threshold.
>> 
>> When CTRL+C is pressed, the script shows the times taken by the
>> shrinkers. However, currently it is not possible to differentiate among
>> the
>> superblock shrinkers.
>> 
>> Sample output:
>> ^Ci915_gem_shrinker_scan : total time = 8.731000 ms, max latency =
>> 0.278000 ms
>> ext4_es_scan : total time = 0.970000 ms, max latency = 0.129000 ms
>> scan_shadow_nodes : total time = 1.150000 ms, max latency = 0.175000 ms
>> super_cache_scan : total time = 8.455000 ms, max latency = 0.466000 ms
>> deferred_split_scan : total time = 25.767000 ms, max latency = 25.485000
>> ms
> 
> Would it be possible to group those per the context?

Absolutely!
> I mean a single
> allocation/per-process drop down values rather than mixing all those
> values together? For example if I see that a process is talling due to
> direct reclaim I would love to see what is the worst case allocation
> stall and what is the most probable source of that stall. Mixing kswapd
> traces would be misleading here.
> 

True. I’ll do that and send a v2. Thanks for the suggestions!

Janani
> 


* Re: [RFC] scripts: Include postprocessing script for memory allocation tracing
  2016-09-13 18:04     ` Janani Ravichandran
@ 2016-09-19  9:42       ` Michal Hocko
  -1 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2016-09-19  9:42 UTC (permalink / raw)
  To: Janani Ravichandran
  Cc: linux-mm, linux-kernel, riel, akpm, vdavydov, vbabka, mgorman, rostedt

On Tue 13-09-16 14:04:49, Janani Ravichandran wrote:
> 
> > On Sep 12, 2016, at 8:16 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > 
> > Hi,
> 
> Hello Michal,
> 
> > I am sorry I didn't follow up on the previous submission.
> 
> That’s alright :)
> 
> > I find this
> > _really_ helpful. It is great that you could build on top of existing
> > tracepoints but one thing is not entirely clear to me. Without a begin
> > marker in __alloc_pages_nodemask we cannot really tell how long the
> > whole allocation took, which would be extremely useful. Or do you use
> > any graph tracer tricks to deduce that?
> 
> I’m using the function graph tracer to see how long __alloc_pages_nodemask()
> took.

How can you map the function graph tracer output to a specific context? Let's say
I would like to know why a particular allocation took so long. Would
that be possible?

-- 
Michal Hocko
SUSE Labs


* Re: [RFC] scripts: Include postprocessing script for memory allocation tracing
  2016-09-19  9:42       ` Michal Hocko
@ 2016-09-22 15:30         ` Janani Ravichandran
  -1 siblings, 0 replies; 31+ messages in thread
From: Janani Ravichandran @ 2016-09-22 15:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Janani Ravichandran, linux-mm, linux-kernel, riel, akpm,
	vdavydov, vbabka, mgorman, rostedt


> On Sep 19, 2016, at 5:42 AM, Michal Hocko <mhocko@kernel.org> wrote:
> 
> On Tue 13-09-16 14:04:49, Janani Ravichandran wrote:
>> 
>>> On Sep 12, 2016, at 8:16 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> 
>> I’m using the function graph tracer to see how long __alloc_pages_nodemask()
>> took.
> 
> How can you map the function graph tracer to a specif context? Let's say
> I would like to know why a particular allocation took so long. Would
> that be possible?

Maybe not. If the latencies are due to direct reclaim or memory compaction, you
get some information from the tracepoints (like mm_vmscan_direct_reclaim_begin,
mm_compaction_begin, etc.). Otherwise, you don’t get any context information;
function graph only gives the time spent in __alloc_pages_nodemask() in that case.


Regards,
Janani.
> 
> -- 
> Michal Hocko
> SUSE Labs


* Re: [RFC] scripts: Include postprocessing script for memory allocation tracing
  2016-09-22 15:30         ` Janani Ravichandran
@ 2016-09-23  8:07           ` Michal Hocko
  -1 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2016-09-23  8:07 UTC (permalink / raw)
  To: Janani Ravichandran
  Cc: linux-mm, linux-kernel, riel, akpm, vdavydov, vbabka, mgorman, rostedt

On Thu 22-09-16 11:30:36, Janani Ravichandran wrote:
> 
> > On Sep 19, 2016, at 5:42 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > 
> > On Tue 13-09-16 14:04:49, Janani Ravichandran wrote:
> >> 
> >>> On Sep 12, 2016, at 8:16 AM, Michal Hocko <mhocko@kernel.org> wrote:
> >> 
> >> I’m using the function graph tracer to see how long __alloc_pages_nodemask()
> >> took.
> > 
> > How can you map the function graph tracer to a specif context? Let's say
> > I would like to know why a particular allocation took so long. Would
> > that be possible?
> 
> Maybe not. If the latencies are due to direct reclaim or memory compaction, you
> get some information from the tracepoints (like mm_vmscan_direct_reclaim_begin,
> mm_compaction_begin, etc). But otherwise, you don’t get any context information. 
> Function graph only gives the time spent in alloc_pages_nodemask() in that case.

Then I really think that we need a starting trace point. I think that
having the full context information is really helpful in order to
understand latencies induced by allocations.
-- 
Michal Hocko
SUSE Labs


* Re: [RFC] scripts: Include postprocessing script for memory allocation tracing
  2016-09-23  8:07           ` Michal Hocko
@ 2016-10-06 12:00             ` Michal Hocko
  -1 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2016-10-06 12:00 UTC (permalink / raw)
  To: Janani Ravichandran
  Cc: linux-mm, linux-kernel, riel, akpm, vdavydov, vbabka, mgorman, rostedt

On Fri 23-09-16 10:07:09, Michal Hocko wrote:
> On Thu 22-09-16 11:30:36, Janani Ravichandran wrote:
> > 
> > > On Sep 19, 2016, at 5:42 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > > 
> > > On Tue 13-09-16 14:04:49, Janani Ravichandran wrote:
> > >> 
> > >>> On Sep 12, 2016, at 8:16 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > >> 
> > >> I’m using the function graph tracer to see how long __alloc_pages_nodemask()
> > >> took.
> > > 
> > > How can you map the function graph tracer to a specif context? Let's say
> > > I would like to know why a particular allocation took so long. Would
> > > that be possible?
> > 
> > Maybe not. If the latencies are due to direct reclaim or memory compaction, you
> > get some information from the tracepoints (like mm_vmscan_direct_reclaim_begin,
> > mm_compaction_begin, etc). But otherwise, you don’t get any context information. 
> > Function graph only gives the time spent in alloc_pages_nodemask() in that case.
> 
> Then I really think that we need a starting trace point. I think that
> having the full context information is really helpful in order to
> understand latencies induced by allocations.

Are you planning to pursue this path?
-- 
Michal Hocko
SUSE Labs


* Re: [RFC] scripts: Include postprocessing script for memory allocation tracing
  2016-09-23  8:07           ` Michal Hocko
@ 2016-10-11 14:43             ` Janani Ravichandran
  -1 siblings, 0 replies; 31+ messages in thread
From: Janani Ravichandran @ 2016-10-11 14:43 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Janani Ravichandran, linux-mm, linux-kernel, riel, akpm,
	vdavydov, vbabka, mgorman, rostedt

Hi Michal,

> 
Extremely sorry for the delayed response.

> Then I really think that we need a starting trace point. I think that
> having the full context information is really helpful in order to
> understand latencies induced by allocations.
> -- 

Alright. I’ll add a starting tracepoint, change the script accordingly and 
send a v2. Thanks!

Regards,
Janani.
> 


* Re: [RFC] scripts: Include postprocessing script for memory allocation tracing
  2016-10-11 14:43             ` Janani Ravichandran
@ 2016-10-15 23:31               ` Janani Ravichandran
  -1 siblings, 0 replies; 31+ messages in thread
From: Janani Ravichandran @ 2016-10-15 23:31 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Janani Ravichandran, linux-mm, linux-kernel


> On Oct 11, 2016, at 10:43 AM, Janani Ravichandran <janani.rvchndrn@gmail.com> wrote:
> 
> Alright. I’ll add a starting tracepoint, change the script accordingly and 
> send a v2. Thanks!
> 
I looked at it again and I think that the context information we need 
can be obtained from the tracepoint trace_mm_page_alloc in
__alloc_pages_nodemask().

I’ll include that tracepoint in the script and send it along with the other
changes you suggested, if you’re fine with it.

Thanks!
Janani.
>> 
> 


* Re: [RFC] scripts: Include postprocessing script for memory allocation tracing
  2016-10-15 23:31               ` Janani Ravichandran
@ 2016-10-16  7:33                 ` Michal Hocko
  -1 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2016-10-16  7:33 UTC (permalink / raw)
  To: Janani Ravichandran; +Cc: linux-mm, linux-kernel

On Sat 15-10-16 19:31:22, Janani Ravichandran wrote:
> 
> > On Oct 11, 2016, at 10:43 AM, Janani Ravichandran <janani.rvchndrn@gmail.com> wrote:
> > 
> > Alright. I’ll add a starting tracepoint, change the script accordingly and 
> > send a v2. Thanks!
> > 
> I looked at it again and I think that the context information we need 
> can be obtained from the tracepoint trace_mm_page_alloc in 
> alloc_pages_nodemask().

trace_mm_page_alloc will tell you details about the allocation, like
the gfp mask and order, but in its current form it doesn't tell you how
long the allocation took. So either you have to note jiffies at the
allocation start and then report end-start in the tracepoint, or we
really need another tracepoint to note the start. The latter has the
advantage that we do not add unnecessary jiffies overhead when the
tracepoint is disabled.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC] scripts: Include postprocessing script for memory allocation tracing
       [not found]                 ` <CANnt6X=RpSnuxGXZfF6Qa5mJpzC8gL3wkKJi3tQMZJBZJVWF3w@mail.gmail.com>
@ 2016-10-17 17:31                   ` Janani Ravichandran
  2016-10-18 13:13                       ` Michal Hocko
  0 siblings, 1 reply; 31+ messages in thread
From: Janani Ravichandran @ 2016-10-17 17:31 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Janani Ravichandran, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 928 bytes --]


> On Oct 17, 2016, at 1:24 PM, Janani Ravichandran <janani.rvchndrn@gmail.com> wrote:
> 
> 
> On Sun, Oct 16, 2016 at 3:33 AM, Michal Hocko <mhocko@kernel.org> wrote:
> 
> trace_mm_page_alloc will tell you details about the allocation, like
> gfp mask, order but it doesn't tell you how long the allocation took at
> its current form. So either you have to note jiffies at the allocation
> start and then add the end-start in the trace point or we really need
> another trace point to note the start. The later has an advantage that
> we do not add unnecessary load for jiffies when the tracepoint is
> disabled.

The function graph tracer can tell us how long __alloc_pages_nodemask() took.
Can’t that, combined with the context information given by trace_mm_page_alloc,
give us what we want? Correct me if I am wrong.

Regards,
Janani.

> --
> Michal Hocko
> SUSE Labs
> 


[-- Attachment #2: Type: text/html, Size: 2169 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC] scripts: Include postprocessing script for memory allocation tracing
  2016-10-17 17:31                   ` Janani Ravichandran
@ 2016-10-18 13:13                       ` Michal Hocko
  0 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2016-10-18 13:13 UTC (permalink / raw)
  To: Janani Ravichandran; +Cc: linux-mm, linux-kernel

On Mon 17-10-16 13:31:57, Janani Ravichandran wrote:
> 
> > On Oct 17, 2016, at 1:24 PM, Janani Ravichandran <janani.rvchndrn@gmail.com> wrote:
> > 
> > 
> > On Sun, Oct 16, 2016 at 3:33 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > 
> > trace_mm_page_alloc will tell you details about the allocation, like
> > gfp mask, order but it doesn't tell you how long the allocation took at
> > its current form. So either you have to note jiffies at the allocation
> > start and then add the end-start in the trace point or we really need
> > another trace point to note the start. The later has an advantage that
> > we do not add unnecessary load for jiffies when the tracepoint is
> > disabled.
> 
> The function graph tracer can tell us how long alloc_pages_nodemask() took.
> Can’t that, combined with the context information given by trace_mm_page_alloc
> give us what we want? Correct me if I am wrong.

Yes, the function_graph tracer will give you _some_ information, but it
will not have the context you are looking for, right? See the following
example:

 ------------------------------------------
 0) x-www-b-22756  =>  x-termi-4083 
 ------------------------------------------

 0)               |  __alloc_pages_nodemask() {
 0)               |  /* mm_page_alloc: page=ffffea000411b380 pfn=1066702 order=0 migratetype=0 gfp_flags=GFP_KERNEL */
 0)   3.328 us    |  }
 3)               |  __alloc_pages_nodemask() {
 3)               |  /* mm_page_alloc: page=ffffea0008f1f6c0 pfn=2344923 order=0 migratetype=0 gfp_flags=GFP_KERNEL */
 3)   1.011 us    |  }
 0)               |  __alloc_pages_nodemask() {
 0)               |  /* mm_page_alloc: page=ffffea000411b380 pfn=1066702 order=0 migratetype=0 gfp_flags=GFP_KERNEL */
 0)   0.587 us    |  }
 3)               |  __alloc_pages_nodemask() {
 3)               |  /* mm_page_alloc: page=ffffea0008f1f6c0 pfn=2344923 order=0 migratetype=0 gfp_flags=GFP_KERNEL */
 3)   1.125 us    |  }

How do I know which process has performed those allocations? I know that
CPU0 should be running x-termi-4083 but what is running on other CPUs?

Let me explain the usecase I am very interested in. Say that a userspace
application is not performing well. I would like to see some statistics
about memory allocations performed for that app - are there a few outliers,
or do the allocation stalls increase gradually? Where do we spend time during
that allocation? Reclaiming LRU pages? Compaction or the slab shrinkers?

To answer those questions I need to track particular events (allocation,
reclaim, compaction) per process and know how long each step
took. Maybe we can reconstruct something from the above output but it is
a major PITA.  If we had start/stop pairs for each step (which
we already do have for reclaim and compaction AFAIR) then this is easy
scripting. Another option would be to have only a single tracepoint for
each step with timing information.

See my point?
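
To make the scripting idea concrete, pairing per-process begin/end events
could be sketched like this (a sketch only; the line format here is a
simplified stand-in for real trace_pipe output, not the actual ftrace layout):

```python
import re
from collections import defaultdict

# Assumed simplified line format: "<comm>-<pid> <timestamp>: <event>".
EVENT_RE = re.compile(
    r'^\s*(?P<comm>\S+)-(?P<pid>\d+)\s+(?P<ts>[\d.]+):\s+(?P<event>\w+)')

def reclaim_latencies(lines):
    """Return {pid: [latency_in_secs, ...]} for matched begin/end pairs."""
    begin_ts = {}
    latencies = defaultdict(list)
    for line in lines:
        m = EVENT_RE.match(line)
        if not m:
            continue
        pid, ts, event = m.group('pid'), float(m.group('ts')), m.group('event')
        if event == 'mm_vmscan_direct_reclaim_begin':
            begin_ts[pid] = ts
        elif event == 'mm_vmscan_direct_reclaim_end' and pid in begin_ts:
            # An unmatched end (begin lost before tracing started) is dropped.
            latencies[pid].append(ts - begin_ts.pop(pid))
    return latencies
```

With start/stop pairs in place, this is all the postprocessing a per-process
latency histogram needs.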
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC] scripts: Include postprocessing script for memory allocation tracing
  2016-10-18 13:13                       ` Michal Hocko
@ 2016-10-20 23:10                         ` Janani Ravichandran
  -1 siblings, 0 replies; 31+ messages in thread
From: Janani Ravichandran @ 2016-10-20 23:10 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Janani Ravichandran, linux-mm, linux-kernel

Michal,

> On Oct 18, 2016, at 8:13 AM, Michal Hocko <mhocko@kernel.org> wrote:
> 
>> 
> 
> yes, function_graph tracer will give you _some_ information but it will
> not have the context you are looking for, right? See the following
> example
> 
> ------------------------------------------
> 0) x-www-b-22756  =>  x-termi-4083 
> ------------------------------------------
> 
> 0)               |  __alloc_pages_nodemask() {
> 0)               |  /* mm_page_alloc: page=ffffea000411b380 pfn=1066702 order=0 migratetype=0 gfp_flags=GFP_KERNEL */
> 0)   3.328 us    |  }
> 3)               |  __alloc_pages_nodemask() {
> 3)               |  /* mm_page_alloc: page=ffffea0008f1f6c0 pfn=2344923 order=0 migratetype=0 gfp_flags=GFP_KERNEL */
> 3)   1.011 us    |  }
> 0)               |  __alloc_pages_nodemask() {
> 0)               |  /* mm_page_alloc: page=ffffea000411b380 pfn=1066702 order=0 migratetype=0 gfp_flags=GFP_KERNEL */
> 0)   0.587 us    |  }
> 3)               |  __alloc_pages_nodemask() {
> 3)               |  /* mm_page_alloc: page=ffffea0008f1f6c0 pfn=2344923 order=0 migratetype=0 gfp_flags=GFP_KERNEL */
> 3)   1.125 us    |  }
> 
> How do I know which process has performed those allocations? I know that
> CPU0 should be running x-termi-4083 but what is running on other CPUs?
> 
> Let me explain my usecase I am very interested in. Say I that a usespace
> application is not performing well. I would like to see some statistics
> about memory allocations performed for that app - are there few outliers
> or the allocation stalls increase gradually? Where do we spend time during
> that allocation? Reclaim LRU pages? Compaction or the slab shrinkers?
> 
> To answer those questions I need to track particular events (alocation,
> reclaim, compaction) to the process and know how long each step
> took. Maybe we can reconstruct something from the above output but it is
> a major PITA.  If we either hard start/stop pairs for each step (which
> we already do have for reclaim, compaction AFAIR) then this is an easy
> scripting. Another option would be to have only a single tracepoint for
> each step with a timing information.
> 
> See my point?

Yes, if we want to know which processes are running on which CPUs,
echoing funcgraph-proc into trace_options in the tracing directory should
give us what we want.

The bash script which is part of this patch does this kind of setup for you.
As a result, the output you get is something like what you see here:

https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/no_tp_no_threshold.txt

Does this answer your question? Let me know if it doesn't.

Janani.

> -- 
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC] scripts: Include postprocessing script for memory allocation tracing
  2016-10-20 23:10                         ` Janani Ravichandran
@ 2016-10-21  7:08                           ` Michal Hocko
  -1 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2016-10-21  7:08 UTC (permalink / raw)
  To: Janani Ravichandran; +Cc: linux-mm, linux-kernel

On Thu 20-10-16 18:10:37, Janani Ravichandran wrote:
> Michal,
> 
> > On Oct 18, 2016, at 8:13 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > 
> >> 
> > 
> > yes, function_graph tracer will give you _some_ information but it will
> > not have the context you are looking for, right? See the following
> > example
> > 
> > ------------------------------------------
> > 0) x-www-b-22756  =>  x-termi-4083 
> > ------------------------------------------
> > 
> > 0)               |  __alloc_pages_nodemask() {
> > 0)               |  /* mm_page_alloc: page=ffffea000411b380 pfn=1066702 order=0 migratetype=0 gfp_flags=GFP_KERNEL */
> > 0)   3.328 us    |  }
> > 3)               |  __alloc_pages_nodemask() {
> > 3)               |  /* mm_page_alloc: page=ffffea0008f1f6c0 pfn=2344923 order=0 migratetype=0 gfp_flags=GFP_KERNEL */
> > 3)   1.011 us    |  }
> > 0)               |  __alloc_pages_nodemask() {
> > 0)               |  /* mm_page_alloc: page=ffffea000411b380 pfn=1066702 order=0 migratetype=0 gfp_flags=GFP_KERNEL */
> > 0)   0.587 us    |  }
> > 3)               |  __alloc_pages_nodemask() {
> > 3)               |  /* mm_page_alloc: page=ffffea0008f1f6c0 pfn=2344923 order=0 migratetype=0 gfp_flags=GFP_KERNEL */
> > 3)   1.125 us    |  }
> > 
> > How do I know which process has performed those allocations? I know that
> > CPU0 should be running x-termi-4083 but what is running on other CPUs?
> > 
> > Let me explain my usecase I am very interested in. Say I that a usespace
> > application is not performing well. I would like to see some statistics
> > about memory allocations performed for that app - are there few outliers
> > or the allocation stalls increase gradually? Where do we spend time during
> > that allocation? Reclaim LRU pages? Compaction or the slab shrinkers?
> > 
> > To answer those questions I need to track particular events (alocation,
> > reclaim, compaction) to the process and know how long each step
> > took. Maybe we can reconstruct something from the above output but it is
> > a major PITA.  If we either hard start/stop pairs for each step (which
> > we already do have for reclaim, compaction AFAIR) then this is an easy
> > scripting. Another option would be to have only a single tracepoint for
> > each step with a timing information.
> > 
> > See my point?
> 
> Yes, if we want to know what processes are running on what CPUs,
> echo funcgraph-proc > trace_options in the tracing directory should give us
> what we want.

Interesting.
$ cat /debug/tracing/available_tracers 
function_graph preemptirqsoff preemptoff irqsoff function nop

Do I have to configure anything special? And if I do, why isn't it any
better to simply add a start tracepoint and make this available to
older kernels as well?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC] scripts: Include postprocessing script for memory allocation tracing
  2016-10-21  7:08                           ` Michal Hocko
@ 2016-10-27 15:42                             ` Janani Ravichandran
  -1 siblings, 0 replies; 31+ messages in thread
From: Janani Ravichandran @ 2016-10-27 15:42 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Janani Ravichandran, linux-mm, linux-kernel


> On Oct 21, 2016, at 3:08 AM, Michal Hocko <mhocko@kernel.org> wrote:
> 
> Interesting.
> $ cat /debug/tracing/available_tracers 
> function_graph preemptirqsoff preemptoff irqsoff function nop
> 
> Do I have to configure anything specially? And if I do why isn't it any
> better to simply add a start tracepoint and make this available also to
> older kernels?

Well, you just need to enable the function_graph tracer in the tracing directory:
echo function_graph > current_tracer

set funcgraph-proc in trace_options to get process information:
echo funcgraph-proc > trace_options

set funcgraph-abstime in trace_options to get absolute timestamps:
echo funcgraph-abstime > trace_options

set all the functions we’d like to observe as filters, e.g.:
echo __alloc_pages_nodemask > set_ftrace_filter

and enable the tracepoints we would like to get information from.
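
In Python, the same setup could be sketched as follows (the tracefs layout
and the kmem/mm_page_alloc event path are assumptions here, and writing
these files needs root on a real system):

```python
import os

def setup_function_graph(tracing_dir='/sys/kernel/debug/tracing'):
    """Sketch of the steps above: pick the tracer, set output options,
    filter on the function of interest and enable one tracepoint."""
    def write(relpath, value):
        # Each tracefs control file is driven by writing a string to it.
        with open(os.path.join(tracing_dir, relpath), 'w') as f:
            f.write(value + '\n')

    write('current_tracer', 'function_graph')
    write('trace_options', 'funcgraph-proc')     # show process names/PIDs
    write('trace_options', 'funcgraph-abstime')  # absolute timestamps
    write('set_ftrace_filter', '__alloc_pages_nodemask')
    write('events/kmem/mm_page_alloc/enable', '1')
```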

I didn’t add a begin tracepoint for this as Steven Rostedt had suggested
using function graph instead of begin/end tracepoints (we already have
a tracepoint in __alloc_pages_nodemask - trace_mm_page_alloc to get 
some information about the allocation and we can just use function graph 
to see how long __alloc_pages_nodemask() takes).

Janani

> -- 
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RFC] scripts: Include postprocessing script for memory allocation tracing
@ 2016-08-19 11:38 ` Janani Ravichandran
  0 siblings, 0 replies; 31+ messages in thread
From: Janani Ravichandran @ 2016-08-19 11:38 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: riel, akpm, vdavydov, mhocko, vbabka, mgorman, kirill.shutemov, rostedt

The objective of this patch is to help users observe latencies in memory
allocation.
The function graph tracer is great for seeing how long functions took to
execute. And often, tracepoints, along with the tracer help understand
situations better. However, while it is possible to set a threshold for
function graph and have only the functions that exceed a certain
threshold appear in the output (by echoing the threshold value in
tracing_thresh in the tracing directory), there is no method to filter
out tracepoints that are not associated with functions whose latencies
exceed the threshold.

When the threshold is set high, it is possible that a lot of information
that is of little interest to the user is printed from the tracepoints.
Limiting this information can help reduce disk I/O significantly.

This patch deals with latencies in memory allocation and more
specifically, direct reclaim and compaction.

setup_alloc_trace.sh is a bash script which handles the initial setup of
the function graph tracer, specifies which functions to include in the
output and enables some tracepoints of interest. Upon exit, it clears all
the values set.

The functions traced currently are __alloc_pages_nodemask,
try_to_free_pages, mem_cgroup_soft_limit_reclaim, shrink_node,
shrink_node_memcg, shrink_slab, shrink_active_list,
shrink_inactive_list, compact_zone and try_to_compact_pages.

The tracepoints enabled are mm_shrink_slab_start,
mm_shrink_slab_end, mm_vmscan_direct_reclaim_begin,
mm_vmscan_direct_reclaim_end, mm_vmscan_lru_shrink_inactive,
mm_compaction_begin, mm_compaction_end and
mm_compaction_try_to_compact_pages.

More functions can be traced as desired by making changes to
setup_alloc_trace.sh accordingly.

allocation_postprocess.py is a script which reads from trace_pipe. It
does the following to filter out info from tracepoints that may not
be important:

1. Displays mm_vmscan_direct_reclaim_begin and
mm_vmscan_direct_reclaim_end only when try_to_free_pages has
exceeded the threshold.
2. Displays mm_compaction_begin and mm_compaction_end only when
compact_zone has exceeded the threshold.
3. Displays mm_compaction_try_to_compact_pages only when
try_to_compact_pages has exceeded the threshold.
4. Displays mm_shrink_slab_start and mm_shrink_slab_end only when
the time elapsed between them exceeds the threshold.
5. Displays mm_vmscan_lru_shrink_inactive only when shrink_inactive_list
has exceeded the threshold.
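
The filtering logic behind those rules boils down to buffering tracepoint
output until the enclosing function's latency is known (a sketch with an
illustrative event format, not the script's actual trace_pipe parser):

```python
def filter_by_threshold(events, threshold_us):
    """Buffer tracepoint events seen inside a traced function and emit
    them only when the function's duration exceeds the threshold.
    Events are (kind, text, duration_us) tuples; the format is
    illustrative only."""
    buffered = []
    out = []
    for kind, text, duration_us in events:
        if kind == 'tracepoint':
            buffered.append(text)
        elif kind == 'func_exit':
            # The function's latency is only known once it returns.
            if duration_us > threshold_us:
                out.extend(buffered)
                out.append(text)
            buffered = []
    return out
```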

When CTRL+C is pressed, the script shows the times taken by the
shrinkers. However, currently it is not possible to differentiate among the
superblock shrinkers.

Sample output:
^Ci915_gem_shrinker_scan : total time = 8.731000 ms, max latency = 0.278000 ms
ext4_es_scan : total time = 0.970000 ms, max latency = 0.129000 ms
scan_shadow_nodes : total time = 1.150000 ms, max latency = 0.175000 ms
super_cache_scan : total time = 8.455000 ms, max latency = 0.466000 ms
deferred_split_scan : total time = 25.767000 ms, max latency = 25.485000 ms

Usage:

# ./setup_alloc_trace.sh -t 134 -s /sys/kernel/debug/tracing > file.dat

Where -t represents the threshold (134 us in this case) and -s represents
the path to the tracing directory. The default is
/sys/kernel/debug/tracing, which is used when no path is specified.
The threshold, on the other hand, must be set.

Ideas/comments/suggestions are welcome, especially on adherence to
the Python coding style followed by the Linux community and on the
choice of functions enabled for tracing.

Thanks.

Signed-off-by: Janani Ravichandran <janani.rvchndrn@gmail.com>
---
 scripts/tracing/allocation_postprocess.py | 267 ++++++++++++++++++++++++++++++
 scripts/tracing/setup_alloc_trace.sh      |  88 ++++++++++
 2 files changed, 355 insertions(+)
 create mode 100755 scripts/tracing/allocation_postprocess.py
 create mode 100755 scripts/tracing/setup_alloc_trace.sh

diff --git a/scripts/tracing/allocation_postprocess.py b/scripts/tracing/allocation_postprocess.py
new file mode 100755
index 0000000..2f65457
--- /dev/null
+++ b/scripts/tracing/allocation_postprocess.py
@@ -0,0 +1,267 @@
+#!/usr/bin/env python
+# python 2.7
+
+"""
+This script uses function graph and some of the existing
+tracepoints to help people observe how long page allocations and some
+functions in the direct reclaim and compaction paths take.
+The functions and tracepoints enabled can be seen in setup_alloc_trace.sh.
+It reads from trace_pipe of the tracing directory and prints
+only those tracepoints and functions which are associated
+with latencies greater than the threshold specified.
+When CTRL+C is pressed, the times spent in the various shrinkers are displayed.
+The setup of trace is done in setup_alloc_trace.sh, from where this script is
+invoked.
+"""
+
+import argparse
+import re
+import sys
+import signal
+
+from collections import defaultdict
+
+# Constants for tracepoints
+
+DIRECT_RECLAIM_BEGIN        = 1
+DIRECT_RECLAIM_END          = 2
+SHRINK_SLAB_START           = 3
+SHRINK_INACTIVE_LIST        = 5
+TRY_TO_COMPACT              = 6
+COMPACTION_BEGIN            = 7
+COMPACTION_END              = 8
+
+SECS_TO_US                  = 1000000
+US_TO_MS                    = 1000
+
+# Parse command line arguments
+parser = argparse.ArgumentParser(description='Parser for latency analyzer')
+
+parser.add_argument('-s', '--source', action='store',
+                    default='/sys/kernel/debug/tracing',
+                    dest='source_path',
+                    help='Specify source file to read trace output')
+parser.add_argument('-t', '--threshold', action='store', default=0.0,
+                    dest='threshold', type=float)
+args = parser.parse_args()
+
+source_path = ''.join((args.source_path,'/trace_pipe'))
+threshold = args.threshold
+
+# Regexes
+line_pattern = re.compile(r'\s*(\d+\.\d+)\s+\|\s+\d*\)*\s+([\w-]+)\s+\|\s+.*\s+(\d*\.*\d*)\s+[us]{0,2}\s+\|\s+(.*)')
+tracepoint_pattern = re.compile(r'\/\*\s*([\w]*):\s*(.*)\s*\*\/')
+shrinker_pattern = re.compile(r'\s*([\w]*)\+(.*)\s*')
+function_end_pattern = re.compile(r'.*\/\*\s*([\w]*)\s*\*\/')
+
+# The dictionary which holds tracepoint information for all processes
+all_information = defaultdict(dict)
+
+# The dictionary which holds shrinker latencies
+shrinker_latencies = defaultdict(float)
+shrinker_max_latencies = defaultdict(float)
+
+def print_shrinker_latencies(signum, frame):
+    """ This function prints the time spent in each shrinker, when CTRL+C is 
+    pressed.
+    """
+
+    signal.signal(signal.SIGINT, original_sigint)
+    for key, value in shrinker_latencies.iteritems():
+        print '%s : total time = %f ms, max latency = %f ms' %(key,
+                                        value*US_TO_MS,
+                                        shrinker_max_latencies[key]*US_TO_MS)
+    sys.exit(0)
+
+original_sigint = signal.getsignal(signal.SIGINT)
+signal.signal(signal.SIGINT, print_shrinker_latencies)
+
+
+def set_begin_info(process, EVENT, timestamp, info):
+    """ This function sets information associated with mm_shrink_slab_start.
+    It sets the entire tracepoint info, along with the timestamp, which will
+    be used to calculate latencies when corresponding mm_shrink_slab_end
+    tracepoints are encountered.
+    """
+
+    per_process_dict = all_information[process]
+    begin_info = {}
+    begin_info["data"] = info
+    begin_info["time"] = timestamp
+    per_process_dict[EVENT] = begin_info
+
+
+def set_trace_info(process, EVENT, info):
+    """ This function sets trace information associated with specific events.
+    """
+
+    per_process_dict = all_information[process]
+    per_process_dict[EVENT] = info
+
+
+def store_max_latency(shrinker_name, latency):
+    """ This function stores the maximum latency encountered in a shrinker. """
+
+    max_latency = shrinker_max_latencies[shrinker_name]
+    if latency > max_latency:
+        shrinker_max_latencies[shrinker_name] = latency
+
+
+def find_latency(process, BEGIN_EVENT, timestamp):
+    """ This function calculates shrinker latencies."""
+
+    per_process_dict = all_information.get(process, None)
+    if per_process_dict:
+        begin_info = per_process_dict.get(BEGIN_EVENT, None)
+        if begin_info:
+            begin_data = begin_info.get("data", None)
+            begin_time = begin_info.get("time", None)
+            if begin_time:
+                time_elapsed = float(timestamp) - float(begin_time)
+                if time_elapsed*SECS_TO_US > threshold:
+                    return (True, begin_data, time_elapsed)
+                return (False, begin_data, time_elapsed)
+    return (False, None, 0.0)
+
+
+def print_line(line_info):
+    print line_info,
+
+
+def print_tracepoint(process, EVENT, info):
+    if info:
+        print info,
+    else:
+        per_process_dict = all_information.get(process, None)
+        TP_info = per_process_dict.get(EVENT, None)
+        if TP_info:
+            print TP_info,
+        per_process_dict.pop(EVENT, None)
+
+with open(source_path) as f:
+    for line in f:
+        line_match = re.match(line_pattern, line)
+        if line_match:
+            timestamp = line_match.group(1)
+            process_info = line_match.group(2)
+            function_match = re.match(function_end_pattern, line_match.group(4))
+            tracepoint_match = re.match(tracepoint_pattern, line_match.group(4))
+            if tracepoint_match:
+                TP_whole = line_match.group(4)
+                TP_name = tracepoint_match.group(1)
+                TP_info = tracepoint_match.group(2)
+
+
+                def call_set_trace_info(EVENT):
+                    set_trace_info(process_info, EVENT, line)
+
+
+                def direct_reclaim_b():
+                    call_set_trace_info(DIRECT_RECLAIM_BEGIN)
+
+
+                def direct_reclaim_e():
+                    call_set_trace_info(DIRECT_RECLAIM_END)
+
+
+                def shrink_inactive_list():
+                    call_set_trace_info(SHRINK_INACTIVE_LIST)
+
+
+                def shrink_slab_b():
+                    set_begin_info(process_info, SHRINK_SLAB_START, timestamp,
+                                    line)
+
+
+                def shrink_slab_e():
+                    delay_status, begin_data, time_elapsed = find_latency(
+                                                                process_info,
+                                                                SHRINK_SLAB_START,
+                                                                timestamp)
+                    shrinker_match = re.match(shrinker_pattern, TP_info)
+                    if shrinker_match:
+                        shrinker_name = shrinker_match.group(1)
+                        shrinker_latencies[shrinker_name] += time_elapsed
+                        store_max_latency(shrinker_name, time_elapsed)
+
+                    if delay_status:
+                        print_tracepoint(process_info,
+                                         SHRINK_SLAB_START,
+                                         begin_data)
+                        print_tracepoint(process_info,
+                                         None,
+                                         line)
+
+
+                def try_to_compact():
+                    call_set_trace_info(TRY_TO_COMPACT)
+
+
+                def compact_b():
+                    call_set_trace_info(COMPACTION_BEGIN)
+
+
+                def compact_e():
+                    call_set_trace_info(COMPACTION_END)
+
+
+                trace_match = {'mm_vmscan_direct_reclaim_begin' : direct_reclaim_b,
+                               'mm_vmscan_direct_reclaim_end'   : direct_reclaim_e,
+                               'mm_shrink_slab_start'           : shrink_slab_b,
+                               'mm_shrink_slab_end'             : shrink_slab_e,
+                               'mm_vmscan_lru_shrink_inactive'  :
+                                                              shrink_inactive_list,
+                               'mm_compaction_try_to_compact_pages':
+                                                              try_to_compact,
+                               'mm_compaction_begin'            : compact_b,
+                               'mm_compaction_end'              : compact_e}
+
+                if TP_name in trace_match:
+                    trace_match[TP_name]()
+                else:
+                    pass
+
+            else:
+                function_match = re.match(function_end_pattern,
+                                          line_match.group(4))
+                if function_match:
+                    function_name = function_match.group(1)
+
+
+                    def alloc_pages():
+                        print_line(line)
+                        all_information.pop(process_info, None)
+
+
+                    def try_to_free_pages():
+                        print_tracepoint(process_info, DIRECT_RECLAIM_BEGIN, None)
+                        print_tracepoint(process_info, DIRECT_RECLAIM_END, None)
+                        print_line(line)
+
+
+                    def shrink_inactive_list():
+                        print_tracepoint(process_info, SHRINK_INACTIVE_LIST, None)
+                        print_line(line)
+
+
+                    def try_to_compact():
+                        print_tracepoint(process_info, TRY_TO_COMPACT, None)
+                        print_line(line)
+
+
+                    def compact_zone():
+                        print_tracepoint(process_info, COMPACTION_BEGIN, None)
+                        print_tracepoint(process_info, COMPACTION_END, None)
+                        print_line(line)
+
+
+                    f_match = {'__alloc_pages_nodemask' : alloc_pages,
+                               'try_to_free_pages'      : try_to_free_pages,
+                               'shrink_inactive_list'   : shrink_inactive_list,
+                               'try_to_compact_pages'   : try_to_compact,
+                               'compact_zone'           : compact_zone}
+
+                    if function_name in f_match:
+                        f_match[function_name]()
+                    else:
+                        print_line(line)
diff --git a/scripts/tracing/setup_alloc_trace.sh b/scripts/tracing/setup_alloc_trace.sh
new file mode 100755
index 0000000..9a558b0
--- /dev/null
+++ b/scripts/tracing/setup_alloc_trace.sh
@@ -0,0 +1,88 @@
+#! /bin/bash
+
+# This script does all the basic setup necessary for allocation_postprocess.py
+# and then invokes the script. All the setup that is done at the beginning
+# is cleared on exit.
+
+# Usage: # ./setup_alloc_trace.sh -t THRESHOLD_IN_US -s
+# path/to/tracing/directory > path/to/output/file.
+
+while getopts :t:s: name
+do
+	case $name in
+		t)threshold=$OPTARG;;
+		s)trace_dir=$OPTARG;;
+		*) echo "Usage: ./setup_alloc_trace.sh -t THRESHOLD_IN_US -s path/to/tracing/directory"
+		esac
+done
+
+if [[ -z $threshold ]]
+then
+	echo "Must specify threshold."
+	exit 1
+fi
+
+if [[ -z $trace_dir ]]
+then
+	trace_dir="/sys/kernel/debug/tracing"
+fi
+
+pwd=`pwd`
+cd $trace_dir
+echo 0 > tracing_on
+
+echo function_graph > current_tracer
+echo funcgraph-proc > trace_options
+echo funcgraph-abstime > trace_options
+
+# set filter functions
+echo __alloc_pages_nodemask > set_ftrace_filter
+echo try_to_free_pages >> set_ftrace_filter
+echo mem_cgroup_soft_limit_reclaim >> set_ftrace_filter
+echo shrink_node >> set_ftrace_filter
+echo shrink_node_memcg >> set_ftrace_filter
+echo shrink_slab >> set_ftrace_filter
+echo shrink_active_list >> set_ftrace_filter
+echo shrink_inactive_list >> set_ftrace_filter
+echo compact_zone >> set_ftrace_filter
+echo try_to_compact_pages >> set_ftrace_filter
+
+echo $threshold > tracing_thresh
+
+# set tracepoints
+echo 1 > events/vmscan/mm_shrink_slab_start/enable
+echo 1 > events/vmscan/mm_shrink_slab_end/enable
+echo 1 > events/vmscan/mm_vmscan_direct_reclaim_begin/enable
+echo 1 > events/vmscan/mm_vmscan_direct_reclaim_end/enable
+echo 1 > events/vmscan/mm_vmscan_lru_shrink_inactive/enable
+echo 1 > events/compaction/mm_compaction_begin/enable
+echo 1 > events/compaction/mm_compaction_end/enable
+echo 1 > events/compaction/mm_compaction_try_to_compact_pages/enable
+echo 1 > tracing_on
+
+cd $pwd
+
+./allocation_postprocess.py -t $threshold -s $trace_dir
+
+function cleanup {
+	cd $trace_dir
+	echo 0 > tracing_on
+	echo nop > current_tracer
+	echo > set_ftrace_filter
+	echo 0 > tracing_thresh
+
+	echo 0 > events/vmscan/mm_shrink_slab_start/enable
+	echo 0 > events/vmscan/mm_shrink_slab_end/enable
+	echo 0 > events/vmscan/mm_vmscan_direct_reclaim_begin/enable
+	echo 0 > events/vmscan/mm_vmscan_direct_reclaim_end/enable
+	echo 0 > events/vmscan/mm_vmscan_lru_shrink_inactive/enable
+	echo 0 > events/compaction/mm_compaction_begin/enable
+	echo 0 > events/compaction/mm_compaction_end/enable
+	echo 0 > events/compaction/mm_compaction_try_to_compact_pages/enable
+
+	exit $?
+}
+
+trap cleanup SIGINT
+trap cleanup SIGTERM
+trap cleanup EXIT
-- 
2.7.0


