All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 1/3] buildstats.py: enable collection of /proc/pressure data
@ 2022-06-21 16:14 Aryaman Gupta
  2022-06-21 16:14 ` [PATCH v2 2/3] pybootchartgui: render cpu and io pressure Aryaman Gupta
  2022-06-21 16:14 ` [PATCH v2 3/3] buildstats.bbclass: correct sampling of system stats Aryaman Gupta
  0 siblings, 2 replies; 3+ messages in thread
From: Aryaman Gupta @ 2022-06-21 16:14 UTC (permalink / raw)
  To: openembedded-core

The Linux pressure monitoring system helps determine when system resources
are being overutilized by measuring how contended the CPU, IO and memory are.
This information can be found under /proc/pressure/ which contains 3 files -
cpu, memory and io. In each of the files, the format is as follows:
	some avg10=70.24 avg60=68.52 avg300=69.91 total=3559632828
	full avg10=57.59 avg60=58.06 avg300=60.38 total=3300487258

The "some" state of a given resource represents when one or more tasks are delayed
on that resource whereas the "full" state represents when all the tasks are
delayed. Currently, we only collect data from the "some" state but the
"full" data can simply be appended to the log files if neccessary.
The "avg10", "avg60" and "avg300" fields represent the average percentage
of time runnable tasks were delayed in the last 10, 60 or 300 seconds
respectively. The "total" field represents the total time, in microseconds,
that some runnable task was delayed on a resource.
More information can be found at:
	https://www.kernel.org/doc/html/latest/accounting/psi.html
and in the source code under kernel/sched/psi.c

This commit adds functionality to collect and log the "some" CPU, memory and IO
pressure. The "avg10", "avg60" and "avg300" fields are logged without change.
In place of the "total" field, the difference between the current "total" and
the previous sample's "total" is logged, allowing the measurement of pressure
in between each polling interval, as was done for /proc/stat data. The log files
are stored in:
	<build_name>/tmp/buildstats/<build_time>/reduced_proc_pressure/{cpu,io,memory}.log
mirroring the directory structure of /proc/pressure. If the /proc/pressure
directory does not exist or the resource files can't be read/opened, the
reduced_proc_pressure directory is not created.

Signed-off-by: Aryaman Gupta <aryaman.gupta@windriver.com>
Signed-off-by: Randy MacLeod <randy.macleod@windriver.com>
---
 meta/lib/buildstats.py | 57 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 47 insertions(+), 10 deletions(-)

diff --git a/meta/lib/buildstats.py b/meta/lib/buildstats.py
index c52b6c3b72..64ad3ef40e 100644
--- a/meta/lib/buildstats.py
+++ b/meta/lib/buildstats.py
@@ -14,13 +14,27 @@ class SystemStats:
         bn = d.getVar('BUILDNAME')
         bsdir = os.path.join(d.getVar('BUILDSTATS_BASE'), bn)
         bb.utils.mkdirhier(bsdir)
+        file_handlers =  [('diskstats', self._reduce_diskstats),
+                            ('meminfo', self._reduce_meminfo),
+                            ('stat', self._reduce_stat)]
+
+        # Some hosts like openSUSE have readable /proc/pressure files
+        # but throw errors when these files are opened. Catch these error
+        # and ensure that the reduce_proc_pressure directory is not created.
+        if os.path.exists("/proc/pressure"):
+            try:
+                source = open('/proc/pressure/cpu', 'rb')
+                source.read()
+                pressuredir = os.path.join(bsdir, 'reduced_proc_pressure')
+                bb.utils.mkdirhier(pressuredir)
+                file_handlers.extend([('pressure/cpu', self._reduce_pressure),
+                                     ('pressure/io', self._reduce_pressure),
+                                     ('pressure/memory', self._reduce_pressure)])
+            except Exception:
+                pass
 
         self.proc_files = []
-        for filename, handler in (
-                ('diskstats', self._reduce_diskstats),
-                ('meminfo', self._reduce_meminfo),
-                ('stat', self._reduce_stat),
-        ):
+        for filename, handler in (file_handlers):
             # The corresponding /proc files might not exist on the host.
             # For example, /proc/diskstats is not available in virtualized
             # environments like Linux-VServer. Silently skip collecting
@@ -48,13 +62,15 @@ class SystemStats:
         self.diskstats_ltime = None
         self.diskstats_data = None
         self.stat_ltimes = None
+        # Last time we sampled /proc/pressure. All resources stored in a single dict with the key as filename
+        self.last_pressure = {"pressure/cpu": None, "pressure/io": None, "pressure/memory": None}
 
     def close(self):
         self.monitor_disk.close()
         for _, output, _ in self.proc_files:
             output.close()
 
-    def _reduce_meminfo(self, time, data):
+    def _reduce_meminfo(self, time, data, filename):
         """
         Extracts 'MemTotal', 'MemFree', 'Buffers', 'Cached', 'SwapTotal', 'SwapFree'
         and writes their values into a single line, in that order.
@@ -75,7 +91,7 @@ class SystemStats:
         disk = linetokens[2]
         return self.diskstats_regex.match(disk)
 
-    def _reduce_diskstats(self, time, data):
+    def _reduce_diskstats(self, time, data, filename):
         relevant_tokens = filter(self._diskstats_is_relevant_line, map(lambda x: x.split(), data.split(b'\n')))
         diskdata = [0] * 3
         reduced = None
@@ -104,10 +120,10 @@ class SystemStats:
         return reduced
 
 
-    def _reduce_nop(self, time, data):
+    def _reduce_nop(self, time, data, filename):
         return (time, data)
 
-    def _reduce_stat(self, time, data):
+    def _reduce_stat(self, time, data, filename):
         if not data:
             return None
         # CPU times {user, nice, system, idle, io_wait, irq, softirq} from first line
@@ -126,6 +142,27 @@ class SystemStats:
         self.stat_ltimes = times
         return reduced
 
+    def _reduce_pressure(self, time, data, filename):
+        """
+        Return reduced pressure: {avg10, avg60, avg300} and delta total compared to the previous sample
+        for the cpu, io and memory resources. A common function is used for all 3 resources since the
+        format of the /proc/pressure file is the same in each case.
+        """
+        if not data:
+            return None
+        tokens = data.split(b'\n', 1)[0].split()
+        avg10 = float(tokens[1].split(b'=')[1])
+        avg60 = float(tokens[2].split(b'=')[1])
+        avg300 = float(tokens[3].split(b'=')[1])
+        total = int(tokens[4].split(b'=')[1])
+
+        reduced = None
+        if self.last_pressure[filename]:
+            delta = total - self.last_pressure[filename]
+            reduced = (time, (avg10, avg60, avg300, delta))
+        self.last_pressure[filename] = total
+        return reduced
+
     def sample(self, event, force):
         now = time.time()
         if (now - self.last_proc > self.min_seconds) or force:
@@ -133,7 +170,7 @@ class SystemStats:
                 with open(os.path.join('/proc', filename), 'rb') as input:
                     data = input.read()
                     if handler:
-                        reduced = handler(now, data)
+                        reduced = handler(now, data, filename)
                     else:
                         reduced = (now, data)
                     if reduced:
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH v2 2/3] pybootchartgui: render cpu and io pressure
  2022-06-21 16:14 [PATCH v2 1/3] buildstats.py: enable collection of /proc/pressure data Aryaman Gupta
@ 2022-06-21 16:14 ` Aryaman Gupta
  2022-06-21 16:14 ` [PATCH v2 3/3] buildstats.bbclass: correct sampling of system stats Aryaman Gupta
  1 sibling, 0 replies; 3+ messages in thread
From: Aryaman Gupta @ 2022-06-21 16:14 UTC (permalink / raw)
  To: openembedded-core

Add two new, separate charts showing the avg10 and delta
total pressure over time for the CPU and IO resources. The height of
the avg10 data in each chart represents the percentage of time "some"
task was delayed over the specific resource during the last 10
seconds of the build. The height of the delta total data in each chart
represents the total time "some" task was delayed since the last sample
was collected. If the reduced_proc_pressure data is not present in the
buildstats log, then the new charts are not shown at all rather than
being present but unpopulated.

Note that the delta total graphs may appear "spikey",
oscillating from high values to low. This behaviour is fixed in a
subsequent commit.

Signed-off-by: Aryaman Gupta <aryaman.gupta@windriver.com>
Signed-off-by: Randy MacLeod <randy.macleod@windriver.com>
---
 scripts/pybootchartgui/pybootchartgui/draw.py | 77 +++++++++++++++++++
 .../pybootchartgui/pybootchartgui/parsing.py  | 28 +++++++
 .../pybootchartgui/pybootchartgui/samples.py  | 17 ++++
 3 files changed, 122 insertions(+)

diff --git a/scripts/pybootchartgui/pybootchartgui/draw.py b/scripts/pybootchartgui/pybootchartgui/draw.py
index fc708b55c3..a13df3a3fa 100644
--- a/scripts/pybootchartgui/pybootchartgui/draw.py
+++ b/scripts/pybootchartgui/pybootchartgui/draw.py
@@ -80,6 +80,18 @@ MEM_BUFFERS_COLOR = (0.4, 0.4, 0.4, 0.3)
 # Swap color
 MEM_SWAP_COLOR = DISK_TPUT_COLOR
 
+# avg10 CPU pressure color
+CPU_PRESSURE_AVG10_COLOR = (0.0, 0.0, 0.0, 1.0)
+# delta total CPU pressure color
+CPU_PRESSURE_TOTAL_COLOR = CPU_COLOR
+# avg10 IO pressure color
+IO_PRESSURE_AVG10_COLOR = (0.0, 0.0, 0.0, 1.0)
+# delta total IO pressure color
+IO_PRESSURE_TOTAL_COLOR = IO_COLOR
+
+
+
+
 # Process border color.
 PROC_BORDER_COLOR = (0.71, 0.71, 0.71, 1.0)
 # Waiting process color.
@@ -415,6 +427,71 @@ def render_charts(ctx, options, clip, trace, curr_y, w, h, sec_w):
 
         curr_y = curr_y + 30 + bar_h
 
+    # render CPU pressure chart
+    if trace.cpu_pressure:
+        draw_legend_line(ctx, "avg10 CPU Pressure", CPU_PRESSURE_AVG10_COLOR, off_x, curr_y+20, leg_s)
+        draw_legend_box(ctx, "delta total CPU Pressure", CPU_PRESSURE_TOTAL_COLOR, off_x + 140, curr_y+20, leg_s)
+
+        # render delta total cpu
+        chart_rect = (off_x, curr_y+30, w, bar_h)
+        if clip_visible (clip, chart_rect):
+            draw_box_ticks (ctx, chart_rect, sec_w)
+            draw_annotations (ctx, proc_tree, trace.times, chart_rect)
+            draw_chart (ctx, CPU_PRESSURE_TOTAL_COLOR, True, chart_rect, \
+                    [(sample.time, sample.deltaTotal) for sample in trace.cpu_pressure], \
+                    proc_tree, None)
+
+        # render avg10 cpu
+        max_sample = max (trace.cpu_pressure, key = lambda s: s.avg10)
+        if clip_visible (clip, chart_rect):
+            draw_chart (ctx, CPU_PRESSURE_AVG10_COLOR, False, chart_rect, \
+                    [(sample.time, sample.avg10) for sample in trace.cpu_pressure], \
+                    proc_tree, None)
+
+        pos_x = off_x + ((max_sample.time - proc_tree.start_time) * w / proc_tree.duration)
+
+        shift_x, shift_y = -20, 20
+        if (pos_x < off_x + 245):
+            shift_x, shift_y = 5, 40
+
+
+        label = "%d%%" % (max_sample.avg10)
+        draw_text (ctx, label, CPU_PRESSURE_AVG10_COLOR, pos_x + shift_x, curr_y + shift_y)
+
+        curr_y = curr_y + 30 + bar_h
+
+    # render delta total io
+    if trace.io_pressure:
+        draw_legend_line(ctx, "avg10 I/O Pressure", IO_PRESSURE_AVG10_COLOR, off_x, curr_y+20, leg_s)
+        draw_legend_box(ctx, "delta total I/O Pressure", IO_PRESSURE_TOTAL_COLOR, off_x + 140, curr_y+20, leg_s)
+
+        # render avg10 io
+        chart_rect = (off_x, curr_y+30, w, bar_h)
+        if clip_visible (clip, chart_rect):
+            draw_box_ticks (ctx, chart_rect, sec_w)
+            draw_annotations (ctx, proc_tree, trace.times, chart_rect)
+            draw_chart (ctx, IO_PRESSURE_TOTAL_COLOR, True, chart_rect, \
+                    [(sample.time, sample.deltaTotal) for sample in trace.io_pressure], \
+                    proc_tree, None)
+
+        # render io pressure
+        max_sample = max (trace.io_pressure, key = lambda s: s.avg10)
+        if clip_visible (clip, chart_rect):
+            draw_chart (ctx, IO_PRESSURE_AVG10_COLOR, False, chart_rect, \
+                    [(sample.time, sample.avg10) for sample in trace.io_pressure], \
+                    proc_tree, None)
+
+        pos_x = off_x + ((max_sample.time - proc_tree.start_time) * w / proc_tree.duration)
+
+        shift_x, shift_y = -20, 20
+        if (pos_x < off_x + 245):
+            shift_x, shift_y = 5, 40
+
+        label = "%d%%" % (max_sample.avg10)
+        draw_text (ctx, label, IO_PRESSURE_AVG10_COLOR, pos_x + shift_x, curr_y + shift_y)
+
+        curr_y = curr_y + 30 + bar_h
+
     # render disk space usage
     #
     # Draws the amount of disk space used on each volume relative to the
diff --git a/scripts/pybootchartgui/pybootchartgui/parsing.py b/scripts/pybootchartgui/pybootchartgui/parsing.py
index b42dac6b88..004d6fb953 100644
--- a/scripts/pybootchartgui/pybootchartgui/parsing.py
+++ b/scripts/pybootchartgui/pybootchartgui/parsing.py
@@ -49,6 +49,8 @@ class Trace:
         self.parent_map = None
         self.mem_stats = []
         self.monitor_disk = None
+        self.cpu_pressure = []
+        self.io_pressure = []
         self.times = [] # Always empty, but expected by draw.py when drawing system charts.
 
         if len(paths):
@@ -554,6 +556,27 @@ def _parse_monitor_disk_log(file):
 
     return disk_stats
 
+def _parse_pressure_logs(file, filename):
+    """
+    Parse file for "some" pressure with 'avg10', 'avg60' 'avg300' and delta total values
+    (in that order) directly stored on one line for both CPU and IO, based on filename.
+    """
+    pressure_stats = []
+    if filename == "cpu.log":
+        SamplingClass = CPUPressureSample
+    else:
+        SamplingClass = IOPressureSample
+    for time, lines in _parse_timed_blocks(file):
+        for line in lines:
+            if not line: continue
+            tokens = line.split()
+            avg10 = float(tokens[0])
+            avg60 = float(tokens[1])
+            avg300 = float(tokens[2])
+            delta = float(tokens[3])
+            pressure_stats.append(SamplingClass(time, avg10, avg60, avg300, delta))
+
+    return pressure_stats
 
 # if we boot the kernel with: initcall_debug printk.time=1 we can
 # get all manner of interesting data from the dmesg output
@@ -741,6 +764,11 @@ def _do_parse(writer, state, filename, file):
         state.cmdline = _parse_cmdline_log(writer, file)
     elif name == "monitor_disk.log":
         state.monitor_disk = _parse_monitor_disk_log(file)
+    #pressure logs are in a subdirectory
+    elif name == "cpu.log":
+        state.cpu_pressure = _parse_pressure_logs(file, name)
+    elif name == "io.log":
+        state.io_pressure = _parse_pressure_logs(file, name)
     elif not filename.endswith('.log'):
         _parse_bitbake_buildstats(writer, state, filename, file)
     t2 = time.process_time()
diff --git a/scripts/pybootchartgui/pybootchartgui/samples.py b/scripts/pybootchartgui/pybootchartgui/samples.py
index 9fc309b3ab..472dc27be0 100644
--- a/scripts/pybootchartgui/pybootchartgui/samples.py
+++ b/scripts/pybootchartgui/pybootchartgui/samples.py
@@ -37,6 +37,23 @@ class CPUSample:
         return str(self.time) + "\t" + str(self.user) + "\t" + \
                str(self.sys) + "\t" + str(self.io) + "\t" + str (self.swap)
 
+class CPUPressureSample:
+    def __init__(self, time, avg10, avg60, avg300, deltaTotal):
+        self.time = time
+        self.avg10 = avg10
+        self.avg60 = avg60
+        self.avg300 = avg300
+        self.deltaTotal = deltaTotal
+
+class IOPressureSample:
+    def __init__(self, time, avg10, avg60, avg300, deltaTotal):
+        self.time = time
+        self.avg10 = avg10
+        self.avg60 = avg60
+        self.avg300 = avg300
+        self.deltaTotal = deltaTotal
+
+
 class MemSample:
     used_values = ('MemTotal', 'MemFree', 'Buffers', 'Cached', 'SwapTotal', 'SwapFree',)
 
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH v2 3/3] buildstats.bbclass: correct sampling of system stats
  2022-06-21 16:14 [PATCH v2 1/3] buildstats.py: enable collection of /proc/pressure data Aryaman Gupta
  2022-06-21 16:14 ` [PATCH v2 2/3] pybootchartgui: render cpu and io pressure Aryaman Gupta
@ 2022-06-21 16:14 ` Aryaman Gupta
  1 sibling, 0 replies; 3+ messages in thread
From: Aryaman Gupta @ 2022-06-21 16:14 UTC (permalink / raw)
  To: openembedded-core

The last time of sampling would be updated within the SystemStats class
but not re-recorded into the datastore, leading to multiple samples being
collected in the same second in the sample function of buildstats.py due
to the runQueueTaskStarted events. Fix this to collect and store only one
sample per second, as originally intended, by only recording samples
on instances of the Heartbeat and BuildCompleted events.

This fix elimates the spikiness of sampled data, in cases where the difference
between the current and the last sample is taken. Previously, since many
samples per second were recorded, certain types of data would result in a
very small elapsed time and hence a small numerical difference. For example,
the CPU usage from /proc/stat is a running total of usage and taking the
difference between data collected 0.1 seconds apart would result in usage
appearing lower than it actually was.

Signed-off-by: Aryaman Gupta <aryaman.gupta@windriver.com>
Signed-off-by: Randy MacLeod <randy.macleod@windriver.com>
---
 meta/classes/buildstats.bbclass | 3 ++-
 meta/lib/buildstats.py          | 7 ++++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/meta/classes/buildstats.bbclass b/meta/classes/buildstats.bbclass
index 0de605200a..9e25a42a5c 100644
--- a/meta/classes/buildstats.bbclass
+++ b/meta/classes/buildstats.bbclass
@@ -282,10 +282,11 @@ python runqueue_stats () {
     if not system_stats and isinstance(e, (bb.runqueue.sceneQueueTaskStarted, bb.runqueue.runQueueTaskStarted)):
         system_stats = buildstats.SystemStats(d)
         d.setVar('_buildstats_system_stats', system_stats)
-    if system_stats:
+    if system_stats and isinstance(e, (bb.event.HeartbeatEvent, bb.event.BuildCompleted)):
         # Ensure that we sample at important events.
         done = isinstance(e, bb.event.BuildCompleted)
         system_stats.sample(e, force=done)
+        d.setVar('_buildstats_system_stats', system_stats)
         if done:
             system_stats.close()
             d.delVar('_buildstats_system_stats')
diff --git a/meta/lib/buildstats.py b/meta/lib/buildstats.py
index 64ad3ef40e..9829a0ff65 100644
--- a/meta/lib/buildstats.py
+++ b/meta/lib/buildstats.py
@@ -55,7 +55,12 @@ class SystemStats:
         # becames relevant when we get called very often while many
         # short tasks get started. Sampling during quiet periods
         # depends on the heartbeat event, which fires less often.
-        self.min_seconds = 1
+        # The Heartbeat events occur roughly once every second but the actual time
+        # between these events deviates by a few milliseconds, in most cases. Hence
+        # pick a somewhat arbitary threshold such that we sample a large majority
+        # of the Heartbeat events. This ignores rare events that fall outside the minimum
+        # but allows for fairly consistent intervals between samples.
+        self.min_seconds = 0.990
 
         self.meminfo_regex = re.compile(rb'^(MemTotal|MemFree|Buffers|Cached|SwapTotal|SwapFree):\s*(\d+)')
         self.diskstats_regex = re.compile(rb'^([hsv]d.|mtdblock\d|mmcblk\d|cciss/c\d+d\d+.*)$')
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-06-21 16:14 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-21 16:14 [PATCH v2 1/3] buildstats.py: enable collection of /proc/pressure data Aryaman Gupta
2022-06-21 16:14 ` [PATCH v2 2/3] pybootchartgui: render cpu and io pressure Aryaman Gupta
2022-06-21 16:14 ` [PATCH v2 3/3] buildstats.bbclass: correct sampling of system stats Aryaman Gupta

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.