All of lore.kernel.org
 help / color / mirror / Atom feed
* [Buildroot] [PATCHv5 0/4] Generate package size statistics
@ 2015-09-02 21:15 Thomas Petazzoni
  2015-09-02 21:15 ` [Buildroot] [PATCHv5 1/4] pkg-generic: add step_pkg_size global instrumentation hook Thomas Petazzoni
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Thomas Petazzoni @ 2015-09-02 21:15 UTC (permalink / raw)
  To: buildroot

Hello,

Here is the fifth iteration of the patches adding a 'size-stats'
feature, which mainly allows to generate a pie chart of the size
contribution of each package to the overall root filesystem size.

This code is also available at:

  http://git.free-electrons.com/users/thomas-petazzoni/buildroot/log/?h=size-stats-v5

Changes since v4:

 - Rebased on top of the latest master.

 - Now that the skeleton logic has been moved to a regular Buildroot
   package, remove all the skeleton specific handling in the
   size-stats script, as well as the passing of the --skeleton-path
   from the main Makefile to the size-stats script.

 - Added Reviewed-by/Tested-by from Ryan Barnett on "pkg-generic: add
   step_pkg_size global instrumentation hook".

 - Rework the commit log of "pkg-generic: add step_pkg_size global
   instrumentation hook" to not mention the Config.in option that no
   longer exists, and instead explain why we are doing the data
   collection unconditionally. Reported by Romain Naour.

 - Display the total filesystem on the generated graph. Suggested by
   Ryan Barnett.

 - Call matplotlib.use('Agg') to make sure matplotlib doesn't try to
   connect to a X server. Suggested by Claython Shotwell.

 - Added Reviewed-by/Tested-by from Ryan Barnett on "Makefile:
   implement a size-stats target".

 - Minor tweaks in the manual, suggested by Ryan Barnett. Also added
   Reviewed-by/Tested-by from Ryan on the manual patch.

Changes since v3:

 - Remove the BR2_COLLECT_FILE_SIZE_STATS option, and collect the
   necessary data unconditionally. This was suggested by Yann
   E. Morin, who did some testing of the performance impact, and found
   that it was so insignificant that it is negligible. See Yann's
   analysis at
   http://lists.busybox.net/pipermail/buildroot/2015-February/119431.html.

 - Store the temporary files listing the files before/after inside
   each package build directory, rather than in $(BUILD_DIR)
   directly. This was suggested by Yann E. Morin.

   They are now named .br_filelist_before and .br_filelist_after. They
   are also no longer removed after the step_pkg_size_end hook, as
   they don't consume that much space and could be useful for
   analysis. If anyone finds that annoying, I'll be happy to change
   this back to remove them.

 - As suggested by Arnout, make a number of improvements to the
   size-stats Python script:

   * Use collections.defaultdict()

   * Use a dict of tuples (package name, file size) for the filesdict,
     instead of a dict of dict containing just two entries.

   * Factorize code between build_skeleton_dict() and
     build_package_dict() into a add_file() function.

   * Use [2:] instead of .replace("./", "") to remove the leading "./"
     in relative file paths returned by find.

   * Many other small things: one digit for percentages, use sum()
     when possible, rename pkgdict to filesdict where it makes sense,
     etc.

 - In the main Makefile:

   * Use mkdir -p instead of install -d, as suggested by Arnout.

   * Use $(Q) instead of @, as suggested by Arnout

   * Do not cd into $(CONFIG_DIR), as it would break out of tree
     build, since the path to the skeleton is passed as a relative
     path.

 - Add a new patch updating the manual with a new section describing
   this feature.

Changes since v2:

 - Use print0 in find and -0 in xargs to support spaces in filenames,
   just in case. Suggested by J?r?me Pouiller.

 - Support custom skeleton locations, instead of assuming the skeleton
   comes from system/skeleton.

 - Fix the size-stats target to actually call the size-stats script,
   and not graph-size, which was the old name of the script in
   previous versions of the series.

 - Fix usage of the size-stats target in out of tree build
   situations. For consistency reasons, we chose to mimic what
   graph-depends already does, even if size-stats does not necessarily
   need to be executed from within CONFIG_DIR.

 - Add one patch to make sure that 'make clean' removes the graphs/
   subdirectory in the output directory, which wasn't done today. This
   issue isn't specific to size-stats, and already existed with
   graph-build and graph-depends.

Changes since v1:

 - Improve the logic to properly take into account packages that
   overwrite files installed by other packages.

 - Make the new pkg_step hook optional.

 - Extend the facility to not only generate a graph, but also CSV
   files. As a consequence, everything was renamed from 'graph-size'
   to 'size-stats'.

Best regards,

Thomas

Thomas Petazzoni (4):
  pkg-generic: add step_pkg_size global instrumentation hook
  support/scripts: add size-stats script
  Makefile: implement a size-stats target
  docs/manual: add section about size graphing

 Makefile                     |   8 ++
 docs/manual/common-usage.txt |  40 ++++++++
 package/pkg-generic.mk       |  32 +++++++
 support/scripts/size-stats   | 217 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 297 insertions(+)
 create mode 100755 support/scripts/size-stats

-- 
2.5.1

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCHv5 1/4] pkg-generic: add step_pkg_size global instrumentation hook
  2015-09-02 21:15 [Buildroot] [PATCHv5 0/4] Generate package size statistics Thomas Petazzoni
@ 2015-09-02 21:15 ` Thomas Petazzoni
  2015-09-03 10:25   ` Yann E. MORIN
  2015-09-09 13:11   ` Vicente Olivert Riera
  2015-09-02 21:15 ` [Buildroot] [PATCHv5 2/4] support/scripts: add size-stats script Thomas Petazzoni
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 17+ messages in thread
From: Thomas Petazzoni @ 2015-09-02 21:15 UTC (permalink / raw)
  To: buildroot

This patch adds a global instrumentation hook that collects the list
of files installed in $(TARGET_DIR) by each package, and stores this
list into a file called $(BUILD_DIR)/packages-file-list.txt. It can
later be used to determine the size contribution of each package to
the target root filesystem.

Note that in order to detect if a file installed by one package is
later overriden by another package, we calculate the md5 of installed
files and compare them at each installation of a new package.

Collecting the list of files installed by each package is done
unconditionally, as tests have shown that the performance impact of
doing this is negligible.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Ryan Barnett <ryanbarnett3@gmail.com>
Tested-by: Ryan Barnett <ryanbarnett3@gmail.com>
---
 package/pkg-generic.mk | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index 6a7d97e..2c8fe29 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -55,6 +55,38 @@ define step_time
 endef
 GLOBAL_INSTRUMENTATION_HOOKS += step_time
 
+# Hooks to collect statistics about installed files
+
+# This hook will be called before the target installation of a
+# package. We store in a file named .br_filelist_before the list of
+# files currently installed in the target. Note that the MD5 is also
+# stored, in order to identify if the files are overwritten.
+define step_pkg_size_start
+	(cd $(TARGET_DIR) ; find . -type f -print0 | xargs -0 md5sum) | sort > \
+		$($(PKG)_DIR)/.br_filelist_before
+endef
+
+# This hook will be called after the target installation of a
+# package. We store in a file named .br_filelist_after the list of
+# files (and their MD5) currently installed in the target. We then do
+# a diff with the .br_filelist_before to compute the list of files
+# installed by this package.
+define step_pkg_size_end
+	(cd $(TARGET_DIR); find . -type f -print0 | xargs -0 md5sum) | sort > \
+		$($(PKG)_DIR)/.br_filelist_after
+	comm -13 $($(PKG)_DIR)/.br_filelist_before $($(PKG)_DIR)/.br_filelist_after | \
+		while read hash file ; do \
+			echo "$(1),$${file}" >> $(BUILD_DIR)/packages-file-list.txt ; \
+		done
+endef
+
+define step_pkg_size
+	$(if $(filter install-target,$(2)),\
+		$(if $(filter start,$(1)),$(call step_pkg_size_start,$(3))) \
+		$(if $(filter end,$(1)),$(call step_pkg_size_end,$(3))))
+endef
+GLOBAL_INSTRUMENTATION_HOOKS += step_pkg_size
+
 # User-supplied script
 ifneq ($(BR2_INSTRUMENTATION_SCRIPTS),)
 define step_user
-- 
2.5.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCHv5 2/4] support/scripts: add size-stats script
  2015-09-02 21:15 [Buildroot] [PATCHv5 0/4] Generate package size statistics Thomas Petazzoni
  2015-09-02 21:15 ` [Buildroot] [PATCHv5 1/4] pkg-generic: add step_pkg_size global instrumentation hook Thomas Petazzoni
@ 2015-09-02 21:15 ` Thomas Petazzoni
  2015-09-03 10:27   ` Yann E. MORIN
  2015-09-02 21:15 ` [Buildroot] [PATCHv5 3/4] Makefile: implement a size-stats target Thomas Petazzoni
  2015-09-02 21:15 ` [Buildroot] [PATCHv5 4/4] docs/manual: add section about size graphing Thomas Petazzoni
  3 siblings, 1 reply; 17+ messages in thread
From: Thomas Petazzoni @ 2015-09-02 21:15 UTC (permalink / raw)
  To: buildroot

This new script uses the data collected by the step_pkg_size
instrumentation hook to generate a pie chart of the size contribution
of each package to the target root filesystem, and two CSV files with
statistics about the package size and file size. To achieve this, it
looks at each file in $(TARGET_DIR), and using the
packages-file-list.txt information collected by the step_pkg_size
hook, it determines to which package the file belongs. It is therefore
able to give the size installed by each package.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 support/scripts/size-stats | 217 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 217 insertions(+)
 create mode 100755 support/scripts/size-stats

diff --git a/support/scripts/size-stats b/support/scripts/size-stats
new file mode 100755
index 0000000..54685f6
--- /dev/null
+++ b/support/scripts/size-stats
@@ -0,0 +1,217 @@
+#!/usr/bin/env python
+
+# Copyright (C) 2014 by Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+
+import sys
+import os
+import os.path
+import argparse
+import csv
+import collections
+
+try:
+    import matplotlib
+    matplotlib.use('Agg')
+    import matplotlib.font_manager as fm
+    import matplotlib.pyplot as plt
+except ImportError:
+    sys.stderr.write("You need python-matplotlib to generate the size graph\n")
+    exit(1)
+
+colors = ['#e60004', '#009836', '#2e1d86', '#ffed00',
+          '#0068b5', '#f28e00', '#940084', '#97c000']
+
+#
+# This function adds a new file to 'filesdict', after checking its
+# size. The 'filesdict' contain the relative path of the file as the
+# key, and as the value a tuple containing the name of the package to
+# which the file belongs and the size of the file.
+#
+# filesdict: the dict to which  the file is added
+# relpath: relative path of the file
+# fullpath: absolute path to the file
+# pkg: package to which the file belongs
+#
+def add_file(filesdict, relpath, abspath, pkg):
+    if not os.path.exists(abspath):
+        return
+    if os.path.islink(abspath):
+        return
+    sz = os.stat(abspath).st_size
+    filesdict[relpath] = (pkg, sz)
+
+#
+# This function returns a dict where each key is the path of a file in
+# the root filesystem, and the value is a tuple containing two
+# elements: the name of the package to which this file belongs and the
+# size of the file.
+#
+# builddir: path to the Buildroot output directory
+#
+def build_package_dict(builddir):
+    filesdict = {}
+    with open(os.path.join(builddir, "build", "packages-file-list.txt")) as filelistf:
+        for l in filelistf.readlines():
+            pkg, fpath = l.split(",")
+            # remove the initial './' in each file path
+            fpath = fpath.strip()[2:]
+            fullpath = os.path.join(builddir, "target", fpath)
+            add_file(filesdict, fpath, fullpath, pkg)
+    return filesdict
+
+#
+# This function builds a dictionary that contains the name of a
+# package as key, and the size of the files installed by this package
+# as the value.
+#
+# filesdict: dictionary with the name of the files as key, and as
+# value a tuple containing the name of the package to which the files
+# belongs, and the size of the file. As returned by
+# build_package_dict.
+#
+# builddir: path to the Buildroot output directory
+#
+def build_package_size(filesdict, builddir):
+    pkgsize = collections.defaultdict(int)
+
+    for root, _, files in os.walk(os.path.join(builddir, "target")):
+        for f in files:
+            fpath = os.path.join(root, f)
+            if os.path.islink(fpath):
+                continue
+            frelpath = os.path.relpath(fpath, os.path.join(builddir, "target"))
+            if not frelpath in filesdict:
+                print("WARNING: %s is not part of any package" % frelpath)
+                pkg = "unknown"
+            else:
+                pkg = filesdict[frelpath][0]
+
+            pkgsize[pkg] += os.path.getsize(fpath)
+
+    return pkgsize
+
+#
+# Given a dict returned by build_package_size(), this function
+# generates a pie chart of the size installed by each package.
+#
+# pkgsize: dictionary with the name of the package as a key, and the
+# size as the value, as returned by build_package_size.
+#
+# outputf: output file for the graph
+#
+def draw_graph(pkgsize, outputf):
+    total = sum(pkgsize.values())
+    labels = []
+    values = []
+    other_value = 0
+    for (p, sz) in pkgsize.items():
+        if sz < (total * 0.01):
+            other_value += sz
+        else:
+            labels.append("%s (%d kB)" % (p, sz / 1000.))
+            values.append(sz)
+    labels.append("Other (%d kB)" % (other_value / 1000.))
+    values.append(other_value)
+
+    plt.figure()
+    patches, texts, autotexts = plt.pie(values, labels=labels,
+                                        autopct='%1.1f%%', shadow=True,
+                                        colors=colors)
+    # Reduce text size
+    proptease = fm.FontProperties()
+    proptease.set_size('xx-small')
+    plt.setp(autotexts, fontproperties=proptease)
+    plt.setp(texts, fontproperties=proptease)
+
+    plt.suptitle("Filesystem size per package", fontsize=18, y=.97)
+    plt.title("Total filesystem size: %d kB" % (total / 1000.), fontsize=10, y=.96)
+    plt.savefig(outputf)
+
+#
+# Generate a CSV file with statistics about the size of each file, its
+# size contribution to the package and to the overall system.
+#
+# filesdict: dictionary with the name of the files as key, and as
+# value a tuple containing the name of the package to which the files
+# belongs, and the size of the file. As returned by
+# build_package_dict.
+#
+# pkgsize: dictionary with the name of the package as a key, and the
+# size as the value, as returned by build_package_size.
+#
+# outputf: output CSV file
+#
+def gen_files_csv(filesdict, pkgsizes, outputf):
+    total = 0
+    for (p, sz) in pkgsizes.items():
+        total += sz
+    with open(outputf, 'w') as csvfile:
+        wr = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
+        wr.writerow(["File name",
+                     "Package name",
+                     "File size",
+                     "Package size",
+                     "File size in package (%)",
+                     "File size in system (%)"])
+        for f, (pkgname, filesize) in filesdict.items():
+            pkgsize = pkgsizes[pkgname]
+            wr.writerow([f, pkgname, filesize, pkgsize,
+                         "%.1f" % (float(filesize) / pkgsize * 100),
+                         "%.1f" % (float(filesize) / total * 100)])
+
+
+#
+# Generate a CSV file with statistics about the size of each package,
+# and their size contribution to the overall system.
+#
+# pkgsize: dictionary with the name of the package as a key, and the
+# size as the value, as returned by build_package_size.
+#
+# outputf: output CSV file
+#
+def gen_packages_csv(pkgsizes, outputf):
+    total = sum(pkgsizes.values())
+    with open(outputf, 'w') as csvfile:
+        wr = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
+        wr.writerow(["Package name", "Package size", "Package size in system (%)"])
+        for (pkg, size) in pkgsizes.items():
+            wr.writerow([pkg, size, "%.1f" % (float(size) / total * 100)])
+
+parser = argparse.ArgumentParser(description='Draw build time graphs')
+
+parser.add_argument("--builddir", '-i', metavar="BUILDDIR", required=True,
+                    help="Buildroot output directory")
+parser.add_argument("--graph", '-g', metavar="GRAPH",
+                    help="Graph output file (.pdf or .png extension)")
+parser.add_argument("--file-size-csv", '-f', metavar="FILE_SIZE_CSV",
+                    help="CSV output file with file size statistics")
+parser.add_argument("--package-size-csv", '-p', metavar="PKG_SIZE_CSV",
+                    help="CSV output file with package size statistics")
+args = parser.parse_args()
+
+# Find out which package installed what files
+pkgdict = build_package_dict(args.builddir)
+
+# Collect the size installed by each package
+pkgsize = build_package_size(pkgdict, args.builddir)
+
+if args.graph:
+    draw_graph(pkgsize, args.graph)
+if args.file_size_csv:
+    gen_files_csv(pkgdict, pkgsize, args.file_size_csv)
+if args.package_size_csv:
+    gen_packages_csv(pkgsize, args.package_size_csv)
-- 
2.5.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCHv5 3/4] Makefile: implement a size-stats target
  2015-09-02 21:15 [Buildroot] [PATCHv5 0/4] Generate package size statistics Thomas Petazzoni
  2015-09-02 21:15 ` [Buildroot] [PATCHv5 1/4] pkg-generic: add step_pkg_size global instrumentation hook Thomas Petazzoni
  2015-09-02 21:15 ` [Buildroot] [PATCHv5 2/4] support/scripts: add size-stats script Thomas Petazzoni
@ 2015-09-02 21:15 ` Thomas Petazzoni
  2015-09-03 10:29   ` Yann E. MORIN
  2015-09-02 21:15 ` [Buildroot] [PATCHv5 4/4] docs/manual: add section about size graphing Thomas Petazzoni
  3 siblings, 1 reply; 17+ messages in thread
From: Thomas Petazzoni @ 2015-09-02 21:15 UTC (permalink / raw)
  To: buildroot

This commit implements a size-stats target that calls the script of
the same name to generate the graph and CSV files related to package
and file sizes.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Ryan Barnett <ryanbarnett3@gmail.com>
Tested-by: Ryan Barnett <ryanbarnett3@gmail.com>
---
 Makefile | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/Makefile b/Makefile
index 9a6e9c0..4a33495 100644
--- a/Makefile
+++ b/Makefile
@@ -685,6 +685,13 @@ graph-depends: graph-depends-requirements
 	|tee $(GRAPHS_DIR)/$(@).dot \
 	|dot $(BR2_GRAPH_DOT_OPTS) -T$(BR_GRAPH_OUT) -o $(GRAPHS_DIR)/$(@).$(BR_GRAPH_OUT)
 
+size-stats:
+	$(Q)mkdir -p $(GRAPHS_DIR)
+	$(Q)$(TOPDIR)/support/scripts/size-stats --builddir $(BASE_DIR) \
+		--graph $(BASE_DIR)/graphs/graph-size.$(BR_GRAPH_OUT) \
+		--file-size-csv $(BASE_DIR)/build/file-size-stats.csv \
+		--package-size-csv $(BASE_DIR)/build/package-size-stats.csv
+
 else # ifeq ($(BR2_HAVE_DOT_CONFIG),y)
 
 all: menuconfig
@@ -903,6 +910,7 @@ endif
 	@echo '  manual-epub            - build manual in ePub'
 	@echo '  graph-build            - generate graphs of the build times'
 	@echo '  graph-depends          - generate graph of the dependency tree'
+	@echo '  size-stats             - generate stats of the filesystem size'
 	@echo '  list-defconfigs        - list all defconfigs (pre-configured minimal systems)'
 	@echo
 	@echo 'Miscellaneous:'
-- 
2.5.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCHv5 4/4] docs/manual: add section about size graphing
  2015-09-02 21:15 [Buildroot] [PATCHv5 0/4] Generate package size statistics Thomas Petazzoni
                   ` (2 preceding siblings ...)
  2015-09-02 21:15 ` [Buildroot] [PATCHv5 3/4] Makefile: implement a size-stats target Thomas Petazzoni
@ 2015-09-02 21:15 ` Thomas Petazzoni
  2015-09-03 10:37   ` Yann E. MORIN
  3 siblings, 1 reply; 17+ messages in thread
From: Thomas Petazzoni @ 2015-09-02 21:15 UTC (permalink / raw)
  To: buildroot

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 docs/manual/common-usage.txt | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/docs/manual/common-usage.txt b/docs/manual/common-usage.txt
index 5b27b1f..24f1d0b 100644
--- a/docs/manual/common-usage.txt
+++ b/docs/manual/common-usage.txt
@@ -273,6 +273,46 @@ only other format supported is PNG:
 BR2_GRAPH_OUT=png make graph-build
 ----------------
 
+=== Graphing the filesystem size contribution of packages
+
+When your target system grows, it is sometimes useful to understand
+how much each Buildroot package is contributing to the overall root
+filesystem size. To help with such an analysis, Buildroot collects
+data about the files each package install and generate graphs and CSVs
+files detailing the size contribution of each package.
+
+To generate these data after a build, run:
+
+----------------
+make size-stats
+----------------
+
+This will generate:
+
+* +output/graphs/graph-size+, a pie chart of the contribution of each
+  package to the overall root filesystem size
+
+* +output/build/package-size-stats.csv+, a CSV file giving the size
+  contribution of each package to the overall root filesystem size
+
+* +output/build/file-size-stats.csv+, a CSV file giving the size
+  contribution of each installed file to the package it belongs, and
+  to the overall filesystem size.
+
+This +size-stats+ target requires the Python Matplotlib library to be
+installed (+python-matplotlib+ on most distributions), and also the
++argpase+ module if you're using a Python version older than 2.7
+(+python-argparse+ on most distributions).
+
+Just like for the duration graph, a +BR2_GRAPH_OUT+ environment is
+supported to adjust the output file format. See xref:graph-depends[]
+for details about this environment variable.
+
+.Note
+The collected filesystem size data is only meaningful after a complete
+clean rebuild. Be sure to run +make clean all+ before using +make
+size-stats+.
+
 include::eclipse-integration.txt[]
 
 include::advanced.txt[]
-- 
2.5.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCHv5 1/4] pkg-generic: add step_pkg_size global instrumentation hook
  2015-09-02 21:15 ` [Buildroot] [PATCHv5 1/4] pkg-generic: add step_pkg_size global instrumentation hook Thomas Petazzoni
@ 2015-09-03 10:25   ` Yann E. MORIN
  2015-09-09 13:11   ` Vicente Olivert Riera
  1 sibling, 0 replies; 17+ messages in thread
From: Yann E. MORIN @ 2015-09-03 10:25 UTC (permalink / raw)
  To: buildroot

Thomas, All,

On 2015-09-02 23:15 +0200, Thomas Petazzoni spake thusly:
> This patch adds a global instrumentation hook that collects the list
> of files installed in $(TARGET_DIR) by each package, and stores this
> list into a file called $(BUILD_DIR)/packages-file-list.txt. It can
> later be used to determine the size contribution of each package to
> the target root filesystem.
> 
> Note that in order to detect if a file installed by one package is
> later overriden by another package, we calculate the md5 of installed
> files and compare them at each installation of a new package.
> 
> Collecting the list of files installed by each package is done
> unconditionally, as tests have shown that the performance impact of
> doing this is negligible.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> Reviewed-by: Ryan Barnett <ryanbarnett3@gmail.com>
> Tested-by: Ryan Barnett <ryanbarnett3@gmail.com>

Tested-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Reviewed-by: "Yann E. MORIN" <yann.morin.1998@free.fr>

Regards,
Yann E. MORIN.

> ---
>  package/pkg-generic.mk | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> index 6a7d97e..2c8fe29 100644
> --- a/package/pkg-generic.mk
> +++ b/package/pkg-generic.mk
> @@ -55,6 +55,38 @@ define step_time
>  endef
>  GLOBAL_INSTRUMENTATION_HOOKS += step_time
>  
> +# Hooks to collect statistics about installed files
> +
> +# This hook will be called before the target installation of a
> +# package. We store in a file named .br_filelist_before the list of
> +# files currently installed in the target. Note that the MD5 is also
> +# stored, in order to identify if the files are overwritten.
> +define step_pkg_size_start
> +	(cd $(TARGET_DIR) ; find . -type f -print0 | xargs -0 md5sum) | sort > \
> +		$($(PKG)_DIR)/.br_filelist_before
> +endef
> +
> +# This hook will be called after the target installation of a
> +# package. We store in a file named .br_filelist_after the list of
> +# files (and their MD5) currently installed in the target. We then do
> +# a diff with the .br_filelist_before to compute the list of files
> +# installed by this package.
> +define step_pkg_size_end
> +	(cd $(TARGET_DIR); find . -type f -print0 | xargs -0 md5sum) | sort > \
> +		$($(PKG)_DIR)/.br_filelist_after
> +	comm -13 $($(PKG)_DIR)/.br_filelist_before $($(PKG)_DIR)/.br_filelist_after | \
> +		while read hash file ; do \
> +			echo "$(1),$${file}" >> $(BUILD_DIR)/packages-file-list.txt ; \
> +		done
> +endef
> +
> +define step_pkg_size
> +	$(if $(filter install-target,$(2)),\
> +		$(if $(filter start,$(1)),$(call step_pkg_size_start,$(3))) \
> +		$(if $(filter end,$(1)),$(call step_pkg_size_end,$(3))))
> +endef
> +GLOBAL_INSTRUMENTATION_HOOKS += step_pkg_size
> +
>  # User-supplied script
>  ifneq ($(BR2_INSTRUMENTATION_SCRIPTS),)
>  define step_user
> -- 
> 2.5.1
> 
> _______________________________________________
> buildroot mailing list
> buildroot at busybox.net
> http://lists.busybox.net/mailman/listinfo/buildroot

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCHv5 2/4] support/scripts: add size-stats script
  2015-09-02 21:15 ` [Buildroot] [PATCHv5 2/4] support/scripts: add size-stats script Thomas Petazzoni
@ 2015-09-03 10:27   ` Yann E. MORIN
  0 siblings, 0 replies; 17+ messages in thread
From: Yann E. MORIN @ 2015-09-03 10:27 UTC (permalink / raw)
  To: buildroot

Thomas, All,

On 2015-09-02 23:15 +0200, Thomas Petazzoni spake thusly:
> This new script uses the data collected by the step_pkg_size
> instrumentation hook to generate a pie chart of the size contribution
> of each package to the target root filesystem, and two CSV files with
> statistics about the package size and file size. To achieve this, it
> looks at each file in $(TARGET_DIR), and using the
> packages-file-list.txt information collected by the step_pkg_size
> hook, it determines to which package the file belongs. It is therefore
> able to give the size installed by each package.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>

Tested-by: "Yann E. MORIN" <yann.morin.1998@free.fr>

I'm not in a position to review the code, however. I'll entrust our
Python experts in their reviews. ;-)

Regards,
Yann E. MORIN.

> ---
>  support/scripts/size-stats | 217 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 217 insertions(+)
>  create mode 100755 support/scripts/size-stats
> 
> diff --git a/support/scripts/size-stats b/support/scripts/size-stats
> new file mode 100755
> index 0000000..54685f6
> --- /dev/null
> +++ b/support/scripts/size-stats
> @@ -0,0 +1,217 @@
> +#!/usr/bin/env python
> +
> +# Copyright (C) 2014 by Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write to the Free Software
> +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> +
> +import sys
> +import os
> +import os.path
> +import argparse
> +import csv
> +import collections
> +
> +try:
> +    import matplotlib
> +    matplotlib.use('Agg')
> +    import matplotlib.font_manager as fm
> +    import matplotlib.pyplot as plt
> +except ImportError:
> +    sys.stderr.write("You need python-matplotlib to generate the size graph\n")
> +    exit(1)
> +
> +colors = ['#e60004', '#009836', '#2e1d86', '#ffed00',
> +          '#0068b5', '#f28e00', '#940084', '#97c000']
> +
> +#
> +# This function adds a new file to 'filesdict', after checking its
> +# size. The 'filesdict' contain the relative path of the file as the
> +# key, and as the value a tuple containing the name of the package to
> +# which the file belongs and the size of the file.
> +#
> +# filesdict: the dict to which  the file is added
> +# relpath: relative path of the file
> +# fullpath: absolute path to the file
> +# pkg: package to which the file belongs
> +#
> +def add_file(filesdict, relpath, abspath, pkg):
> +    if not os.path.exists(abspath):
> +        return
> +    if os.path.islink(abspath):
> +        return
> +    sz = os.stat(abspath).st_size
> +    filesdict[relpath] = (pkg, sz)
> +
> +#
> +# This function returns a dict where each key is the path of a file in
> +# the root filesystem, and the value is a tuple containing two
> +# elements: the name of the package to which this file belongs and the
> +# size of the file.
> +#
> +# builddir: path to the Buildroot output directory
> +#
> +def build_package_dict(builddir):
> +    filesdict = {}
> +    with open(os.path.join(builddir, "build", "packages-file-list.txt")) as filelistf:
> +        for l in filelistf.readlines():
> +            pkg, fpath = l.split(",")
> +            # remove the initial './' in each file path
> +            fpath = fpath.strip()[2:]
> +            fullpath = os.path.join(builddir, "target", fpath)
> +            add_file(filesdict, fpath, fullpath, pkg)
> +    return filesdict
> +
> +#
> +# This function builds a dictionary that contains the name of a
> +# package as key, and the size of the files installed by this package
> +# as the value.
> +#
> +# filesdict: dictionary with the name of the files as key, and as
> +# value a tuple containing the name of the package to which the files
> +# belongs, and the size of the file. As returned by
> +# build_package_dict.
> +#
> +# builddir: path to the Buildroot output directory
> +#
> +def build_package_size(filesdict, builddir):
> +    pkgsize = collections.defaultdict(int)
> +
> +    for root, _, files in os.walk(os.path.join(builddir, "target")):
> +        for f in files:
> +            fpath = os.path.join(root, f)
> +            if os.path.islink(fpath):
> +                continue
> +            frelpath = os.path.relpath(fpath, os.path.join(builddir, "target"))
> +            if not frelpath in filesdict:
> +                print("WARNING: %s is not part of any package" % frelpath)
> +                pkg = "unknown"
> +            else:
> +                pkg = filesdict[frelpath][0]
> +
> +            pkgsize[pkg] += os.path.getsize(fpath)
> +
> +    return pkgsize
> +
> +#
> +# Given a dict returned by build_package_size(), this function
> +# generates a pie chart of the size installed by each package.
> +#
> +# pkgsize: dictionary with the name of the package as a key, and the
> +# size as the value, as returned by build_package_size.
> +#
> +# outputf: output file for the graph
> +#
> +def draw_graph(pkgsize, outputf):
> +    total = sum(pkgsize.values())
> +    labels = []
> +    values = []
> +    other_value = 0
> +    for (p, sz) in pkgsize.items():
> +        if sz < (total * 0.01):
> +            other_value += sz
> +        else:
> +            labels.append("%s (%d kB)" % (p, sz / 1000.))
> +            values.append(sz)
> +    labels.append("Other (%d kB)" % (other_value / 1000.))
> +    values.append(other_value)
> +
> +    plt.figure()
> +    patches, texts, autotexts = plt.pie(values, labels=labels,
> +                                        autopct='%1.1f%%', shadow=True,
> +                                        colors=colors)
> +    # Reduce text size
> +    proptease = fm.FontProperties()
> +    proptease.set_size('xx-small')
> +    plt.setp(autotexts, fontproperties=proptease)
> +    plt.setp(texts, fontproperties=proptease)
> +
> +    plt.suptitle("Filesystem size per package", fontsize=18, y=.97)
> +    plt.title("Total filesystem size: %d kB" % (total / 1000.), fontsize=10, y=.96)
> +    plt.savefig(outputf)
> +
> +#
> +# Generate a CSV file with statistics about the size of each file, its
> +# size contribution to the package and to the overall system.
> +#
> +# filesdict: dictionary with the name of the files as key, and as
> +# value a tuple containing the name of the package to which the files
> +# belongs, and the size of the file. As returned by
> +# build_package_dict.
> +#
> +# pkgsize: dictionary with the name of the package as a key, and the
> +# size as the value, as returned by build_package_size.
> +#
> +# outputf: output CSV file
> +#
> +def gen_files_csv(filesdict, pkgsizes, outputf):
> +    total = 0
> +    for (p, sz) in pkgsizes.items():
> +        total += sz
> +    with open(outputf, 'w') as csvfile:
> +        wr = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
> +        wr.writerow(["File name",
> +                     "Package name",
> +                     "File size",
> +                     "Package size",
> +                     "File size in package (%)",
> +                     "File size in system (%)"])
> +        for f, (pkgname, filesize) in filesdict.items():
> +            pkgsize = pkgsizes[pkgname]
> +            wr.writerow([f, pkgname, filesize, pkgsize,
> +                         "%.1f" % (float(filesize) / pkgsize * 100),
> +                         "%.1f" % (float(filesize) / total * 100)])
> +
> +
> +#
> +# Generate a CSV file with statistics about the size of each package,
> +# and their size contribution to the overall system.
> +#
> +# pkgsize: dictionary with the name of the package as a key, and the
> +# size as the value, as returned by build_package_size.
> +#
> +# outputf: output CSV file
> +#
> +def gen_packages_csv(pkgsizes, outputf):
> +    total = sum(pkgsizes.values())
> +    with open(outputf, 'w') as csvfile:
> +        wr = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
> +        wr.writerow(["Package name", "Package size", "Package size in system (%)"])
> +        for (pkg, size) in pkgsizes.items():
> +            wr.writerow([pkg, size, "%.1f" % (float(size) / total * 100)])
> +
> +parser = argparse.ArgumentParser(description='Draw build time graphs')
> +
> +parser.add_argument("--builddir", '-i', metavar="BUILDDIR", required=True,
> +                    help="Buildroot output directory")
> +parser.add_argument("--graph", '-g', metavar="GRAPH",
> +                    help="Graph output file (.pdf or .png extension)")
> +parser.add_argument("--file-size-csv", '-f', metavar="FILE_SIZE_CSV",
> +                    help="CSV output file with file size statistics")
> +parser.add_argument("--package-size-csv", '-p', metavar="PKG_SIZE_CSV",
> +                    help="CSV output file with package size statistics")
> +args = parser.parse_args()
> +
> +# Find out which package installed what files
> +pkgdict = build_package_dict(args.builddir)
> +
> +# Collect the size installed by each package
> +pkgsize = build_package_size(pkgdict, args.builddir)
> +
> +if args.graph:
> +    draw_graph(pkgsize, args.graph)
> +if args.file_size_csv:
> +    gen_files_csv(pkgdict, pkgsize, args.file_size_csv)
> +if args.package_size_csv:
> +    gen_packages_csv(pkgsize, args.package_size_csv)
> -- 
> 2.5.1
> 
> _______________________________________________
> buildroot mailing list
> buildroot at busybox.net
> http://lists.busybox.net/mailman/listinfo/buildroot

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCHv5 3/4] Makefile: implement a size-stats target
  2015-09-02 21:15 ` [Buildroot] [PATCHv5 3/4] Makefile: implement a size-stats target Thomas Petazzoni
@ 2015-09-03 10:29   ` Yann E. MORIN
  2015-09-03 12:21     ` Thomas Petazzoni
  0 siblings, 1 reply; 17+ messages in thread
From: Yann E. MORIN @ 2015-09-03 10:29 UTC (permalink / raw)
  To: buildroot

Thomas, All,

On 2015-09-02 23:15 +0200, Thomas Petazzoni spake thusly:
> This commit implements a size-stats target that calls the script of
> the same name to generate the graph and CSV files related to package
> and file sizes.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> Reviewed-by: Ryan Barnett <ryanbarnett3@gmail.com>
> Tested-by: Ryan Barnett <ryanbarnett3@gmail.com>

Tested-by: "Yann E. MORIN" <yann.morin.1998@free.fr>

I would however have called this something like graph-sizes, so it is in
line with the other graph-generating targets.

Otherwise:

Reviewed-by: "Yann E. MORIN" <yann.morin.1998@free.fr>

Regards,
Yann E. MORIN.

> ---
>  Makefile | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/Makefile b/Makefile
> index 9a6e9c0..4a33495 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -685,6 +685,13 @@ graph-depends: graph-depends-requirements
>  	|tee $(GRAPHS_DIR)/$(@).dot \
>  	|dot $(BR2_GRAPH_DOT_OPTS) -T$(BR_GRAPH_OUT) -o $(GRAPHS_DIR)/$(@).$(BR_GRAPH_OUT)
>  
> +size-stats:
> +	$(Q)mkdir -p $(GRAPHS_DIR)
> +	$(Q)$(TOPDIR)/support/scripts/size-stats --builddir $(BASE_DIR) \
> +		--graph $(BASE_DIR)/graphs/graph-size.$(BR_GRAPH_OUT) \
> +		--file-size-csv $(BASE_DIR)/build/file-size-stats.csv \
> +		--package-size-csv $(BASE_DIR)/build/package-size-stats.csv
> +
>  else # ifeq ($(BR2_HAVE_DOT_CONFIG),y)
>  
>  all: menuconfig
> @@ -903,6 +910,7 @@ endif
>  	@echo '  manual-epub            - build manual in ePub'
>  	@echo '  graph-build            - generate graphs of the build times'
>  	@echo '  graph-depends          - generate graph of the dependency tree'
> +	@echo '  size-stats             - generate stats of the filesystem size'
>  	@echo '  list-defconfigs        - list all defconfigs (pre-configured minimal systems)'
>  	@echo
>  	@echo 'Miscellaneous:'
> -- 
> 2.5.1
> 
> _______________________________________________
> buildroot mailing list
> buildroot at busybox.net
> http://lists.busybox.net/mailman/listinfo/buildroot

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCHv5 4/4] docs/manual: add section about size graphing
  2015-09-02 21:15 ` [Buildroot] [PATCHv5 4/4] docs/manual: add section about size graphing Thomas Petazzoni
@ 2015-09-03 10:37   ` Yann E. MORIN
  2015-09-03 10:43     ` Yann E. MORIN
  0 siblings, 1 reply; 17+ messages in thread
From: Yann E. MORIN @ 2015-09-03 10:37 UTC (permalink / raw)
  To: buildroot

Thomas, All,

On 2015-09-02 23:15 +0200, Thomas Petazzoni spake thusly:
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> ---
>  docs/manual/common-usage.txt | 40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/docs/manual/common-usage.txt b/docs/manual/common-usage.txt
> index 5b27b1f..24f1d0b 100644
> --- a/docs/manual/common-usage.txt
> +++ b/docs/manual/common-usage.txt
> @@ -273,6 +273,46 @@ only other format supported is PNG:
>  BR2_GRAPH_OUT=png make graph-build
>  ----------------
>  
> +=== Graphing the filesystem size contribution of packages
> +
> +When your target system grows, it is sometimes useful to understand
> +how much each Buildroot package is contributing to the overall root
> +filesystem size. To help with such an analysis, Buildroot collects
> +data about the files each package install and generate graphs and CSVs

generates

> +files detailing the size contribution of each package.
> +
> +To generate these data after a build, run:
> +
> +----------------
> +make size-stats

If you change the name of the target in the previous patch, do not
forget to update the manual accordingly.

> +----------------
> +
> +This will generate:
> +
> +* +output/graphs/graph-size+, a pie chart of the contribution of each
> +  package to the overall root filesystem size

Nope, it did not generate that file. I guess you meant 'graph-size.pdf'.

> +* +output/build/package-size-stats.csv+, a CSV file giving the size
> +  contribution of each package to the overall root filesystem size

Ditto, that was not generated.

> +* +output/build/file-size-stats.csv+, a CSV file giving the size
> +  contribution of each installed file to the package it belongs, and
> +  to the overall filesystem size.

Ditto, that file was not generated.

Regards,
Yann E. MORIN.

> +This +size-stats+ target requires the Python Matplotlib library to be
> +installed (+python-matplotlib+ on most distributions), and also the
> ++argpase+ module if you're using a Python version older than 2.7
> +(+python-argparse+ on most distributions).
> +
> +Just like for the duration graph, a +BR2_GRAPH_OUT+ environment is
> +supported to adjust the output file format. See xref:graph-depends[]
> +for details about this environment variable.
> +
> +.Note
> +The collected filesystem size data is only meaningful after a complete
> +clean rebuild. Be sure to run +make clean all+ before using +make
> +size-stats+.
> +
>  include::eclipse-integration.txt[]
>  
>  include::advanced.txt[]
> -- 
> 2.5.1
> 
> _______________________________________________
> buildroot mailing list
> buildroot at busybox.net
> http://lists.busybox.net/mailman/listinfo/buildroot

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCHv5 4/4] docs/manual: add section about size graphing
  2015-09-03 10:37   ` Yann E. MORIN
@ 2015-09-03 10:43     ` Yann E. MORIN
  2015-09-15  2:54       ` Ryan Barnett
  0 siblings, 1 reply; 17+ messages in thread
From: Yann E. MORIN @ 2015-09-03 10:43 UTC (permalink / raw)
  To: buildroot

Thomas, All,

On 2015-09-03 12:37 +0200, Yann E. MORIN spake thusly:
> Thomas, All,
> 
> On 2015-09-02 23:15 +0200, Thomas Petazzoni spake thusly:
> > Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> > ---
> >  docs/manual/common-usage.txt | 40 ++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 40 insertions(+)
> > 
> > diff --git a/docs/manual/common-usage.txt b/docs/manual/common-usage.txt
> > index 5b27b1f..24f1d0b 100644
> > --- a/docs/manual/common-usage.txt
> > +++ b/docs/manual/common-usage.txt
> > @@ -273,6 +273,46 @@ only other format supported is PNG:
> >  BR2_GRAPH_OUT=png make graph-build
> >  ----------------
> >  
> > +=== Graphing the filesystem size contribution of packages
> > +
> > +When your target system grows, it is sometimes useful to understand
> > +how much each Buildroot package is contributing to the overall root
> > +filesystem size. To help with such an analysis, Buildroot collects
> > +data about the files each package install and generate graphs and CSVs
> 
> generates
> 
> > +files detailing the size contribution of each package.
> > +
> > +To generate these data after a build, run:
> > +
> > +----------------
> > +make size-stats
> 
> If you change the name of the target in the previous patch, do not
> forget to update the manual accordingly.
> 
> > +----------------
> > +
> > +This will generate:
> > +
> > +* +output/graphs/graph-size+, a pie chart of the contribution of each
> > +  package to the overall root filesystem size
> 
> Nope, it did not generate that file. I guess you meant 'graph-size.pdf'.
> 
> > +* +output/build/package-size-stats.csv+, a CSV file giving the size
> > +  contribution of each package to the overall root filesystem size
> 
> Ditto, that was not generated.

Sorry, ditch that comment. It *was* generated.

It's just that I only looked at the base filename, and expected it in
the graphs/ sub-dir, not in build/ . That's by looking at patch 3 that I
noticed they were put in there.

I think it makes more sense to put those .csv files in the graphs/
sub-dir.

> > +* +output/build/file-size-stats.csv+, a CSV file giving the size
> > +  contribution of each installed file to the package it belongs, and
> > +  to the overall filesystem size.
> 
> Ditto, that file was not generated.

Ditto.

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCHv5 3/4] Makefile: implement a size-stats target
  2015-09-03 10:29   ` Yann E. MORIN
@ 2015-09-03 12:21     ` Thomas Petazzoni
  2015-09-03 12:36       ` Yann E. MORIN
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Petazzoni @ 2015-09-03 12:21 UTC (permalink / raw)
  To: buildroot

Dear Yann E. MORIN,

On Thu, 3 Sep 2015 12:29:05 +0200, Yann E. MORIN wrote:

> Tested-by: "Yann E. MORIN" <yann.morin.1998@free.fr>

Thanks!

> I would however have called this something like graph-sizes, so it is in
> line with the other graph-generating targets.

It used to be called graph-size, until I changed the code to also
generate CSV files. So now it doesn't generate just a graph, but also
raw CSV data, which is why I changed to size-stats. If despite that
you think "make graph-size" remains a better naming, I'm fine with
changing back.

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCHv5 3/4] Makefile: implement a size-stats target
  2015-09-03 12:21     ` Thomas Petazzoni
@ 2015-09-03 12:36       ` Yann E. MORIN
  0 siblings, 0 replies; 17+ messages in thread
From: Yann E. MORIN @ 2015-09-03 12:36 UTC (permalink / raw)
  To: buildroot

Thomas, All,

On 2015-09-03 14:21 +0200, Thomas Petazzoni spake thusly:
> On Thu, 3 Sep 2015 12:29:05 +0200, Yann E. MORIN wrote:
[--SNIP--]
> > I would however have called this something like graph-sizes, so it is in
> > line with the other graph-generating targets.
> 
> It used to be called graph-size, until I changed the code to also
> generate CSV files. So now it doesn't generate just a graph, but also
> raw CSV data, which is why I changed to size-stats. If despite that
> you think "make graph-size" remains a better naming, I'm fine with
> changing back.

Well, can't we consider the csv files as by-products? The main and
primary purpose is to generate the graphs, so I'd still call that
graph-size. Having 'graph-' as a prefix also helps classify the rule
with the other graph-generating rules.

But I don't really mind...

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCHv5 1/4] pkg-generic: add step_pkg_size global instrumentation hook
  2015-09-02 21:15 ` [Buildroot] [PATCHv5 1/4] pkg-generic: add step_pkg_size global instrumentation hook Thomas Petazzoni
  2015-09-03 10:25   ` Yann E. MORIN
@ 2015-09-09 13:11   ` Vicente Olivert Riera
  2015-09-09 13:46     ` Thomas Petazzoni
  1 sibling, 1 reply; 17+ messages in thread
From: Vicente Olivert Riera @ 2015-09-09 13:11 UTC (permalink / raw)
  To: buildroot

Dear Thomas Petazzoni,

On 09/02/2015 10:15 PM, Thomas Petazzoni wrote:
> This patch adds a global instrumentation hook that collects the list
> of files installed in $(TARGET_DIR) by each package, and stores this
> list into a file called $(BUILD_DIR)/packages-file-list.txt. It can
> later be used to determine the size contribution of each package to
> the target root filesystem.
> 
> Note that in order to detect if a file installed by one package is
> later overriden by another package, we calculate the md5 of installed
> files and compare them at each installation of a new package.
> 
> Collecting the list of files installed by each package is done
> unconditionally, as tests have shown that the performance impact of
> doing this is negligible.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> Reviewed-by: Ryan Barnett <ryanbarnett3@gmail.com>
> Tested-by: Ryan Barnett <ryanbarnett3@gmail.com>
> ---
>  package/pkg-generic.mk | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> index 6a7d97e..2c8fe29 100644
> --- a/package/pkg-generic.mk
> +++ b/package/pkg-generic.mk
> @@ -55,6 +55,38 @@ define step_time
>  endef
>  GLOBAL_INSTRUMENTATION_HOOKS += step_time
>  
> +# Hooks to collect statistics about installed files
> +
> +# This hook will be called before the target installation of a
> +# package. We store in a file named .br_filelist_before the list of
> +# files currently installed in the target. Note that the MD5 is also
> +# stored, in order to identify if the files are overwritten.
> +define step_pkg_size_start
> +	(cd $(TARGET_DIR) ; find . -type f -print0 | xargs -0 md5sum) | sort > \

why only "-type f"? Packages can also install other type of files such
as links, for instance. Don't you want to track them as well?

> +		$($(PKG)_DIR)/.br_filelist_before
> +endef
> +
> +# This hook will be called after the target installation of a
> +# package. We store in a file named .br_filelist_after the list of
> +# files (and their MD5) currently installed in the target. We then do
> +# a diff with the .br_filelist_before to compute the list of files
> +# installed by this package.
> +define step_pkg_size_end
> +	(cd $(TARGET_DIR); find . -type f -print0 | xargs -0 md5sum) | sort > \

Same here.

Regards,

Vincent.

> +		$($(PKG)_DIR)/.br_filelist_after
> +	comm -13 $($(PKG)_DIR)/.br_filelist_before $($(PKG)_DIR)/.br_filelist_after | \
> +		while read hash file ; do \
> +			echo "$(1),$${file}" >> $(BUILD_DIR)/packages-file-list.txt ; \
> +		done
> +endef
> +
> +define step_pkg_size
> +	$(if $(filter install-target,$(2)),\
> +		$(if $(filter start,$(1)),$(call step_pkg_size_start,$(3))) \
> +		$(if $(filter end,$(1)),$(call step_pkg_size_end,$(3))))
> +endef
> +GLOBAL_INSTRUMENTATION_HOOKS += step_pkg_size
> +
>  # User-supplied script
>  ifneq ($(BR2_INSTRUMENTATION_SCRIPTS),)
>  define step_user
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCHv5 1/4] pkg-generic: add step_pkg_size global instrumentation hook
  2015-09-09 13:11   ` Vicente Olivert Riera
@ 2015-09-09 13:46     ` Thomas Petazzoni
  2015-09-09 13:50       ` Vicente Olivert Riera
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Petazzoni @ 2015-09-09 13:46 UTC (permalink / raw)
  To: buildroot

Dear Vicente Olivert Riera,

On Wed, 9 Sep 2015 14:11:08 +0100, Vicente Olivert Riera wrote:

> > +# This hook will be called before the target installation of a
> > +# package. We store in a file named .br_filelist_before the list of
> > +# files currently installed in the target. Note that the MD5 is also
> > +# stored, in order to identify if the files are overwritten.
> > +define step_pkg_size_start
> > +	(cd $(TARGET_DIR) ; find . -type f -print0 | xargs -0 md5sum) | sort > \
> 
> why only "-type f"? Packages can also install other type of files such
> as links, for instance. Don't you want to track them as well?

It doesn't make sense to "md5sum" a symbolic link: what you will md5 is
the target of the link, i.e another file of the file system that is
already taken into account by the size statistics mechanism.

We are really only interested in calculating the size of the
filesystem, so the only thing of interest are real file. We assume that
the size of things like device files, directories or symbolic links is
negligible and not relevant for our analysis.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCHv5 1/4] pkg-generic: add step_pkg_size global instrumentation hook
  2015-09-09 13:46     ` Thomas Petazzoni
@ 2015-09-09 13:50       ` Vicente Olivert Riera
  2015-09-09 14:10         ` Thomas Petazzoni
  0 siblings, 1 reply; 17+ messages in thread
From: Vicente Olivert Riera @ 2015-09-09 13:50 UTC (permalink / raw)
  To: buildroot


Dear Thomas Petazzoni,

On 09/09/2015 02:46 PM, Thomas Petazzoni wrote:
> Dear Vicente Olivert Riera,
> 
> On Wed, 9 Sep 2015 14:11:08 +0100, Vicente Olivert Riera wrote:
> 
>>> +# This hook will be called before the target installation of a
>>> +# package. We store in a file named .br_filelist_before the list of
>>> +# files currently installed in the target. Note that the MD5 is also
>>> +# stored, in order to identify if the files are overwritten.
>>> +define step_pkg_size_start
>>> +	(cd $(TARGET_DIR) ; find . -type f -print0 | xargs -0 md5sum) | sort > \
>>
>> why only "-type f"? Packages can also install other type of files such
>> as links, for instance. Don't you want to track them as well?
> 
> It doesn't make sense to "md5sum" a symbolic link: what you will md5 is
> the target of the link, i.e another file of the file system that is
> already taken into account by the size statistics mechanism.
> 
> We are really only interested in calculating the size of the
> filesystem, so the only thing of interest are real file. We assume that
> the size of things like device files, directories or symbolic links is
> negligible and not relevant for our analysis.

Ok, I was looking at the packages-file-list.txt file as a way to know
which files are installed by each package (which is useful IMHO), and
then I noticed that packages like busybox only had a few files when it
actually installs lots of links. But, if this is only to calculate the
size of the file system, then it's fine.

Regards,

Vincent.

> Best regards,
> 
> Thomas
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCHv5 1/4] pkg-generic: add step_pkg_size global instrumentation hook
  2015-09-09 13:50       ` Vicente Olivert Riera
@ 2015-09-09 14:10         ` Thomas Petazzoni
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Petazzoni @ 2015-09-09 14:10 UTC (permalink / raw)
  To: buildroot

Vicente,

On Wed, 9 Sep 2015 14:50:15 +0100, Vicente Olivert Riera wrote:

> Ok, I was looking at the packages-file-list.txt file as a way to know
> which files are installed by each package (which is useful IMHO), and
> then I noticed that packages like busybox only had a few files when it
> actually installs lots of links. But, if this is only to calculate the
> size of the file system, then it's fine.

Yes, it's really only to calculate the size. We could decide to extend
the packages-file-list.txt to have all file types (normal files,
symlinks, directories), and then have a way of calculating the md5 only
for the normal files. But it was unnecessary to do this for the
size-stats feature. I am not sure it is worth the effort to do that,
but if people want me to add this, I can have a look.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCHv5 4/4] docs/manual: add section about size graphing
  2015-09-03 10:43     ` Yann E. MORIN
@ 2015-09-15  2:54       ` Ryan Barnett
  0 siblings, 0 replies; 17+ messages in thread
From: Ryan Barnett @ 2015-09-15  2:54 UTC (permalink / raw)
  To: buildroot

Thomas, Yann,

On Thu, Sep 3, 2015 at 5:43 AM, Yann E. MORIN <yann.morin.1998@free.fr> wrote:
> Thomas, All,
>
> On 2015-09-03 12:37 +0200, Yann E. MORIN spake thusly:
>> Thomas, All,
>>
>> On 2015-09-02 23:15 +0200, Thomas Petazzoni spake thusly:
>> > +* +output/build/package-size-stats.csv+, a CSV file giving the size
>> > +  contribution of each package to the overall root filesystem size
>>
>> Ditto, that was not generated.
>
> Sorry, ditch that comment. It *was* generated.
>
> It's just that I only looked at the base filename, and expected it in
> the graphs/ sub-dir, not in build/ . That's by looking at patch 3 that I
> noticed they were put in there.
>
> I think it makes more sense to put those .csv files in the graphs/
> sub-dir.

I agree with Yann in that the CSV files make the most sense to be put
in graphs/ sub-dir. In fact, I have modified this locally while carry
this patch to place them there.

On a side note - it would be great to see this feature merged :)

Thanks,
-Ryan

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-09-15  2:54 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-02 21:15 [Buildroot] [PATCHv5 0/4] Generate package size statistics Thomas Petazzoni
2015-09-02 21:15 ` [Buildroot] [PATCHv5 1/4] pkg-generic: add step_pkg_size global instrumentation hook Thomas Petazzoni
2015-09-03 10:25   ` Yann E. MORIN
2015-09-09 13:11   ` Vicente Olivert Riera
2015-09-09 13:46     ` Thomas Petazzoni
2015-09-09 13:50       ` Vicente Olivert Riera
2015-09-09 14:10         ` Thomas Petazzoni
2015-09-02 21:15 ` [Buildroot] [PATCHv5 2/4] support/scripts: add size-stats script Thomas Petazzoni
2015-09-03 10:27   ` Yann E. MORIN
2015-09-02 21:15 ` [Buildroot] [PATCHv5 3/4] Makefile: implement a size-stats target Thomas Petazzoni
2015-09-03 10:29   ` Yann E. MORIN
2015-09-03 12:21     ` Thomas Petazzoni
2015-09-03 12:36       ` Yann E. MORIN
2015-09-02 21:15 ` [Buildroot] [PATCHv5 4/4] docs/manual: add section about size graphing Thomas Petazzoni
2015-09-03 10:37   ` Yann E. MORIN
2015-09-03 10:43     ` Yann E. MORIN
2015-09-15  2:54       ` Ryan Barnett

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.