All of lore.kernel.org
 help / color / mirror / Atom feed
* [Buildroot] [PATCHv3 0/5] Graph about installed size per package
@ 2015-02-05 21:19 Thomas Petazzoni
  2015-02-05 21:19 ` [Buildroot] [PATCHv3 1/5] Makefile: remove the graphs/ dir on 'make clean' Thomas Petazzoni
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Thomas Petazzoni @ 2015-02-05 21:19 UTC (permalink / raw)
  To: buildroot

Hello,

Here is the third iteration of the patches adding a 'size-stats'
feature, which mainly allows to generate a pie chart of the size
contribution of each package to the overall root filesystem size.

Changes since v2:

 - Use print0 in find and -0 in xargs to support spaces in filenames,
   just in case. Suggested by J?r?me Pouiller.

 - Support custom skeleton locations, instead of assuming the skeleton
   comes from system/skeleton.

 - Fix the size-stats target to actually call the size-stats script,
   and not graph-size, which was the old name of the script in
   previous versions of the series.

 - Fix usage of the size-stats target in out of tree build
   situations. For consistency reasons, we chose to mimic what
   graph-depends already does, even if size-stats does not necessarily
   need to be executed from within CONFIG_DIR.

 - Add one patch to make sure that 'make clean' removes the graphs/
   subdirectory in the output directory, which wasn't done today. This
   issue isn't specific to size-stats, and already existed with
   graph-build and graph-depends.

Comments:

 - J?r?me Puiller suggested to add a warning for removed
   files. However, many files are removed in the target-finalize step
   (man pages, etc.) and we don't want to have gazillions of warnings
   about these completely normal file removals.

Thomas Petazzoni (5):
  Makefile: remove the graphs/ dir on 'make clean'
  pkg-generic: add step_pkg_size global instrumentation hook
  support/scripts: add size-stats script
  Makefile: implement a size-stats target
  Dummy testing packages

 Config.in                  |   9 ++
 Makefile                   |  19 +++-
 package/Config.in          |   2 +
 package/foo1/Config.in     |   2 +
 package/foo1/file.img      | Bin 0 -> 2097152 bytes
 package/foo1/foo1.mk       |   8 ++
 package/foo2/Config.in     |   2 +
 package/foo2/file.img      | Bin 0 -> 4194304 bytes
 package/foo2/foo2.mk       |   8 ++
 package/pkg-generic.mk     |  36 +++++++
 support/scripts/size-stats | 231 +++++++++++++++++++++++++++++++++++++++++++++
 11 files changed, 314 insertions(+), 3 deletions(-)
 create mode 100644 package/foo1/Config.in
 create mode 100644 package/foo1/file.img
 create mode 100644 package/foo1/foo1.mk
 create mode 100644 package/foo2/Config.in
 create mode 100644 package/foo2/file.img
 create mode 100644 package/foo2/foo2.mk
 create mode 100755 support/scripts/size-stats

-- 
2.1.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Buildroot] [PATCHv3 1/5] Makefile: remove the graphs/ dir on 'make clean'
  2015-02-05 21:19 [Buildroot] [PATCHv3 0/5] Graph about installed size per package Thomas Petazzoni
@ 2015-02-05 21:19 ` Thomas Petazzoni
  2015-02-15 16:08   ` Yann E. MORIN
  2015-04-03 12:21   ` Thomas Petazzoni
  2015-02-05 21:19 ` [Buildroot] [PATCHv3 2/5] pkg-generic: add step_pkg_size global instrumentation hook Thomas Petazzoni
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 12+ messages in thread
From: Thomas Petazzoni @ 2015-02-05 21:19 UTC (permalink / raw)
  To: buildroot

Currently, a 'make clean' leaves the graphs/ subdirectory in the
output directory. This commit defines a GRAPHS_DIR variable, used by
the different graph-generating targets, and which gets cleaned up in
the 'clean' target.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 Makefile | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index 9c870d9..6d4ae38 100644
--- a/Makefile
+++ b/Makefile
@@ -161,6 +161,7 @@ TARGET_DIR := $(BASE_DIR)/target
 # initial definition so that 'make clean' works for most users, even without
 # .config. HOST_DIR will be overwritten later when .config is included.
 HOST_DIR := $(BASE_DIR)/host
+GRAPHS_DIR := $(BASE_DIR)/graphs
 
 LEGAL_INFO_DIR = $(BASE_DIR)/legal-info
 REDIST_SOURCES_DIR_TARGET = $(LEGAL_INFO_DIR)/sources
@@ -658,7 +659,7 @@ show-targets:
 	@echo $(HOST_DEPS) $(TARGETS_HOST_DEPS) $(TARGETS) $(TARGETS_ROOTFS)
 
 graph-build: $(O)/build/build-time.log
-	@install -d $(O)/graphs
+	@install -d $(GRAPHS_DIR)
 	$(foreach o,name build duration,./support/scripts/graph-build-time \
 					--type=histogram --order=$(o) --input=$(<) \
 					--output=$(O)/graphs/build.hist-$(o).$(BR_GRAPH_OUT) \
@@ -673,7 +674,7 @@ graph-depends-requirements:
 		{ echo "ERROR: The 'dot' program from Graphviz is needed for graph-depends" >&2; exit 1; }
 
 graph-depends: graph-depends-requirements
-	@$(INSTALL) -d $(O)/graphs
+	@$(INSTALL) -d $(GRAPHS_DIR)
 	@cd "$(CONFIG_DIR)"; \
 	$(TOPDIR)/support/scripts/graph-depends $(BR2_GRAPH_DEPS_OPTS) \
 	|tee $(BASE_DIR)/graphs/$(@).dot \
@@ -834,7 +835,7 @@ printvars:
 clean:
 	rm -rf $(TARGET_DIR) $(BINARIES_DIR) $(HOST_DIR) \
 		$(BUILD_DIR) $(BASE_DIR)/staging \
-		$(LEGAL_INFO_DIR)
+		$(LEGAL_INFO_DIR) $(GRAPHS_DIR)
 
 distclean: clean
 ifeq ($(DL_DIR),$(TOPDIR)/dl)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Buildroot] [PATCHv3 2/5] pkg-generic: add step_pkg_size global instrumentation hook
  2015-02-05 21:19 [Buildroot] [PATCHv3 0/5] Graph about installed size per package Thomas Petazzoni
  2015-02-05 21:19 ` [Buildroot] [PATCHv3 1/5] Makefile: remove the graphs/ dir on 'make clean' Thomas Petazzoni
@ 2015-02-05 21:19 ` Thomas Petazzoni
  2015-02-15 16:59   ` Yann E. MORIN
  2015-02-05 21:19 ` [Buildroot] [PATCHv3 3/5] support/scripts: add size-stats script Thomas Petazzoni
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Thomas Petazzoni @ 2015-02-05 21:19 UTC (permalink / raw)
  To: buildroot

This patch adds a global instrumentation hook that collects the list
of files installed in $(TARGET_DIR) by each package, and stores this
list into a file called $(BUILD_DIR)/packages-file-list.txt. It can
later be used to determine the size contribution of each package to
the target root filesystem.

Note that in order to detect if a file installed by one package is
later overriden by another package, we calculate the md5 of installed
files and compare them at each installation of a new package.

This commit also adds a Config.in option to enable the collection of
this data, as calculating the md5 of all installed files at the
beginning and end of the installation of each package can be
considered a time-consuming process which maybe some users will not be
willing to suffer from.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 Config.in              |  9 +++++++++
 package/pkg-generic.mk | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/Config.in b/Config.in
index f5b6c73..58a5085 100644
--- a/Config.in
+++ b/Config.in
@@ -613,6 +613,15 @@ config BR2_COMPILER_PARANOID_UNSAFE_PATH
 	  toolchain (through gcc and binutils patches) and external
 	  toolchain backends (through the external toolchain wrapper).
 
+config BR2_COLLECT_FILE_SIZE_STATS
+	bool "collect statistics about installed file size"
+	help
+	  Enable this option to let Buildroot collect data about the
+	  installed files. When this option is enabled, you will be
+	  able to use the 'size-stats' make target, which will
+	  generate a graph and CSV files giving statistics about the
+	  installed size of each file and each package.
+
 endmenu
 
 endmenu
diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index 1b09955..db35a87 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -55,6 +55,42 @@ define step_time
 endef
 GLOBAL_INSTRUMENTATION_HOOKS += step_time
 
+# Hooks to collect statistics about installed files
+ifeq ($(BR2_COLLECT_FILE_SIZE_STATS),y)
+
+# This hook will be called before the target installation of a
+# package. We store in a file named $(1).filelist_before the list of
+# files currently installed in the target. Note that the MD5 is also
+# stored, in order to identify if the files are overwritten.
+define step_pkg_size_start
+	(cd $(TARGET_DIR) ; find . -type f -print0 | xargs -0 md5sum) | sort > \
+		$(BUILD_DIR)/$(1).filelist_before
+endef
+
+# This hook will be called after the target installation of a
+# package. We store in a file named $(1).filelist_after the list
+# of files (and their MD5) currently installed in the target. We then
+# do a diff with the $(1).filelist_before to compute the list of
+# files installed by this package.
+define step_pkg_size_end
+	(cd $(TARGET_DIR); find . -type f -print0 | xargs -0 md5sum) | sort > \
+		$(BUILD_DIR)/$(1).filelist_after
+	comm -13 $(BUILD_DIR)/$(1).filelist_before $(BUILD_DIR)/$(1).filelist_after | \
+		while read hash file ; do \
+			echo "$(1),$${file}" >> $(BUILD_DIR)/packages-file-list.txt ; \
+		done
+	$(RM) -f $(BUILD_DIR)/$(1).filelist_before \
+		$(BUILD_DIR)/$(1).filelist_after
+endef
+
+define step_pkg_size
+	$(if $(filter install-target,$(2)),\
+		$(if $(filter start,$(1)),$(call step_pkg_size_start,$(3))) \
+		$(if $(filter end,$(1)),$(call step_pkg_size_end,$(3))))
+endef
+GLOBAL_INSTRUMENTATION_HOOKS += step_pkg_size
+endif
+
 # User-supplied script
 ifneq ($(BR2_INSTRUMENTATION_SCRIPTS),)
 define step_user
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Buildroot] [PATCHv3 3/5] support/scripts: add size-stats script
  2015-02-05 21:19 [Buildroot] [PATCHv3 0/5] Graph about installed size per package Thomas Petazzoni
  2015-02-05 21:19 ` [Buildroot] [PATCHv3 1/5] Makefile: remove the graphs/ dir on 'make clean' Thomas Petazzoni
  2015-02-05 21:19 ` [Buildroot] [PATCHv3 2/5] pkg-generic: add step_pkg_size global instrumentation hook Thomas Petazzoni
@ 2015-02-05 21:19 ` Thomas Petazzoni
  2015-04-06 14:02   ` Arnout Vandecappelle
  2015-02-05 21:19 ` [Buildroot] [PATCHv3 4/5] Makefile: implement a size-stats target Thomas Petazzoni
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Thomas Petazzoni @ 2015-02-05 21:19 UTC (permalink / raw)
  To: buildroot

This new script uses the data collected by the step_pkg_size
instrumentation hook to generate a pie chart of the size contribution
of each package to the target root filesystem, and two CSV files with
statistics about the package size and file size. To achieve this, it
looks at each file in $(TARGET_DIR), and using the
packages-file-list.txt information collected by the step_pkg_size
hook, it determines to which package the file belongs. It is therefore
able to give the size installed by each package.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 support/scripts/size-stats | 231 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 231 insertions(+)
 create mode 100755 support/scripts/size-stats

diff --git a/support/scripts/size-stats b/support/scripts/size-stats
new file mode 100755
index 0000000..a5c0c74
--- /dev/null
+++ b/support/scripts/size-stats
@@ -0,0 +1,231 @@
+#!/usr/bin/env python
+
+# Copyright (C) 2014 by Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+
+import sys
+import os
+import os.path
+import argparse
+import csv
+
+try:
+    import matplotlib.font_manager as fm
+    import matplotlib.pyplot as plt
+except ImportError:
+    sys.stderr.write("You need python-matplotlib to generate the size graph\n")
+    exit(1)
+
+colors = ['#e60004', '#009836', '#2e1d86', '#ffed00',
+          '#0068b5', '#f28e00', '#940084', '#97c000']
+
+#
+# This function returns a dict containing as keys the files present in
+# the filesystem skeleton, and as value, the string "skeleton". It is
+# used to simulate a fake "skeleton" package, to assign the files from
+# the skeleton to some package.
+#
+# builddir: path to the Buildroot output directory
+# skeleton_path: path to the rootfs skeleton
+#
+def build_skeleton_dict(builddir, skeleton_path):
+    skeleton_files = {}
+    for root, _, files in os.walk(skeleton_path):
+        for f in files:
+            if f == ".empty":
+                continue
+            frelpath = os.path.relpath(os.path.join(root, f), skeleton_path)
+            # Get the real size of the installed file
+            targetpath = os.path.join(builddir, "target", frelpath)
+            if os.path.islink(targetpath):
+                continue
+            sz = os.stat(targetpath).st_size
+            skeleton_files[frelpath] = { 'pkg': "skeleton", 'size': sz }
+    return skeleton_files
+
+#
+# This function returns a dict where each key is the path of a file in
+# the root filesystem, and the value is a dict containing two
+# elements: the name of the package to which this file belongs (key:
+# pkg) and the size of the file (key: size).
+#
+# builddir: path to the Buildroot output directory
+#
+def build_package_dict(builddir):
+    pkgdict = {}
+    with open(os.path.join(builddir, "build", "packages-file-list.txt")) as filelistf:
+        for l in filelistf.readlines():
+            f = l.split(",")
+            fpath = f[1].strip().replace("./", "")
+            fullpath = os.path.join(builddir, "target", fpath)
+            if not os.path.exists(fullpath):
+                continue
+            pkg = f[0]
+            sz = os.stat(fullpath).st_size
+            pkgdict[fpath] = { 'pkg': pkg, 'size': sz }
+    return pkgdict
+
+#
+# This function builds a dictionary that contains the name of a
+# package as key, and the size of the files installed by this package
+# as the value.
+#
+# pkgdict: dictionary with the name of the files as key, and as value
+# a dict containing the name of the package to which the files
+# belongs, and the size of the file. As returned by
+# build_package_dict.
+#
+# builddir: path to the Buildroot output directory
+#
+def build_package_size(pkgdict, builddir):
+    pkgsize = {}
+
+    for root, _, files in os.walk(os.path.join(builddir, "target")):
+        for f in files:
+            fpath = os.path.join(root, f)
+            if os.path.islink(fpath):
+                continue
+            frelpath = os.path.relpath(fpath, os.path.join(builddir, "target"))
+            if not frelpath in pkgdict:
+                print("WARNING: %s is not part of any package" % frelpath)
+                pkg = "unknown"
+            else:
+                pkg = pkgdict[frelpath]["pkg"]
+
+            if pkg in pkgsize:
+                pkgsize[pkg] += os.path.getsize(fpath)
+            else:
+                pkgsize[pkg] = os.path.getsize(fpath)
+
+    return pkgsize
+
+#
+# Given a dict returned by build_package_size(), this function
+# generates a pie chart of the size installed by each package.
+#
+# pkgsize: dictionary with the name of the package as a key, and the
+# size as the value, as returned by build_package_size.
+#
+# outputf: output file for the graph
+#
+def draw_graph(pkgsize, outputf):
+    total = 0
+    for (p, sz) in pkgsize.items():
+        total += sz
+    labels = []
+    values = []
+    other_value = 0
+    for (p, sz) in pkgsize.items():
+        if sz < (total * 0.01):
+            other_value += sz
+        else:
+            labels.append("%s (%d kB)" % (p, sz / 1000.))
+            values.append(sz)
+    labels.append("Other (%d kB)" % (other_value / 1000.))
+    values.append(other_value)
+
+    plt.figure()
+    patches, texts, autotexts = plt.pie(values, labels=labels,
+                                        autopct='%1.1f%%', shadow=True,
+                                        colors=colors)
+    # Reduce text size
+    proptease = fm.FontProperties()
+    proptease.set_size('xx-small')
+    plt.setp(autotexts, fontproperties=proptease)
+    plt.setp(texts, fontproperties=proptease)
+
+    plt.title('Size per package')
+    plt.savefig(outputf)
+
+#
+# Generate a CSV file with statistics about the size of each file, its
+# size contribution to the package and to the overall system.
+#
+# pkgdict: dictionary with the name of the files as key, and as value
+# a dict containing the name of the package to which the files
+# belongs, and the size of the file. As returned by
+# build_package_dict.
+#
+# pkgsize: dictionary with the name of the package as a key, and the
+# size as the value, as returned by build_package_size.
+#
+# outputf: output CSV file
+#
+def gen_files_csv(pkgdict, pkgsizes, outputf):
+    total = 0
+    for (p, sz) in pkgsizes.items():
+        total += sz
+    with open(outputf, 'w') as csvfile:
+        wr = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
+        wr.writerow(["File name",
+                     "Package name",
+                     "File size",
+                     "Package size",
+                     "File size in package (%)",
+                     "File size in system (%)"])
+        for (f, info) in pkgdict.items():
+            pkgname = info["pkg"]
+            filesize = info["size"]
+            pkgsize = pkgsizes[pkgname]
+            wr.writerow([f, pkgname, filesize, pkgsize, "%.4f" % (float(filesize) / pkgsize * 100), "%.4f" % (float(filesize) / total * 100)])
+
+
+#
+# Generate a CSV file with statistics about the size of each package,
+# and their size contribution to the overall system.
+#
+# pkgsize: dictionary with the name of the package as a key, and the
+# size as the value, as returned by build_package_size.
+#
+# outputf: output CSV file
+#
+def gen_packages_csv(pkgsizes, outputf):
+    total = 0
+    for (p, sz) in pkgsizes.items():
+        total += sz
+    with open(outputf, 'w') as csvfile:
+        wr = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
+        wr.writerow(["Package name", "Package size", "Package size in system (%)"])
+        for (pkg, size) in pkgsizes.items():
+            wr.writerow([pkg, size, "%.4f" % (float(size) / total * 100)])
+
+parser = argparse.ArgumentParser(description='Draw build time graphs')
+
+parser.add_argument("--builddir", '-i', metavar="BUILDDIR", required=True,
+                    help="Buildroot output directory")
+parser.add_argument("--graph", '-g', metavar="GRAPH",
+                    help="Graph output file (.pdf or .png extension)")
+parser.add_argument("--file-size-csv", '-f', metavar="FILE_SIZE_CSV",
+                    help="CSV output file with file size statistics")
+parser.add_argument("--package-size-csv", '-p', metavar="PKG_SIZE_CSV",
+                    help="CSV output file with package size statistics")
+parser.add_argument("--skeleton-path", '-s', metavar="SKELETON_PATH", required=True,
+                    help="Path to the skeleton used for the system")
+args = parser.parse_args()
+
+# Find out which package installed what files
+pkgdict = build_package_dict(args.builddir)
+pkgdict.update(build_skeleton_dict(args.builddir, args.skeleton_path))
+
+# Collect the size installed by each package
+pkgsize = build_package_size(pkgdict, args.builddir)
+
+if args.graph:
+    draw_graph(pkgsize, args.graph)
+if args.file_size_csv:
+    gen_files_csv(pkgdict, pkgsize, args.file_size_csv)
+if args.package_size_csv:
+    gen_packages_csv(pkgsize, args.package_size_csv)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Buildroot] [PATCHv3 4/5] Makefile: implement a size-stats target
  2015-02-05 21:19 [Buildroot] [PATCHv3 0/5] Graph about installed size per package Thomas Petazzoni
                   ` (2 preceding siblings ...)
  2015-02-05 21:19 ` [Buildroot] [PATCHv3 3/5] support/scripts: add size-stats script Thomas Petazzoni
@ 2015-02-05 21:19 ` Thomas Petazzoni
  2015-04-06 14:09   ` Arnout Vandecappelle
  2015-02-05 21:37 ` [Buildroot] [PATCHv3 0/5] Graph about installed size per package Thomas Petazzoni
  2015-02-07 14:37 ` Romain Naour
  5 siblings, 1 reply; 12+ messages in thread
From: Thomas Petazzoni @ 2015-02-05 21:19 UTC (permalink / raw)
  To: buildroot

This commit implements a size-stats target that calls the script of
the same name to generate the graph and CSV files related to package
and file sizes.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 Makefile | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/Makefile b/Makefile
index 6d4ae38..faa2e38 100644
--- a/Makefile
+++ b/Makefile
@@ -680,6 +680,17 @@ graph-depends: graph-depends-requirements
 	|tee $(BASE_DIR)/graphs/$(@).dot \
 	|dot $(BR2_GRAPH_DOT_OPTS) -T$(BR_GRAPH_OUT) -o $(BASE_DIR)/graphs/$(@).$(BR_GRAPH_OUT)
 
+size-stats:
+	@[ -f $(O)/build/packages-file-list.txt ] || \
+		{ echo "ERROR: No package size information available, please rebuild with BR2_COLLECT_FILE_SIZE_STATS" ; exit 1; }
+	@$(INSTALL) -d $(GRAPHS_DIR)
+	@cd "$(CONFIG_DIR)"; \
+	$(TOPDIR)/support/scripts/size-stats --builddir $(BASE_DIR) \
+		--graph $(BASE_DIR)/graphs/graph-size.$(BR_GRAPH_OUT) \
+		--file-size-csv $(BASE_DIR)/build/file-size-stats.csv \
+		--package-size-csv $(BASE_DIR)/build/package-size-stats.csv \
+		--skeleton-path $(TARGET_SKELETON)
+
 else # ifeq ($(BR2_HAVE_DOT_CONFIG),y)
 
 all: menuconfig
@@ -896,6 +907,7 @@ endif
 	@echo '  manual-epub            - build manual in ePub'
 	@echo '  graph-build            - generate graphs of the build times'
 	@echo '  graph-depends          - generate graph of the dependency tree'
+	@echo '  size-stats             - generate stats of the filesystem size'
 	@echo
 	@echo 'Miscellaneous:'
 	@echo '  source                 - download all sources needed for offline-build'
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Buildroot] [PATCHv3 0/5] Graph about installed size per package
  2015-02-05 21:19 [Buildroot] [PATCHv3 0/5] Graph about installed size per package Thomas Petazzoni
                   ` (3 preceding siblings ...)
  2015-02-05 21:19 ` [Buildroot] [PATCHv3 4/5] Makefile: implement a size-stats target Thomas Petazzoni
@ 2015-02-05 21:37 ` Thomas Petazzoni
  2015-02-07 14:37 ` Romain Naour
  5 siblings, 0 replies; 12+ messages in thread
From: Thomas Petazzoni @ 2015-02-05 21:37 UTC (permalink / raw)
  To: buildroot

Hello,

On Thu,  5 Feb 2015 22:19:55 +0100, Thomas Petazzoni wrote:

> Thomas Petazzoni (5):
>   Makefile: remove the graphs/ dir on 'make clean'
>   pkg-generic: add step_pkg_size global instrumentation hook
>   support/scripts: add size-stats script
>   Makefile: implement a size-stats target
>   Dummy testing packages

Oops, this last patch shouldn't have been sent. Fortunately, since it
is huge, it hasn't passed the filter of the list.

Peter, just reject this mail from me, as it doesn't make sense. The
series is really only the first four patches.

Sorry for the mess,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Buildroot] [PATCHv3 0/5] Graph about installed size per package
  2015-02-05 21:19 [Buildroot] [PATCHv3 0/5] Graph about installed size per package Thomas Petazzoni
                   ` (4 preceding siblings ...)
  2015-02-05 21:37 ` [Buildroot] [PATCHv3 0/5] Graph about installed size per package Thomas Petazzoni
@ 2015-02-07 14:37 ` Romain Naour
  5 siblings, 0 replies; 12+ messages in thread
From: Romain Naour @ 2015-02-07 14:37 UTC (permalink / raw)
  To: buildroot

Hi Thomas,

Le 05/02/2015 22:19, Thomas Petazzoni a ?crit :
> Hello,
> 
> Here is the third iteration of the patches adding a 'size-stats'
> feature, which mainly allows to generate a pie chart of the size
> contribution of each package to the overall root filesystem size.
> 
> Changes since v2:
> 
>  - Use print0 in find and -0 in xargs to support spaces in filenames,
>    just in case. Suggested by J?r?me Pouiller.
> 
>  - Support custom skeleton locations, instead of assuming the skeleton
>    comes from system/skeleton.
> 
>  - Fix the size-stats target to actually call the size-stats script,
>    and not graph-size, which was the old name of the script in
>    previous versions of the series.
> 
>  - Fix usage of the size-stats target in out of tree build
>    situations. For consistency reasons, we chose to mimic what
>    graph-depends already does, even if size-stats does not necessarily
>    need to be executed from within CONFIG_DIR.
> 
>  - Add one patch to make sure that 'make clean' removes the graphs/
>    subdirectory in the output directory, which wasn't done today. This
>    issue isn't specific to size-stats, and already existed with
>    graph-build and graph-depends.
> 
> Comments:
> 
>  - J?r?me Puiller suggested to add a warning for removed
>    files. However, many files are removed in the target-finalize step
>    (man pages, etc.) and we don't want to have gazillions of warnings
>    about these completely normal file removals.
> 
> Thomas Petazzoni (5):
>   Makefile: remove the graphs/ dir on 'make clean'
>   pkg-generic: add step_pkg_size global instrumentation hook
>   support/scripts: add size-stats script
>   Makefile: implement a size-stats target
>   Dummy testing packages

I reviewed and tested this series without encountering any particular problems.

However, I noticed that the files coming from the BR2_ROOTFS_OVERLAY produce a
warning. It is true that these files are not part of any package, but we can do
something like for the rootfs skeleton.

This requires to do something like that:

	$(call step_pkg_size_start,rootfs-overlay)
	@$(foreach d, $(call qstrip,$(BR2_ROOTFS_OVERLAY)), \
		$(call MESSAGE,"Copying overlay $(d)"); \
		rsync -a --ignore-times $(RSYNC_VCS_EXCLUSIONS) \
			--chmod=Du+w --exclude .empty --exclude '*~' \
			$(d)/ $(TARGET_DIR)$(sep))
	$(call step_pkg_size_end,rootfs-overlay)

Or call step_pkg_size-* for each overlay directories ?
This will allow to show the size contribution for each rootfs overlay.

What do you think ?

Best regards,
Romain

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Buildroot] [PATCHv3 1/5] Makefile: remove the graphs/ dir on 'make clean'
  2015-02-05 21:19 ` [Buildroot] [PATCHv3 1/5] Makefile: remove the graphs/ dir on 'make clean' Thomas Petazzoni
@ 2015-02-15 16:08   ` Yann E. MORIN
  2015-04-03 12:21   ` Thomas Petazzoni
  1 sibling, 0 replies; 12+ messages in thread
From: Yann E. MORIN @ 2015-02-15 16:08 UTC (permalink / raw)
  To: buildroot

Thomas, All,

On 2015-02-05 22:19 +0100, Thomas Petazzoni spake thusly:
> Currently, a 'make clean' leaves the graphs/ subdirectory in the
> output directory. This commit defines a GRAPHS_DIR variable, used by
> the different graph-generating targets, and which gets cleaned up in
> the 'clean' target.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> ---
>  Makefile | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 9c870d9..6d4ae38 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -161,6 +161,7 @@ TARGET_DIR := $(BASE_DIR)/target
>  # initial definition so that 'make clean' works for most users, even without
>  # .config. HOST_DIR will be overwritten later when .config is included.
>  HOST_DIR := $(BASE_DIR)/host
> +GRAPHS_DIR := $(BASE_DIR)/graphs
>  
>  LEGAL_INFO_DIR = $(BASE_DIR)/legal-info
>  REDIST_SOURCES_DIR_TARGET = $(LEGAL_INFO_DIR)/sources
> @@ -658,7 +659,7 @@ show-targets:
>  	@echo $(HOST_DEPS) $(TARGETS_HOST_DEPS) $(TARGETS) $(TARGETS_ROOTFS)
>  
>  graph-build: $(O)/build/build-time.log
> -	@install -d $(O)/graphs
> +	@install -d $(GRAPHS_DIR)
>  	$(foreach o,name build duration,./support/scripts/graph-build-time \
>  					--type=histogram --order=$(o) --input=$(<) \
>  					--output=$(O)/graphs/build.hist-$(o).$(BR_GRAPH_OUT) \

While at it, maybe you could take the ooportunity to change it here,
too?

> @@ -673,7 +674,7 @@ graph-depends-requirements:
>  		{ echo "ERROR: The 'dot' program from Graphviz is needed for graph-depends" >&2; exit 1; }
>  
>  graph-depends: graph-depends-requirements
> -	@$(INSTALL) -d $(O)/graphs
> +	@$(INSTALL) -d $(GRAPHS_DIR)
>  	@cd "$(CONFIG_DIR)"; \
>  	$(TOPDIR)/support/scripts/graph-depends $(BR2_GRAPH_DEPS_OPTS) \
>  	|tee $(BASE_DIR)/graphs/$(@).dot \

And here?

Also, you're missing some occurences in package/pkg-generic.mk:

  580 $(1)-graph-depends: graph-depends-requirements
  581             @$$(INSTALL) -d $$(O)/graphs
  582             @cd "$$(CONFIG_DIR)"; \
  583             $$(TOPDIR)/support/scripts/graph-depends -p $(1) $$(BR2_GRAPH_DEPS_OPTS) \
  584             |tee $$(O)/graphs/$$(@).dot \
  585             |dot $$(BR2_GRAPH_DOT_OPTS) -T$$(BR_GRAPH_OUT) -o $$(O)/graphs/$$(@).$$(BR_GRAPH_OUT)

Otherwise, looks good.

Regards,
Yann E. MORIN.

> @@ -834,7 +835,7 @@ printvars:
>  clean:
>  	rm -rf $(TARGET_DIR) $(BINARIES_DIR) $(HOST_DIR) \
>  		$(BUILD_DIR) $(BASE_DIR)/staging \
> -		$(LEGAL_INFO_DIR)
> +		$(LEGAL_INFO_DIR) $(GRAPHS_DIR)
>  
>  distclean: clean
>  ifeq ($(DL_DIR),$(TOPDIR)/dl)
> -- 
> 2.1.0
> 
> _______________________________________________
> buildroot mailing list
> buildroot at busybox.net
> http://lists.busybox.net/mailman/listinfo/buildroot

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Buildroot] [PATCHv3 2/5] pkg-generic: add step_pkg_size global instrumentation hook
  2015-02-05 21:19 ` [Buildroot] [PATCHv3 2/5] pkg-generic: add step_pkg_size global instrumentation hook Thomas Petazzoni
@ 2015-02-15 16:59   ` Yann E. MORIN
  0 siblings, 0 replies; 12+ messages in thread
From: Yann E. MORIN @ 2015-02-15 16:59 UTC (permalink / raw)
  To: buildroot

Thomas, All,

On 2015-02-05 22:19 +0100, Thomas Petazzoni spake thusly:
> This patch adds a global instrumentation hook that collects the list
> of files installed in $(TARGET_DIR) by each package, and stores this
> list into a file called $(BUILD_DIR)/packages-file-list.txt. It can
> later be used to determine the size contribution of each package to
> the target root filesystem.
> 
> Note that in order to detect if a file installed by one package is
> later overriden by another package, we calculate the md5 of installed
> files and compare them at each installation of a new package.
> 
> This commit also adds a Config.in option to enable the collection of
> this data, as calculating the md5 of all installed files at the
> beginning and end of the installation of each package can be
> considered a time-consuming process which maybe some users will not be
> willing to suffer from.

Well, I'd like to challenge that assertion, so I did a pretty "big" build:
Kodi on the RPi, with all Kodi addons enabled, plus a few additional
usefull packages (connman, dropbear and the likes).

The config has about 148 packages (make show-targets |wc -w), and takes
roughly 1h and 20min on my machine.

Because I did not time md5sum after each package was installed, I simply
timed the md5sum at the end, on a completely-populated target/ . That
gives a pretty good upper-bound of the overhead each package would incur
(i.e. the very first packages would be so much faster as there are far
fewer files installed).

    $ du -hs target/
    247M    target/

    $ find target -type f |wc -l
    5150

    $ tar cf - target/ |wc -c
    242923520

    $ tar cf - target/ |time md5sum
    1d393aaf76ef6a7a462519f4b8b861e7  -
    0.36user 0.03system 0:00.41elapsed 96%CPU (0avgtext+0avgdata 748maxresident)k
    0inputs+0outputs (0major+233minor)pagefaults 0swaps

    $ date '+%s.%N'; \
      find target -type f -print0 2>/dev/null \
      | xargs -0 md5sum >/dev/null 2>&1; \
      date '+%s.%N'
    1424018960.577764719
    1424018960.994814894

So, the overhead of md5sum-ing each file independently (on a cache-hot
target/) is about less than 0.5s (only so-slightly bigger than md5sum-ing
the whole tarball thereof) .

Yes, 0.5s. Half-a-second. ;-)

That would give an upper-bound of the overhead for the whole build
somewhere in the 2-minute range (148*2*0.5). Out of a 1h 20min build.

Yes, md5 is a very fast hash. For reference, hashing a 512MiB blob takes
about less than a second.

I believe this overhead is negligible and we should unconditionally
enable that feature.

> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> ---
>  Config.in              |  9 +++++++++
>  package/pkg-generic.mk | 36 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 45 insertions(+)
> 
> diff --git a/Config.in b/Config.in
> index f5b6c73..58a5085 100644
> --- a/Config.in
> +++ b/Config.in
> @@ -613,6 +613,15 @@ config BR2_COMPILER_PARANOID_UNSAFE_PATH
>  	  toolchain (through gcc and binutils patches) and external
>  	  toolchain backends (through the external toolchain wrapper).
>  
> +config BR2_COLLECT_FILE_SIZE_STATS
> +	bool "collect statistics about installed file size"
> +	help
> +	  Enable this option to let Buildroot collect data about the
> +	  installed files. When this option is enabled, you will be
> +	  able to use the 'size-stats' make target, which will
> +	  generate a graph and CSV files giving statistics about the
> +	  installed size of each file and each package.
> +
>  endmenu
>  
>  endmenu
> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> index 1b09955..db35a87 100644
> --- a/package/pkg-generic.mk
> +++ b/package/pkg-generic.mk
> @@ -55,6 +55,42 @@ define step_time
>  endef
>  GLOBAL_INSTRUMENTATION_HOOKS += step_time
>  
> +# Hooks to collect statistics about installed files
> +ifeq ($(BR2_COLLECT_FILE_SIZE_STATS),y)
> +
> +# This hook will be called before the target installation of a
> +# package. We store in a file named $(1).filelist_before the list of
> +# files currently installed in the target. Note that the MD5 is also
> +# stored, in order to identify if the files are overwritten.
> +define step_pkg_size_start
> +	(cd $(TARGET_DIR) ; find . -type f -print0 | xargs -0 md5sum) | sort > \
> +		$(BUILD_DIR)/$(1).filelist_before

Why don't you store that in the package's $(@D) ?

I don't really care, but if we're going to use $(BUILD_DIR) to store
temporary files, it might be time we introduce a better location
(probably somthing like BR2_TMP_DIR=$(BUILD_DIR)/.tmp/ )

We alreaduy ahve some temporary stuff written in there, and I find it
ugly (yes, I added some myself!).

Note: not related to your changes, of course, just prompted by them.

> +endef
> +
> +# This hook will be called after the target installation of a
> +# package. We store in a file named $(1).filelist_after the list
> +# of files (and their MD5) currently installed in the target. We then
> +# do a diff with the $(1).filelist_before to compute the list of
> +# files installed by this package.
> +define step_pkg_size_end
> +	(cd $(TARGET_DIR); find . -type f -print0 | xargs -0 md5sum) | sort > \
> +		$(BUILD_DIR)/$(1).filelist_after
> +	comm -13 $(BUILD_DIR)/$(1).filelist_before $(BUILD_DIR)/$(1).filelist_after | \
> +		while read hash file ; do \
> +			echo "$(1),$${file}" >> $(BUILD_DIR)/packages-file-list.txt ; \
> +		done
> +	$(RM) -f $(BUILD_DIR)/$(1).filelist_before \
> +		$(BUILD_DIR)/$(1).filelist_after
> +endef
> +
> +define step_pkg_size
> +	$(if $(filter install-target,$(2)),\
> +		$(if $(filter start,$(1)),$(call step_pkg_size_start,$(3))) \
> +		$(if $(filter end,$(1)),$(call step_pkg_size_end,$(3))))
> +endef

When I introduced the instrumentation hooks, I did not envision they
would be used like that, directly as Makefile code.

What I expected is we would be using scripts (python, shell, whatever!)
somewhere in support/ , that would do their own filtering.

It's pretty fascinating how we all differ in reasoning! :-)

Regards,
Yann E. MORIN.

> +GLOBAL_INSTRUMENTATION_HOOKS += step_pkg_size
> +endif
> +
>  # User-supplied script
>  ifneq ($(BR2_INSTRUMENTATION_SCRIPTS),)
>  define step_user
> -- 
> 2.1.0
> 
> _______________________________________________
> buildroot mailing list
> buildroot at busybox.net
> http://lists.busybox.net/mailman/listinfo/buildroot

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Buildroot] [PATCHv3 1/5] Makefile: remove the graphs/ dir on 'make clean'
  2015-02-05 21:19 ` [Buildroot] [PATCHv3 1/5] Makefile: remove the graphs/ dir on 'make clean' Thomas Petazzoni
  2015-02-15 16:08   ` Yann E. MORIN
@ 2015-04-03 12:21   ` Thomas Petazzoni
  1 sibling, 0 replies; 12+ messages in thread
From: Thomas Petazzoni @ 2015-04-03 12:21 UTC (permalink / raw)
  To: buildroot

Hello,

On Thu,  5 Feb 2015 22:19:56 +0100, Thomas Petazzoni wrote:
> Currently, a 'make clean' leaves the graphs/ subdirectory in the
> output directory. This commit defines a GRAPHS_DIR variable, used by
> the different graph-generating targets, and which gets cleaned up in
> the 'clean' target.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>

As suggested by Yann, I've changed the patch to use GRAPHS_DIR in
various other places to replace $(O)/graphs, and applied.

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Buildroot] [PATCHv3 3/5] support/scripts: add size-stats script
  2015-02-05 21:19 ` [Buildroot] [PATCHv3 3/5] support/scripts: add size-stats script Thomas Petazzoni
@ 2015-04-06 14:02   ` Arnout Vandecappelle
  0 siblings, 0 replies; 12+ messages in thread
From: Arnout Vandecappelle @ 2015-04-06 14:02 UTC (permalink / raw)
  To: buildroot

On 05/02/15 22:19, Thomas Petazzoni wrote:
> This new script uses the data collected by the step_pkg_size
> instrumentation hook to generate a pie chart of the size contribution
> of each package to the target root filesystem, and two CSV files with
> statistics about the package size and file size. To achieve this, it
> looks at each file in $(TARGET_DIR), and using the
> packages-file-list.txt information collected by the step_pkg_size
> hook, it determines to which package the file belongs. It is therefore
> able to give the size installed by each package.

 I'm reviewing purely the python coding style here. Found no major shortcomings,
just a few things that can be done in a nicer way, so

Reviewed-by: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>

> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> ---
>  support/scripts/size-stats | 231 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 231 insertions(+)
>  create mode 100755 support/scripts/size-stats
> 
> diff --git a/support/scripts/size-stats b/support/scripts/size-stats
> new file mode 100755
> index 0000000..a5c0c74
> --- /dev/null
> +++ b/support/scripts/size-stats
> @@ -0,0 +1,231 @@
> +#!/usr/bin/env python
> +
> +# Copyright (C) 2014 by Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write to the Free Software
> +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> +
> +import sys
> +import os
> +import os.path
> +import argparse
> +import csv
> +
> +try:
> +    import matplotlib.font_manager as fm
> +    import matplotlib.pyplot as plt
> +except ImportError:
> +    sys.stderr.write("You need python-matplotlib to generate the size graph\n")
> +    exit(1)
> +
> +colors = ['#e60004', '#009836', '#2e1d86', '#ffed00',
> +          '#0068b5', '#f28e00', '#940084', '#97c000']
> +
> +#
> +# This function returns a dict containing as keys the files present in
> +# the filesystem skeleton, and as value, the string "skeleton". It is
> +# used to simulate a fake "skeleton" package, to assign the files from
> +# the skeleton to some package.
> +#
> +# builddir: path to the Buildroot output directory
> +# skeleton_path: path to the rootfs skeleton
> +#
> +def build_skeleton_dict(builddir, skeleton_path):
> +    skeleton_files = {}
> +    for root, _, files in os.walk(skeleton_path):
> +        for f in files:
> +            if f == ".empty":
> +                continue
> +            frelpath = os.path.relpath(os.path.join(root, f), skeleton_path)
> +            # Get the real size of the installed file
> +            targetpath = os.path.join(builddir, "target", frelpath)
> +            if os.path.islink(targetpath):
> +                continue

 There should be a similar escape for files that don't exist in target (removed
by finalize script).

> +            sz = os.stat(targetpath).st_size
> +            skeleton_files[frelpath] = { 'pkg': "skeleton", 'size': sz }
> +    return skeleton_files
> +
> +#
> +# This function returns a dict where each key is the path of a file in
> +# the root filesystem, and the value is a dict containing two
> +# elements: the name of the package to which this file belongs (key:
> +# pkg) and the size of the file (key: size).
> +#
> +# builddir: path to the Buildroot output directory
> +#
> +def build_package_dict(builddir):
> +    pkgdict = {}
> +    with open(os.path.join(builddir, "build", "packages-file-list.txt")) as filelistf:
> +        for l in filelistf.readlines():
> +            f = l.split(",")

 I think a nicer way to write this is

pkg,fpath = l.split(",")

> +            fpath = f[1].strip().replace("./", "")

 Not likely to occur, but this will also replace foo./bar to foobar.

> +            fullpath = os.path.join(builddir, "target", fpath)
> +            if not os.path.exists(fullpath):
> +                continue
> +            pkg = f[0]
> +            sz = os.stat(fullpath).st_size
> +            pkgdict[fpath] = { 'pkg': pkg, 'size': sz }

 This entire part could be refactored with the skeleton handling. I think that
escaping symlinks is also relevant here.

> +    return pkgdict
> +
> +#
> +# This function builds a dictionary that contains the name of a
> +# package as key, and the size of the files installed by this package
> +# as the value.
> +#
> +# pkgdict: dictionary with the name of the files as key, and as value
> +# a dict containing the name of the package to which the files
> +# belongs, and the size of the file. As returned by
> +# build_package_dict.
> +#
> +# builddir: path to the Buildroot output directory
> +#
> +def build_package_size(pkgdict, builddir):
> +    pkgsize = {}
> +
> +    for root, _, files in os.walk(os.path.join(builddir, "target")):
> +        for f in files:
> +            fpath = os.path.join(root, f)
> +            if os.path.islink(fpath):
> +                continue

 Ah, you escape the link here... Makes sense of course otherwise it would become
unknown. But why is it different for the skeleton?

> +            frelpath = os.path.relpath(fpath, os.path.join(builddir, "target"))
> +            if not frelpath in pkgdict:
> +                print("WARNING: %s is not part of any package" % frelpath)
> +                pkg = "unknown"
> +            else:
> +                pkg = pkgdict[frelpath]["pkg"]
> +
> +            if pkg in pkgsize:
> +                pkgsize[pkg] += os.path.getsize(fpath)
> +            else:
> +                pkgsize[pkg] = os.path.getsize(fpath)

 Use a defaultdict [1]

import collections
pkgsize = collections.defaultdict(int)
pkgsize[pkg] += os.path.getsize(fpath)

> +
> +    return pkgsize
> +
> +#
> +# Given a dict returned by build_package_size(), this function
> +# generates a pie chart of the size installed by each package.
> +#
> +# pkgsize: dictionary with the name of the package as a key, and the
> +# size as the value, as returned by build_package_size.
> +#
> +# outputf: output file for the graph
> +#
> +def draw_graph(pkgsize, outputf):
> +    total = 0
> +    for (p, sz) in pkgsize.items():
> +        total += sz

total = sum(pkgsize.values())

> +    labels = []
> +    values = []
> +    other_value = 0
> +    for (p, sz) in pkgsize.items():
> +        if sz < (total * 0.01):
> +            other_value += sz
> +        else:
> +            labels.append("%s (%d kB)" % (p, sz / 1000.))
> +            values.append(sz)
> +    labels.append("Other (%d kB)" % (other_value / 1000.))
> +    values.append(other_value)
> +
> +    plt.figure()
> +    patches, texts, autotexts = plt.pie(values, labels=labels,
> +                                        autopct='%1.1f%%', shadow=True,
> +                                        colors=colors)
> +    # Reduce text size
> +    proptease = fm.FontProperties()
> +    proptease.set_size('xx-small')
> +    plt.setp(autotexts, fontproperties=proptease)
> +    plt.setp(texts, fontproperties=proptease)
> +
> +    plt.title('Size per package')
> +    plt.savefig(outputf)
> +
> +#
> +# Generate a CSV file with statistics about the size of each file, its
> +# size contribution to the package and to the overall system.

 I don't think this file is very useful, but obviously it doesn't hurt to
generate it.

> +#
> +# pkgdict: dictionary with the name of the files as key, and as value
> +# a dict containing the name of the package to which the files
> +# belongs, and the size of the file. As returned by
> +# build_package_dict.

 Since it has files as dict, I would think it makes more sense to call it filesdict.

> +#
> +# pkgsize: dictionary with the name of the package as a key, and the
> +# size as the value, as returned by build_package_size.
> +#
> +# outputf: output CSV file
> +#
> +def gen_files_csv(pkgdict, pkgsizes, outputf):
> +    total = 0
> +    for (p, sz) in pkgsizes.items():
> +        total += sz

 total = sum(pkgsizes.values())

> +    with open(outputf, 'w') as csvfile:
> +        wr = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
> +        wr.writerow(["File name",
> +                     "Package name",
> +                     "File size",
> +                     "Package size",
> +                     "File size in package (%)",
> +                     "File size in system (%)"])
> +        for (f, info) in pkgdict.items():
> +            pkgname = info["pkg"]
> +            filesize = info["size"]

 Given the way that this is used, I would define the pkgdict as a dict of pairs
instead of a dict of dicts. So this would become

for f, (pkgname, filesize) in pkgdict.items():

> +            pkgsize = pkgsizes[pkgname]
> +            wr.writerow([f, pkgname, filesize, pkgsize, "%.4f" % (float(filesize) / pkgsize * 100), "%.4f" % (float(filesize) / total * 100)])

 Can this long line be split? Also, I find .4f a bit too many digits - .1 ought
to be enough. Anything below .1% is really 0, no?

> +
> +
> +#
> +# Generate a CSV file with statistics about the size of each package,
> +# and their size contribution to the overall system.
> +#
> +# pkgsize: dictionary with the name of the package as a key, and the
> +# size as the value, as returned by build_package_size.
> +#
> +# outputf: output CSV file
> +#
> +def gen_packages_csv(pkgsizes, outputf):

 In the other functions it's called pkgsize instead of pkgsizes.

> +    total = 0
> +    for (p, sz) in pkgsizes.items():
> +        total += sz

 total = sum(well you get the drift :-)

> +    with open(outputf, 'w') as csvfile:
> +        wr = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
> +        wr.writerow(["Package name", "Package size", "Package size in system (%)"])
> +        for (pkg, size) in pkgsizes.items():
> +            wr.writerow([pkg, size, "%.4f" % (float(size) / total * 100)])
> +
> +parser = argparse.ArgumentParser(description='Draw build time graphs')
> +
> +parser.add_argument("--builddir", '-i', metavar="BUILDDIR", required=True,
> +                    help="Buildroot output directory")
> +parser.add_argument("--graph", '-g', metavar="GRAPH",
> +                    help="Graph output file (.pdf or .png extension)")
> +parser.add_argument("--file-size-csv", '-f', metavar="FILE_SIZE_CSV",
> +                    help="CSV output file with file size statistics")
> +parser.add_argument("--package-size-csv", '-p', metavar="PKG_SIZE_CSV",
> +                    help="CSV output file with package size statistics")
> +parser.add_argument("--skeleton-path", '-s', metavar="SKELETON_PATH", required=True,
> +                    help="Path to the skeleton used for the system")
> +args = parser.parse_args()
> +
> +# Find out which package installed what files
> +pkgdict = build_package_dict(args.builddir)
> +pkgdict.update(build_skeleton_dict(args.builddir, args.skeleton_path))

 No, skeleton should be first, packages afterwards. If a package overwrites a
skeleton file, the package should win.

 Also, there should be something similar like skeleton for the overlay(s), and
that should of course come after the packages.


 Regards,
 Arnout

> +
> +# Collect the size installed by each package
> +pkgsize = build_package_size(pkgdict, args.builddir)
> +
> +if args.graph:
> +    draw_graph(pkgsize, args.graph)
> +if args.file_size_csv:
> +    gen_files_csv(pkgdict, pkgsize, args.file_size_csv)
> +if args.package_size_csv:
> +    gen_packages_csv(pkgsize, args.package_size_csv)
> 


[1] http://jtauber.com/blog/2008/02/27/evolution_of_default_dictionaries_in_python/



-- 
Arnout Vandecappelle                          arnout at mind be
Senior Embedded Software Architect            +32-16-286500
Essensium/Mind                                http://www.mind.be
G.Geenslaan 9, 3001 Leuven, Belgium           BE 872 984 063 RPR Leuven
LinkedIn profile: http://www.linkedin.com/in/arnoutvandecappelle
GPG fingerprint:  7CB5 E4CC 6C2E EFD4 6E3D A754 F963 ECAB 2450 2F1F

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Buildroot] [PATCHv3 4/5] Makefile: implement a size-stats target
  2015-02-05 21:19 ` [Buildroot] [PATCHv3 4/5] Makefile: implement a size-stats target Thomas Petazzoni
@ 2015-04-06 14:09   ` Arnout Vandecappelle
  0 siblings, 0 replies; 12+ messages in thread
From: Arnout Vandecappelle @ 2015-04-06 14:09 UTC (permalink / raw)
  To: buildroot

On 05/02/15 22:19, Thomas Petazzoni wrote:
> This commit implements a size-stats target that calls the script of
> the same name to generate the graph and CSV files related to package
> and file sizes.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> ---
>  Makefile | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/Makefile b/Makefile
> index 6d4ae38..faa2e38 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -680,6 +680,17 @@ graph-depends: graph-depends-requirements
>  	|tee $(BASE_DIR)/graphs/$(@).dot \
>  	|dot $(BR2_GRAPH_DOT_OPTS) -T$(BR_GRAPH_OUT) -o $(BASE_DIR)/graphs/$(@).$(BR_GRAPH_OUT)
>  
> +size-stats:
> +	@[ -f $(O)/build/packages-file-list.txt ] || \
> +		{ echo "ERROR: No package size information available, please rebuild with BR2_COLLECT_FILE_SIZE_STATS" ; exit 1; }

 BR2_COLLECT_FILE_SIZE_STATS=y

 Also the line is too long. And perhaps instead of "rebuild" it should say "do a
clean build", otherwise stupid people will just do another make with the option
enabled but nothing will actually be done.


> +	@$(INSTALL) -d $(GRAPHS_DIR)

 graph-depends is the only one that uses install -d, all the others use mkdir
-p. So let's stick to that.

> +	@cd "$(CONFIG_DIR)"; \

 Why is this needed? In fact, it's wrong, because TARGET_SKELETON will be
relative to TOPDIR if it's given as a relative path. All the other paths are
absolute paths, so there's no reason to cd AFAICS.

 Also, I'd prefer $(Q) instead of @.


 Regards,
 Arnout


> +	$(TOPDIR)/support/scripts/size-stats --builddir $(BASE_DIR) \
> +		--graph $(BASE_DIR)/graphs/graph-size.$(BR_GRAPH_OUT) \
> +		--file-size-csv $(BASE_DIR)/build/file-size-stats.csv \
> +		--package-size-csv $(BASE_DIR)/build/package-size-stats.csv \
> +		--skeleton-path $(TARGET_SKELETON)
> +
>  else # ifeq ($(BR2_HAVE_DOT_CONFIG),y)
>  
>  all: menuconfig
> @@ -896,6 +907,7 @@ endif
>  	@echo '  manual-epub            - build manual in ePub'
>  	@echo '  graph-build            - generate graphs of the build times'
>  	@echo '  graph-depends          - generate graph of the dependency tree'
> +	@echo '  size-stats             - generate stats of the filesystem size'
>  	@echo
>  	@echo 'Miscellaneous:'
>  	@echo '  source                 - download all sources needed for offline-build'
> 


-- 
Arnout Vandecappelle                          arnout at mind be
Senior Embedded Software Architect            +32-16-286500
Essensium/Mind                                http://www.mind.be
G.Geenslaan 9, 3001 Leuven, Belgium           BE 872 984 063 RPR Leuven
LinkedIn profile: http://www.linkedin.com/in/arnoutvandecappelle
GPG fingerprint:  7CB5 E4CC 6C2E EFD4 6E3D A754 F963 ECAB 2450 2F1F

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-04-06 14:09 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-05 21:19 [Buildroot] [PATCHv3 0/5] Graph about installed size per package Thomas Petazzoni
2015-02-05 21:19 ` [Buildroot] [PATCHv3 1/5] Makefile: remove the graphs/ dir on 'make clean' Thomas Petazzoni
2015-02-15 16:08   ` Yann E. MORIN
2015-04-03 12:21   ` Thomas Petazzoni
2015-02-05 21:19 ` [Buildroot] [PATCHv3 2/5] pkg-generic: add step_pkg_size global instrumentation hook Thomas Petazzoni
2015-02-15 16:59   ` Yann E. MORIN
2015-02-05 21:19 ` [Buildroot] [PATCHv3 3/5] support/scripts: add size-stats script Thomas Petazzoni
2015-04-06 14:02   ` Arnout Vandecappelle
2015-02-05 21:19 ` [Buildroot] [PATCHv3 4/5] Makefile: implement a size-stats target Thomas Petazzoni
2015-04-06 14:09   ` Arnout Vandecappelle
2015-02-05 21:37 ` [Buildroot] [PATCHv3 0/5] Graph about installed size per package Thomas Petazzoni
2015-02-07 14:37 ` Romain Naour

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.