All of lore.kernel.org
 help / color / mirror / Atom feed
* [Buildroot] [PATCH 01/19] infra/pkg-generic: display MESSAGE before running PRE_HOOKS
  2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
@ 2019-01-07 22:05 ` Yann E. MORIN
  2019-01-07 22:05 ` [Buildroot] [PATCH 02/19] infra/pkg-generic: create $(@D) " Yann E. MORIN
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-07 22:05 UTC (permalink / raw)
  To: buildroot

... so that we have a consistent behaviour throughout all steps.

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 package/pkg-generic.mk | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index f5cab2b9c2..7602765139 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -133,7 +133,6 @@ endif
 # Retrieve the archive
 $(BUILD_DIR)/%/.stamp_downloaded:
 	@$(call step_start,download)
-	$(foreach hook,$($(PKG)_PRE_DOWNLOAD_HOOKS),$(call $(hook))$(sep))
 # Only show the download message if it isn't already downloaded
 	$(Q)for p in $($(PKG)_ALL_DOWNLOADS); do \
 		if test ! -e $($(PKG)_DL_DIR)/`basename $$p` ; then \
@@ -141,6 +140,7 @@ $(BUILD_DIR)/%/.stamp_downloaded:
 			break ; \
 		fi ; \
 	done
+	$(foreach hook,$($(PKG)_PRE_DOWNLOAD_HOOKS),$(call $(hook))$(sep))
 	$(foreach p,$($(PKG)_ALL_DOWNLOADS),$(call DOWNLOAD,$(p))$(sep))
 	$(foreach hook,$($(PKG)_POST_DOWNLOAD_HOOKS),$(call $(hook))$(sep))
 	$(Q)mkdir -p $(@D)
@@ -302,9 +302,9 @@ $(BUILD_DIR)/%/.stamp_staging_installed:
 # Install to images dir
 $(BUILD_DIR)/%/.stamp_images_installed:
 	@$(call step_start,install-image)
+	@$(call MESSAGE,"Installing to images directory")
 	@mkdir -p $(BINARIES_DIR)
 	$(foreach hook,$($(PKG)_PRE_INSTALL_IMAGES_HOOKS),$(call $(hook))$(sep))
-	@$(call MESSAGE,"Installing to images directory")
 	+$($(PKG)_INSTALL_IMAGES_CMDS)
 	$(foreach hook,$($(PKG)_POST_INSTALL_IMAGES_HOOKS),$(call $(hook))$(sep))
 	@$(call step_end,install-image)
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 02/19] infra/pkg-generic: create $(@D) before running PRE_HOOKS
  2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
  2019-01-07 22:05 ` [Buildroot] [PATCH 01/19] infra/pkg-generic: display MESSAGE before running PRE_HOOKS Yann E. MORIN
@ 2019-01-07 22:05 ` Yann E. MORIN
  2019-01-07 22:05 ` [Buildroot] [PATCH 03/19] infra/pkg-generic: introduce new stampfile at the beginning of all steps Yann E. MORIN
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-07 22:05 UTC (permalink / raw)
  To: buildroot

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 package/pkg-generic.mk | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index 7602765139..6581ffa5e8 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -140,18 +140,18 @@ $(BUILD_DIR)/%/.stamp_downloaded:
 			break ; \
 		fi ; \
 	done
+	$(Q)mkdir -p $(@D)
 	$(foreach hook,$($(PKG)_PRE_DOWNLOAD_HOOKS),$(call $(hook))$(sep))
 	$(foreach p,$($(PKG)_ALL_DOWNLOADS),$(call DOWNLOAD,$(p))$(sep))
 	$(foreach hook,$($(PKG)_POST_DOWNLOAD_HOOKS),$(call $(hook))$(sep))
-	$(Q)mkdir -p $(@D)
 	@$(call step_end,download)
 	$(Q)touch $@
 
 # Retrieve actual source archive, e.g. for prebuilt external toolchains
 $(BUILD_DIR)/%/.stamp_actual_downloaded:
 	@$(call step_start,actual-download)
+	$(Q)mkdir -p $(@D)
 	$(call DOWNLOAD,$($(PKG)_ACTUAL_SOURCE_SITE)/$($(PKG)_ACTUAL_SOURCE_TARBALL))
-	$(Q)mkdir -p $(@D)
 	@$(call step_end,actual-download)
 	$(Q)touch $@
 
@@ -159,8 +159,8 @@ $(BUILD_DIR)/%/.stamp_actual_downloaded:
 $(BUILD_DIR)/%/.stamp_extracted:
 	@$(call step_start,extract)
 	@$(call MESSAGE,"Extracting")
+	$(Q)mkdir -p $(@D)
 	$(foreach hook,$($(PKG)_PRE_EXTRACT_HOOKS),$(call $(hook))$(sep))
-	$(Q)mkdir -p $(@D)
 	$($(PKG)_EXTRACT_CMDS)
 # some packages have messed up permissions inside
 	$(Q)chmod -R +rw $(@D)
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 03/19] infra/pkg-generic: introduce new stampfile at the beginning of all steps
  2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
  2019-01-07 22:05 ` [Buildroot] [PATCH 01/19] infra/pkg-generic: display MESSAGE before running PRE_HOOKS Yann E. MORIN
  2019-01-07 22:05 ` [Buildroot] [PATCH 02/19] infra/pkg-generic: create $(@D) " Yann E. MORIN
@ 2019-01-07 22:05 ` Yann E. MORIN
  2019-01-07 22:05 ` [Buildroot] [PATCH 04/19] infra/pkg-generic: use \0 to separate .la files as they are found Yann E. MORIN
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-07 22:05 UTC (permalink / raw)
  To: buildroot

Currently, we use the existing stampfiles as a standard make feature to
organise the dependency tree between steps and packages, and so that a
step that successfully finished is not re-done in a later build.

Additionally, there is one stampfile that gained extra use in the recent
past: since 7fb6e782542f (core/instrumentation: shave minutes off the
build time) the built stampfile is now used as a reference to detect
files installed by a package.

However, this falls short during development, when a user may want to
re-install a built-early package without rebuilding it (i.e. make
foo-reinstall). In this case, the built stampfile is not touched, and is
still dated from way back when the package was first built. As such,
almost all files in target (or staging or host) are newer than that, and
so those files are all now accounted for to that package, when in fact
only a minor subset may be accountable to it.

So, we need to introduce a new way to memorise the beginning of a step,
so that we can properly find files installed by a package, even during
development.

To that goal, we introduce a new per-step stampfile, which gets touched
early in the step, and which can now serve as a starting point in time
for that step.

This new stampfile is named after the existing stampfile, with an
appended '_before' tag to it, and thus is always $@_before (note that,
in Makefiles, variables are always 1-char, unless they are in
parenthesis, so $@_before is exactly $(@)_before, but since we use $@
everywhere, we keep using it here, even though it can be slightly
confusing).

Reported-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 package/pkg-generic.mk | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index 6581ffa5e8..1b8febe8c5 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -141,6 +141,7 @@ $(BUILD_DIR)/%/.stamp_downloaded:
 		fi ; \
 	done
 	$(Q)mkdir -p $(@D)
+	$(Q)touch $@_before
 	$(foreach hook,$($(PKG)_PRE_DOWNLOAD_HOOKS),$(call $(hook))$(sep))
 	$(foreach p,$($(PKG)_ALL_DOWNLOADS),$(call DOWNLOAD,$(p))$(sep))
 	$(foreach hook,$($(PKG)_POST_DOWNLOAD_HOOKS),$(call $(hook))$(sep))
@@ -160,6 +161,7 @@ $(BUILD_DIR)/%/.stamp_extracted:
 	@$(call step_start,extract)
 	@$(call MESSAGE,"Extracting")
 	$(Q)mkdir -p $(@D)
+	$(Q)touch $@_before
 	$(foreach hook,$($(PKG)_PRE_EXTRACT_HOOKS),$(call $(hook))$(sep))
 	$($(PKG)_EXTRACT_CMDS)
 # some packages have messed up permissions inside
@@ -174,6 +176,7 @@ $(BUILD_DIR)/%/.stamp_rsynced:
 	@$(call step_start,rsync)
 	@$(call MESSAGE,"Syncing from source dir $(SRCDIR)")
 	@mkdir -p $(@D)
+	$(Q)touch $@_before
 	$(foreach hook,$($(PKG)_PRE_RSYNC_HOOKS),$(call $(hook))$(sep))
 	@test -d $(SRCDIR) || (echo "ERROR: $(SRCDIR) does not exist" ; exit 1)
 	rsync -au --chmod=u=rwX,go=rX $(RSYNC_VCS_EXCLUSIONS) $($(PKG)_OVERRIDE_SRCDIR_RSYNC_EXCLUSIONS) $(call qstrip,$(SRCDIR))/ $(@D)
@@ -193,6 +196,7 @@ $(BUILD_DIR)/%/.stamp_patched: PATCH_BASE_DIRS += $(addsuffix /$(RAWNAME),$(call
 $(BUILD_DIR)/%/.stamp_patched:
 	@$(call step_start,patch)
 	@$(call MESSAGE,"Patching")
+	$(Q)touch $@_before
 	$(foreach hook,$($(PKG)_PRE_PATCH_HOOKS),$(call $(hook))$(sep))
 	$(foreach p,$($(PKG)_PATCH),$(APPLY_PATCHES) $(@D) $($(PKG)_DL_DIR) $(notdir $(p))$(sep))
 	$(Q)( \
@@ -219,6 +223,7 @@ $(foreach dir,$(call qstrip,$(BR2_GLOBAL_PATCH_DIR)),\
 $(BUILD_DIR)/%/.stamp_configured:
 	@$(call step_start,configure)
 	@$(call MESSAGE,"Configuring")
+	$(Q)touch $@_before
 	$(foreach hook,$($(PKG)_PRE_CONFIGURE_HOOKS),$(call $(hook))$(sep))
 	$($(PKG)_CONFIGURE_CMDS)
 	$(foreach hook,$($(PKG)_POST_CONFIGURE_HOOKS),$(call $(hook))$(sep))
@@ -229,6 +234,7 @@ $(BUILD_DIR)/%/.stamp_configured:
 $(BUILD_DIR)/%/.stamp_built::
 	@$(call step_start,build)
 	@$(call MESSAGE,"Building")
+	$(Q)touch $@_before
 	$(foreach hook,$($(PKG)_PRE_BUILD_HOOKS),$(call $(hook))$(sep))
 	+$($(PKG)_BUILD_CMDS)
 	$(foreach hook,$($(PKG)_POST_BUILD_HOOKS),$(call $(hook))$(sep))
@@ -239,6 +245,7 @@ $(BUILD_DIR)/%/.stamp_built::
 $(BUILD_DIR)/%/.stamp_host_installed:
 	@$(call step_start,install-host)
 	@$(call MESSAGE,"Installing to host directory")
+	$(Q)touch $@_before
 	@mkdir -p $(HOST_DIR)
 	$(foreach hook,$($(PKG)_PRE_INSTALL_HOOKS),$(call $(hook))$(sep))
 	+$($(PKG)_INSTALL_CMDS)
@@ -269,6 +276,7 @@ $(BUILD_DIR)/%/.stamp_host_installed:
 $(BUILD_DIR)/%/.stamp_staging_installed:
 	@$(call step_start,install-staging)
 	@$(call MESSAGE,"Installing to staging directory")
+	$(Q)touch $@_before
 	@mkdir -p $(STAGING_DIR)
 	$(foreach hook,$($(PKG)_PRE_INSTALL_STAGING_HOOKS),$(call $(hook))$(sep))
 	+$($(PKG)_INSTALL_STAGING_CMDS)
@@ -303,6 +311,7 @@ $(BUILD_DIR)/%/.stamp_staging_installed:
 $(BUILD_DIR)/%/.stamp_images_installed:
 	@$(call step_start,install-image)
 	@$(call MESSAGE,"Installing to images directory")
+	$(Q)touch $@_before
 	@mkdir -p $(BINARIES_DIR)
 	$(foreach hook,$($(PKG)_PRE_INSTALL_IMAGES_HOOKS),$(call $(hook))$(sep))
 	+$($(PKG)_INSTALL_IMAGES_CMDS)
@@ -314,6 +323,7 @@ $(BUILD_DIR)/%/.stamp_images_installed:
 $(BUILD_DIR)/%/.stamp_target_installed:
 	@$(call step_start,install-target)
 	@$(call MESSAGE,"Installing to target")
+	$(Q)touch $@_before
 	@mkdir -p $(TARGET_DIR)
 	$(foreach hook,$($(PKG)_PRE_INSTALL_TARGET_HOOKS),$(call $(hook))$(sep))
 	+$($(PKG)_INSTALL_TARGET_CMDS)
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 04/19] infra/pkg-generic: use \0 to separate .la files as they are found
  2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
                   ` (2 preceding siblings ...)
  2019-01-07 22:05 ` [Buildroot] [PATCH 03/19] infra/pkg-generic: introduce new stampfile at the beginning of all steps Yann E. MORIN
@ 2019-01-07 22:05 ` Yann E. MORIN
  2019-01-07 22:05 ` [Buildroot] [PATCH 05/19] infra/pkg-generic: tweak only .la files installed by the current package Yann E. MORIN
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-07 22:05 UTC (permalink / raw)
  To: buildroot

Just in case some package is wild enough to install a .la file with a
space in its name (or worse, a \n).

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 package/pkg-generic.mk | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index 1b8febe8c5..eb41f97241 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -294,7 +294,8 @@ $(BUILD_DIR)/%/.stamp_staging_installed:
 				$(addprefix $(STAGING_DIR)/usr/bin/,$($(PKG)_CONFIG_SCRIPTS)) ;\
 	fi
 	@$(call MESSAGE,"Fixing libtool files")
-	$(Q)find $(STAGING_DIR)/usr/lib* -name "*.la" | xargs --no-run-if-empty \
+	$(Q)find $(STAGING_DIR)/usr/lib* -name "*.la" -print0 \
+	| xargs -0 --no-run-if-empty \
 		$(SED) "s:$(BASE_DIR):@BASE_DIR@:g" \
 			-e "s:$(STAGING_DIR):@STAGING_DIR@:g" \
 			$(if $(TOOLCHAIN_EXTERNAL_INSTALL_DIR),\
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 05/19] infra/pkg-generic: tweak only .la files installed by the current package
  2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
                   ` (3 preceding siblings ...)
  2019-01-07 22:05 ` [Buildroot] [PATCH 04/19] infra/pkg-generic: use \0 to separate .la files as they are found Yann E. MORIN
@ 2019-01-07 22:05 ` Yann E. MORIN
  2019-01-07 22:05 ` [Buildroot] [PATCH 06/19] infra/pkg-generic: only list " Yann E. MORIN
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-07 22:05 UTC (permalink / raw)
  To: buildroot

Currently, when we tweak the .la files, we do so unconditionally on all
.la files, even those we already fixed in a previous run.

This has the nasty side effect that each .la file will be reported as
being touched by all packages that are installed after the package that
actually installed said .la file.

We fix that by limiting the search for .la files to those that have been
actually touched during the install step, now that we have the proper
timestamp for it.

Reported-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 package/pkg-generic.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index eb41f97241..3de8a99675 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -294,7 +294,7 @@ $(BUILD_DIR)/%/.stamp_staging_installed:
 				$(addprefix $(STAGING_DIR)/usr/bin/,$($(PKG)_CONFIG_SCRIPTS)) ;\
 	fi
 	@$(call MESSAGE,"Fixing libtool files")
-	$(Q)find $(STAGING_DIR)/usr/lib* -name "*.la" -print0 \
+	$(Q)find $(STAGING_DIR)/usr/lib* -name "*.la" -print0 -newer $@_before \
 	| xargs -0 --no-run-if-empty \
 		$(SED) "s:$(BASE_DIR):@BASE_DIR@:g" \
 			-e "s:$(STAGING_DIR):@STAGING_DIR@:g" \
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 06/19] infra/pkg-generic: only list files installed by the current package
  2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
                   ` (4 preceding siblings ...)
  2019-01-07 22:05 ` [Buildroot] [PATCH 05/19] infra/pkg-generic: tweak only .la files installed by the current package Yann E. MORIN
@ 2019-01-07 22:05 ` Yann E. MORIN
  2019-01-08 13:07   ` Thomas De Schampheleire
  2019-01-07 22:05 ` [Buildroot] [PATCH 07/19] infra/pkg-generic: offload same-package filtering to check-uniq-file Yann E. MORIN
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-07 22:05 UTC (permalink / raw)
  To: buildroot

Since 7fb6e782542f (core/instrumentation: shave minutes off the build
time), the built stampfile is used as a reference to detect files
installed by a package.

However, this falls short during development, when a user may want to
re-install a built-early package without rebuilding it (i.e. make
foo-reinstall). In this case, the built stampfile is not touched, and is
still dated from way back when the package was first built. As such,
almost all files in target (or staging or host) are newer than that, and
so those files are all now accounted for that package, when in fact only
a minor subset may be accountable to it.

We fix that by limiting the search for files that have been actually
touched during the install step, now that we have the proper timestamp
for it.

Reported-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
---
 package/pkg-generic.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index 3de8a99675..42aebeb49d 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -67,7 +67,7 @@ define step_pkg_size_inner
 	$(SED) '/^$(1),/d' $(BUILD_DIR)/packages-file-list$(3).txt
 	cd $(2); \
 	find . \( -type f -o -type l \) \
-		-newer $($(PKG)_DIR)/.stamp_built \
+		-newer $@_before \
 		-exec printf '$(1),%s\n' {} + \
 		>> $(BUILD_DIR)/packages-file-list$(3).txt
 endef
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 07/19] infra/pkg-generic: offload same-package filtering to check-uniq-file
  2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
                   ` (5 preceding siblings ...)
  2019-01-07 22:05 ` [Buildroot] [PATCH 06/19] infra/pkg-generic: only list " Yann E. MORIN
@ 2019-01-07 22:05 ` Yann E. MORIN
  2019-01-07 22:05 ` [Buildroot] [PATCH 08/19] support/check-uniq-files: decode as many strings as possible Yann E. MORIN
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-07 22:05 UTC (permalink / raw)
  To: buildroot

Commit d3dca1e9936b (core/pkg-generic: only save latest package list)
added filtering so that, upon a "pkg-rebuild", the check-uniq-file test
would not report the same package as installing the same file.

Additionally, it also ensured that the file-list file did not grow with
each rebuild, by removing all references of a package-installed files
before listing them again.

However, that second part also meant that the accounting may get wrong
upon a rebuild, in case the new installation step installs less files
than the previous, in which case previously installed files are no
longer assigned to the package.

It also hides the fact that the package has been re-installed.

Furthermore, the combined size of those three files lists is almost
negligible when compared to the rest of the build tree, so the growing
argument does not hold.

The only interesting feature of d3dca1e9936b, is thus to not report a
package that re-installs the same file.

So, we revert d3dca1e9936b but use a python set() instead of a list, to
store the packages that installed each file. Adding to a set, an entry
which is already in that set, does not create a new entry; it will be
there only once.

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: John Keeping <john@metanate.com>
Cc: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 package/pkg-generic.mk           | 2 --
 support/scripts/check-uniq-files | 4 ++--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index 42aebeb49d..7daea190a6 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -63,8 +63,6 @@ GLOBAL_INSTRUMENTATION_HOOKS += step_time
 # $(2): base directory to search in
 # $(3): suffix of file  (optional)
 define step_pkg_size_inner
-	@touch $(BUILD_DIR)/packages-file-list$(3).txt
-	$(SED) '/^$(1),/d' $(BUILD_DIR)/packages-file-list$(3).txt
 	cd $(2); \
 	find . \( -type f -o -type l \) \
 		-newer $@_before \
diff --git a/support/scripts/check-uniq-files b/support/scripts/check-uniq-files
index fbc6b5d6e7..eb92724e42 100755
--- a/support/scripts/check-uniq-files
+++ b/support/scripts/check-uniq-files
@@ -24,11 +24,11 @@ def main():
         sys.stderr.write('No type was provided\n')
         return False
 
-    file_to_pkg = defaultdict(list)
+    file_to_pkg = defaultdict(set)
     with open(args.packages_file_list[0], 'rb') as pkg_file_list:
         for line in pkg_file_list.readlines():
             pkg, _, file = line.rstrip(b'\n').partition(b',')
-            file_to_pkg[file].append(pkg)
+            file_to_pkg[file].add(pkg)
 
     for file in file_to_pkg:
         if len(file_to_pkg[file]) > 1:
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 08/19] support/check-uniq-files: decode as many strings as possible
  2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
                   ` (6 preceding siblings ...)
  2019-01-07 22:05 ` [Buildroot] [PATCH 07/19] infra/pkg-generic: offload same-package filtering to check-uniq-file Yann E. MORIN
@ 2019-01-07 22:05 ` Yann E. MORIN
  2019-02-07 23:40   ` Arnout Vandecappelle
  2019-01-07 22:05 ` [Buildroot] [PATCH 09/19] support: add parser in python for packages-file-list files Yann E. MORIN
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-07 22:05 UTC (permalink / raw)
  To: buildroot

Currently, when there is at least one string we can't decode when
reporting the file and the packages that touched it, we fallback to not
decoding any string at all, which generates a report like:

    Warning: target file "b'/some/file'" is touched by more than one package: [b'toolchain', b'busybox']

This is not very nice, though, so we introduce a decoder that returns
the decoded string if possible, and falls back to returning the repr() of
the un-decoded string.

Also, using a set as argument to format() further yields a not-so-nice
output either (even if the decoding was OK):
    [u'toolchain', u'busybox']

So, we just join together all the elements of the set into a string,
which is what we pass to format().

Now the output is much nicer to look at:

    Warning: file "/some/file" is touched by more than one package: busybox, toolchain

and even in the case of an un-decodable string (with a manually tweaked
list, \xbd being ? in iso8859-15, and not a valid UTF-8 encoding):

    Warning: file "/some/file" is touched by more than one package: 'busyb\xbdx', toolchain

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 support/scripts/check-uniq-files | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/support/scripts/check-uniq-files b/support/scripts/check-uniq-files
index eb92724e42..e95a134168 100755
--- a/support/scripts/check-uniq-files
+++ b/support/scripts/check-uniq-files
@@ -7,6 +7,16 @@ from collections import defaultdict
 warn = 'Warning: {0} file "{1}" is touched by more than one package: {2}\n'
 
 
+# If possible, try to decode the binary string s with the user's locale.
+# If s contains characters that can't be decoded with that locale, return
+# the representation (in the user's locale) of the un-decoded string.
+def str_decode(s):
+    try:
+        return s.decode()
+    except UnicodeDecodeError:
+        return repr(s)
+
+
 def main():
     parser = argparse.ArgumentParser()
     parser.add_argument('packages_file_list', nargs='*',
@@ -32,16 +42,9 @@ def main():
 
     for file in file_to_pkg:
         if len(file_to_pkg[file]) > 1:
-            # If possible, try to decode the binary strings with
-            # the default user's locale
-            try:
-                sys.stderr.write(warn.format(args.type, file.decode(),
-                                             [p.decode() for p in file_to_pkg[file]]))
-            except UnicodeDecodeError:
-                # ... but fallback to just dumping them raw if they
-                # contain non-representable chars
-                sys.stderr.write(warn.format(args.type, file,
-                                             file_to_pkg[file]))
+            sys.stderr.write(warn.format(args.type, str_decode(file),
+                                         ", ".join([str_decode(p)
+                                                    for p in file_to_pkg[file]])))
 
 
 if __name__ == "__main__":
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 09/19] support: add parser in python for packages-file-list files
  2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
                   ` (7 preceding siblings ...)
  2019-01-07 22:05 ` [Buildroot] [PATCH 08/19] support/check-uniq-files: decode as many strings as possible Yann E. MORIN
@ 2019-01-07 22:05 ` Yann E. MORIN
  2019-01-08 13:35   ` Thomas De Schampheleire
  2019-01-07 22:05 ` [Buildroot] [PATCH 10/19] support: rewrite check-bin-arch in python Yann E. MORIN
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-07 22:05 UTC (permalink / raw)
  To: buildroot

Currently, each script that want to parse the package-files lists has to
invent its own parsing. So far, this was acceptable, as the format of
those files was relatively easy (line-based records, comma-separated
fields).

However, that format is not very resilient against weird filenames (e.g.
filenames with commas in the, or even with \n chars in them.

Furthermore, that format is not easily extensible.

So, introduce a parser, in python, that abstracts the format of these
files, and returns a dictionaries. Using dictionaries makes it easy for
callers to just ignore the fields they are not interested in, or even
are not aware of. Consequently, it will make it easy for us to introduce
new fields in the future.

Convert the two existing python scripts that parse those files.

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>

---
Note: a shell-script also parses those files, it will be handled in a
subsequent change.
---
 support/scripts/brpkgutil.py     | 16 ++++++++++++++++
 support/scripts/check-uniq-files |  7 +++----
 support/scripts/size-stats       | 14 +++++++-------
 3 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/support/scripts/brpkgutil.py b/support/scripts/brpkgutil.py
index e70d525353..d15b18845b 100644
--- a/support/scripts/brpkgutil.py
+++ b/support/scripts/brpkgutil.py
@@ -5,6 +5,22 @@ import sys
 import subprocess
 
 
+# Iterate on all records of the packages-file-list file passed as filename
+# Returns an iterator over a list of dictionaries. Each dictionary contains
+# these keys (others maybe added in the future):
+# 'file': the path of the file,
+# 'pkg':  the last package that installed that file
+def parse_pkg_file_list(path):
+    with open(path, 'rb') as f:
+        for rec in f.readlines():
+            l = rec.split(',0')
+            d = {
+                  'file': l[0],
+                  'pkg':  l[1],
+                }
+            yield d
+
+
 # Execute the "make <pkg>-show-version" command to get the version of a given
 # list of packages, and return the version formatted as a Python dictionary.
 def get_version(pkgs):
diff --git a/support/scripts/check-uniq-files b/support/scripts/check-uniq-files
index e95a134168..7020462981 100755
--- a/support/scripts/check-uniq-files
+++ b/support/scripts/check-uniq-files
@@ -3,6 +3,7 @@
 import sys
 import argparse
 from collections import defaultdict
+from brpkgutil import parse_pkg_file_list as parse_pkg_file_list
 
 warn = 'Warning: {0} file "{1}" is touched by more than one package: {2}\n'
 
@@ -35,10 +36,8 @@ def main():
         return False
 
     file_to_pkg = defaultdict(set)
-    with open(args.packages_file_list[0], 'rb') as pkg_file_list:
-        for line in pkg_file_list.readlines():
-            pkg, _, file = line.rstrip(b'\n').partition(b',')
-            file_to_pkg[file].add(pkg)
+    for record in parse_pkg_file_list(args.packages_file_list[0]):
+        file_to_pkg[record['file']].add(record['pkg'])
 
     for file in file_to_pkg:
         if len(file_to_pkg[file]) > 1:
diff --git a/support/scripts/size-stats b/support/scripts/size-stats
index deea92e278..48cd834ab4 100755
--- a/support/scripts/size-stats
+++ b/support/scripts/size-stats
@@ -22,6 +22,7 @@ import os.path
 import argparse
 import csv
 import collections
+from brpkgutil import parse_pkg_file_list as parse_pkg_file_list
 
 try:
     import matplotlib
@@ -66,13 +67,12 @@ def add_file(filesdict, relpath, abspath, pkg):
 #
 def build_package_dict(builddir):
     filesdict = {}
-    with open(os.path.join(builddir, "build", "packages-file-list.txt")) as filelistf:
-        for l in filelistf.readlines():
-            pkg, fpath = l.split(",", 1)
-            # remove the initial './' in each file path
-            fpath = fpath.strip()[2:]
-            fullpath = os.path.join(builddir, "target", fpath)
-            add_file(filesdict, fpath, fullpath, pkg)
+    fname = os.path.join(builddir, "build", "packages-file-list.txt")
+    for record in parse_pkg_file_list(fname):
+        # remove the initial './' in each file path
+        fpath = record['file'].strip()[2:]
+        fullpath = os.path.join(builddir, "target", fpath)
+        add_file(filesdict, fpath, fullpath, record['pkg'])
     return filesdict
 
 
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 10/19] support: rewrite check-bin-arch in python
  2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
                   ` (8 preceding siblings ...)
  2019-01-07 22:05 ` [Buildroot] [PATCH 09/19] support: add parser in python for packages-file-list files Yann E. MORIN
@ 2019-01-07 22:05 ` Yann E. MORIN
  2019-01-08 14:56   ` Thomas De Schampheleire
  2019-01-07 22:05 ` [Buildroot] [PATCH 11/19] support: introduce new format for packages-file-list files Yann E. MORIN
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-07 22:05 UTC (permalink / raw)
  To: buildroot

The existing check-bin-arch script is written in shell, so it can't make
use of all the helpers we have in python, especially the parser for the
package-files lists.

Although this script is relatively clean as it is, it is not totally
fool-proof either, especially against weird filenames (e.g.
specially-crafted filenames with \n, or \b, etc...).

Also, shell scripts are often frowned upon but for the most mundane
processing, and this script is definitely not mundane.

Finally, shell scripts are slow, as all the processing the have to do is
more often than not done by spawning programs, and that is relatively
expensive.

Rewrite that script in python.

This allows to do cleaner processing and reuse the package-files list
parser.

There's however a drawback: the script grows substantially, in part
because of spawning the actual readelf call (there is no ELF parser in
the standard python library), and because we want to keep backward
compatible with old pythons that lack proper abstractions like
subprocess.DEVNULL et al.

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: Thomas De Schampheleire <patrickdepinguin@gmail.com>
---
 support/scripts/check-bin-arch | 205 +++++++++++++++++++++++------------------
 1 file changed, 113 insertions(+), 92 deletions(-)

diff --git a/support/scripts/check-bin-arch b/support/scripts/check-bin-arch
index 66b8d89932..d4902163e7 100755
--- a/support/scripts/check-bin-arch
+++ b/support/scripts/check-bin-arch
@@ -1,92 +1,113 @@
-#!/usr/bin/env bash
-
-# List of hardcoded paths that should be ignored, as they may
-# contain binaries for an architecture different from the
-# architecture of the target.
-declare -a IGNORES=(
-	# Skip firmware files, they could be ELF files for other
-	# architectures
-	"/lib/firmware"
-	"/usr/lib/firmware"
-
-	# Skip kernel modules
-	# When building a 32-bit userland on 64-bit architectures, the kernel
-	# and its modules may still be 64-bit. To keep the basic
-	# check-bin-arch logic simple, just skip this directory.
-	"/lib/modules"
-	"/usr/lib/modules"
-
-	# Skip files in /usr/share, several packages (qemu,
-	# pru-software-support) legitimately install ELF binaries that
-	# are not for the target architecture
-	"/usr/share"
-
-	# Skip files in /lib/grub, since it is possible to have it
-	# for a different architecture (e.g. i386 grub on x86_64).
-	"/lib/grub"
-)
-
-while getopts p:l:r:a:i: OPT ; do
-	case "${OPT}" in
-	p) package="${OPTARG}";;
-	l) pkg_list="${OPTARG}";;
-	r) readelf="${OPTARG}";;
-	a) arch_name="${OPTARG}";;
-	i)
-		# Ensure we do have single '/' as separators,
-		# and that we have a leading and a trailing one.
-		pattern="$(sed -r -e 's:/+:/:g; s:^/*:/:; s:/*$:/:;' <<<"${OPTARG}")"
-		IGNORES+=("${pattern}")
-		;;
-	:) error "option '%s' expects a mandatory argument\n" "${OPTARG}";;
-	\?) error "unknown option '%s'\n" "${OPTARG}";;
-	esac
-done
-
-if test -z "${package}" -o -z "${pkg_list}" -o -z "${readelf}" -o -z "${arch_name}" ; then
-	echo "Usage: $0 -p <pkg> -l <pkg-file-list> -r <readelf> -a <arch name> [-i PATH ...]"
-	exit 1
-fi
-
-exitcode=0
-
-# Only split on new lines, for filenames-with-spaces
-IFS="
-"
-
-while read f; do
-	for ignore in "${IGNORES[@]}"; do
-		if [[ "${f}" =~ ^"${ignore}" ]]; then
-			continue 2
-		fi
-	done
-
-	# Skip symlinks. Some symlinks may have absolute paths as
-	# target, pointing to host binaries while we're building.
-	if [[ -L "${TARGET_DIR}/${f}" ]]; then
-		continue
-	fi
-
-	# Get architecture using readelf. We pipe through 'head -1' so
-	# that when the file is a static library (.a), we only take
-	# into account the architecture of the first object file.
-	arch=$(LC_ALL=C ${readelf} -h "${TARGET_DIR}/${f}" 2>&1 | \
-		       sed -r -e '/^  Machine: +(.+)/!d; s//\1/;' | head -1)
-
-	# If no architecture found, assume it was not an ELF file
-	if test "${arch}" = "" ; then
-		continue
-	fi
-
-	# Architecture is correct
-	if test "${arch}" = "${arch_name}" ; then
-		continue
-	fi
-
-	printf 'ERROR: architecture for "%s" is "%s", should be "%s"\n' \
-	       "${f}" "${arch}" "${arch_name}"
-
-	exitcode=1
-done < <( sed -r -e "/^${package},\.(.+)$/!d; s//\1/;" ${pkg_list} )
-
-exit ${exitcode}
+#!/usr/bin/env python
+
+import argparse
+import os
+import re
+import subprocess
+import sys
+from brpkgutil import parse_pkg_file_list as parse_pkg_file_list
+
+
+# List of hardcoded paths that should be ignored, as they may contain
+# binaries for an architecture different from the architecture of the
+# target
+IGNORES = {
+    # Skip firmware files
+    # They could be ELF files for other architectures.
+    '/lib/firmware',
+    '/usr/lib/firmware',
+
+    # Skip kernel modules
+    # When building a 32-bit userland on 64-bit architectures, the
+    # kernel and its modules may still be 64-bit. To keep the basic
+    # check-bin-arch logic simple, just skip these directories
+    '/lib/modules',
+    '/usr/lib/modules',
+
+    # Skip files in /usr/share
+    # Several packages (qemu, pru-software-support) legitimately install
+    # ELF binaries that are not for the target architecture
+    '/usr/share',
+
+    # Skip files in /lib/grub
+    # It is possible to have it for a different architecture (e.g. i386
+    # grub on x86_64)
+    '/lib/grub',
+}
+
+
+ERROR = 'ERROR: architecture for "%s" is "%s", should be "%s"\n'
+
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--package', '-p', metavar='PACKAGE', required=True)
+    parser.add_argument('--pkg-list', '-l', metavar='PKG_LIST', required=True)
+    parser.add_argument('--readelf', '-r', metavar='READELF', required=True)
+    parser.add_argument('--arch', '-a', metavar='ARCH', required=True)
+    parser.add_argument('--ignore', '-i', metavar='PATH', action='append')
+    args = parser.parse_args()
+
+    if args.ignore is not None:
+        # Ensure we do have single '/' as separators, and that we have a
+        # leading and a trailing one, then append to the global list.
+        for pattern in args.ignore:
+            IGNORES.add(re.sub('/+', '/', '/{0}/'.format(pattern)))
+
+    ignores_re = set()
+    for i in IGNORES:
+        ignores_re.add(re.compile(i))
+
+    arch_re = re.compile('^  Machine: +(.+)')
+
+    target_dir = os.environ['TARGET_DIR']
+
+    exit_code = 0
+    for record in parse_pkg_file_list(args.pkg_list):
+        if record['pkg'] != args.package:
+            continue
+
+        fpath = record['file']
+
+        ignored = False
+        for i in ignores_re:
+            if i.match(fpath):
+                ignored = True
+                break
+        if ignored:
+            continue
+
+        # Skip symlinks. Some symlinks may have absolute paths as
+        # target, pointing to host binaries while we're building.
+        if os.path.islink(os.path.join(target_dir, fpath)):
+            continue
+
+        cmd = [args.readelf, '-h', os.path.join(target_dir, fpath)]
+        try:
+            with open(os.devnull, 'wb') as devnull:
+                p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=devnull,
+                                     universal_newlines=True)
+                stdout = p.communicate()[0].split('\n')[:-1]
+                ret = p.returncode
+        except OSError:
+            ret = 1
+            stdout = list()
+        # If readelf returned in error, or returned nothing on stdout,
+        # then that was not an ELF file, or we may not yet have readelf
+        # (e.g. in skeleton, before toolchain)
+        if ret != 0 or len(stdout) == 0:
+            continue
+
+        for line in stdout:
+            if arch_re.match(line):
+                arch = arch_re.sub(r'\1', line)
+                if arch != args.arch:
+                    print(ERROR.format(fpath, arch, args.arch))
+                    exit_code = 1
+                break
+
+    return exit_code
+
+
+if __name__ == "__main__":
+    sys.exit(main())
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 11/19] support: introduce new format for packages-file-list files
  2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
                   ` (9 preceding siblings ...)
  2019-01-07 22:05 ` [Buildroot] [PATCH 10/19] support: rewrite check-bin-arch in python Yann E. MORIN
@ 2019-01-07 22:05 ` Yann E. MORIN
  2019-01-08 15:07   ` Thomas De Schampheleire
  2019-01-07 22:05 ` [Buildroot] [PATCH 12/19] infra/pkg-generic: store md5 of just-installed files Yann E. MORIN
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-07 22:05 UTC (permalink / raw)
  To: buildroot

The existing format for the packages-files lists has two drawbacks:

  - it is not very resilient against filenames with weird characters,
    like \n or \b or spaces...

  - it is not easily expandable, partly because of the above.

AS such, introduce a new format for those files, that solves both
issues.

First, we must find a way that unambiguously separate fields. There is
one single byte that can never ever occur in filenames or package names,
i.e. the NULL character. So, we use that as a field separator.

Second, we must find a way to unambiguously separate records. Except for
\0, any character may occur in filenames, but the other existing field
we currently have is the package name, which we do know does not contain
any weird byte (e.g. it's basically limited to [[:alnum:]_-]). Thus, we
can't use a single character as record separator. A solution is to use
\0\n as the record separator.

Thirdly, we must ensure that filenames never mess up with our
separators. By making the filename the first field, we can be sure that
it is properly terminated by a field separator, and that any leading \n
does not interfere with a previous field separator to form a spurious
record separator.

So, the new format is now (without spaces):

  filename \0 package-name \0\n

Update the parser accordingly.

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
---
 package/pkg-generic.mk       |  8 ++++++--
 support/scripts/brpkgutil.py | 28 ++++++++++++++++++++++++----
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index 7daea190a6..d261b5bf76 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -59,6 +59,10 @@ GLOBAL_INSTRUMENTATION_HOOKS += step_time
 
 # The suffix is typically empty for the target variant, for legacy backward
 # compatibility.
+# Files are record-formatted, with \0\n as record separator, and \0 as
+# field separator. A record is made of these fields:
+#  - file path
+#  - package name
 # $(1): package name
 # $(2): base directory to search in
 # $(3): suffix of file  (optional)
@@ -66,8 +70,8 @@ define step_pkg_size_inner
 	cd $(2); \
 	find . \( -type f -o -type l \) \
 		-newer $@_before \
-		-exec printf '$(1),%s\n' {} + \
-		>> $(BUILD_DIR)/packages-file-list$(3).txt
+	|sed -r -e 's/$$/\x00$(1)\x00/' \
+	>> $(BUILD_DIR)/packages-file-list$(3).txt
 endef
 
 define step_pkg_size
diff --git a/support/scripts/brpkgutil.py b/support/scripts/brpkgutil.py
index d15b18845b..f6ef4b3dca 100644
--- a/support/scripts/brpkgutil.py
+++ b/support/scripts/brpkgutil.py
@@ -5,6 +5,26 @@ import sys
 import subprocess
 
 
+# Read the binary-opened file object f with \0\n separated records (aka lines).
+# Highly inspired by:
+# https://stackoverflow.com/questions/19600475/how-to-read-records-terminated-by-custom-separator-from-file-in-python
+def _readlines0n(f):
+    buf = b''
+    while True:
+        newbuf = f.read(1048576)
+        if not newbuf:
+            if buf:
+                yield buf
+            return
+        if buf is None:
+            buf = b''
+        buf += newbuf
+        lines = buf.split(b'\x00\n')
+        for line in lines[:-1]:
+            yield line
+        buf = lines[-1]
+
+
 # Iterate on all records of the packages-file-list file passed as filename
 # Returns an iterator over a list of dictionaries. Each dictionary contains
 # these keys (others maybe added in the future):
@@ -12,11 +32,11 @@ import subprocess
 # 'pkg':  the last package that installed that file
 def parse_pkg_file_list(path):
     with open(path, 'rb') as f:
-        for rec in f.readlines():
-            l = rec.split(',0')
+        for rec in _readlines0n(f):
+            srec = rec.split(b'\x00')
             d = {
-                  'file': l[0],
-                  'pkg':  l[1],
+                  'file': srec[0],
+                  'pkg':  srec[1],
                 }
             yield d
 
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 12/19] infra/pkg-generic: store md5 of just-installed files
  2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
                   ` (10 preceding siblings ...)
  2019-01-07 22:05 ` [Buildroot] [PATCH 11/19] support: introduce new format for packages-file-list files Yann E. MORIN
@ 2019-01-07 22:05 ` Yann E. MORIN
  2019-01-08 15:13   ` Thomas De Schampheleire
  2019-01-07 22:05 ` [Buildroot] [PATCH 13/19] support/check-uniq-file: invert condition logic Yann E. MORIN
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-07 22:05 UTC (permalink / raw)
  To: buildroot

Now that we can accurately list files touched during a package
installation, we can restore the md5sum to those files with a minor
impact on the build time, generally in the order of a few tens of
milliseconds per package at worst, but that his regained later by
providing even more accurate accountability of files to packages, and
will even also allows ignoring packages that install the same file with
the same content.

Update parser accordingly.

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Cc: John Keeping <john@metanate.com>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 package/pkg-generic.mk       | 10 +++++++---
 support/scripts/brpkgutil.py |  2 ++
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index d261b5bf76..dd6650db7f 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -63,14 +63,18 @@ GLOBAL_INSTRUMENTATION_HOOKS += step_time
 # field separator. A record is made of these fields:
 #  - file path
 #  - package name
+#  - md5sum of file content, as installed by this package
 # $(1): package name
 # $(2): base directory to search in
 # $(3): suffix of file  (optional)
 define step_pkg_size_inner
 	cd $(2); \
-	find . \( -type f -o -type l \) \
-		-newer $@_before \
-	|sed -r -e 's/$$/\x00$(1)\x00/' \
+	{ \
+		find . -type d -newer $@_before -printf 'directory  %p\n'; \
+		find . -xtype f -newer $@_before -print0 \
+		|xargs -0 -r md5sum; \
+	} \
+	|sed -r -e 's/^([^[:space:]]+)  (.+)$$/\2\x00$(1)\x00\1\x00/' \
 	>> $(BUILD_DIR)/packages-file-list$(3).txt
 endef
 
diff --git a/support/scripts/brpkgutil.py b/support/scripts/brpkgutil.py
index f6ef4b3dca..a1114681ea 100644
--- a/support/scripts/brpkgutil.py
+++ b/support/scripts/brpkgutil.py
@@ -30,6 +30,7 @@ def _readlines0n(f):
 # these keys (others maybe added in the future):
 # 'file': the path of the file,
 # 'pkg':  the last package that installed that file
+# 'md5':  the md5sum of the file content
 def parse_pkg_file_list(path):
     with open(path, 'rb') as f:
         for rec in _readlines0n(f):
@@ -37,6 +38,7 @@ def parse_pkg_file_list(path):
             d = {
                   'file': srec[0],
                   'pkg':  srec[1],
+                  'md5':  srec[2],
                 }
             yield d
 
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2)
@ 2019-01-07 22:05 Yann E. MORIN
  2019-01-07 22:05 ` [Buildroot] [PATCH 01/19] infra/pkg-generic: display MESSAGE before running PRE_HOOKS Yann E. MORIN
                   ` (19 more replies)
  0 siblings, 20 replies; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-07 22:05 UTC (permalink / raw)
  To: buildroot

Hello All!

Currently, the instrumentation steps, that we run after a package is
installed, get confused about the files that package may have be
responsible for.

The first problem is that all .la files are tweaked after a package is
installed, and thus those files are all then newer than the built
stampfile of that package, and consequently all .la files are accounted
to that package.

The second problem is that, during development and agter a user
requested a package reinstall (but not a rebuild!), then the built
stampfile is much older, and thus all files that have been installed
since the package was last built are accoutned to that package.

Those two problems are caused by 7fb6e782542f, when we switched away
from an md5 comparison between the state before and after the
installation, to a time-based comparison against the bult stampfile.

Furthermore, during development, the list of installed files can get
out of sync with what is really installed. For example, if a user were
to modify the source of a package, and trigger a re-configure, rebuild,
or re-install, then we'd remove the list of previously installed files
before generating the list of currently installed files. If files
installed in the previous installation are no longer installed, they are
still present in the target (or staging or host), but no longer
accounted to the package that instaleld them.

Additionally, when two or more packages install the same file and it has
the same content, we don't care much about which actually installed it,
as they would all have installed the exact same file. The size could be
assigned to any of those packages, and the licensing terms of any of
those package may be applied to that file. The case is mostly prominent
with the fftw familly of packages (soon to come) that install the same
headers and the same utilities.

Finally, there is one prominent file that gets _updated_ (and not
replaced) by many packages: the info page index, which packages update
when they install their own info pages. We currently report that file,
when in fact it does not end up in target, and thus we don't care about
how its content came to be. And more generically, we don't care any file
that we eventually remove as part of our target-finalize cleanups.

This series is thus an attempt at fixing all those issues.

First and foremost, the series addresses the limitation that causes the
first two problems: we do not have a way to know when the install steps
were started (or any other step, for that matters, but we're currently
only interested in the install steps). So, the first few patches make it
so that we can introduce an new timestamp file at the beginning of each
step.

Then, with the information about the beginning of the install step, we
can now limit the .la files tweaking to just those files that were
actually instaleld y a package. And then we use that same stamp file to
limit the listing of installed files accountable to the current package.

Then the series addreses the same-identical-file-from-many-packages. To
do so, it partially restore the md5sum of the files, but this is
limitted to only those files actually touched during the install of the
current package (see above), and is only ran at the end of the install,
not before. As thus, this is much faster than the original situation
that did the md5 of all files before and after, because it now acts on
cache-hot files only.

That part is split in two: first, the formnat of the packages-file-list
files is modified to be more resilient to weird filenames, which then
allows us to expand it with arbitrarily more fields. A python helper is
provided to abstract the new format, and the consumers of those files
are updated to use the helper (with one script being rewritten in
python). Then we make use of this new format to store the md5 of the
files contents, which we eventually use to decide whether to report the
file or not.

Now, files that are missing from the destination directory are no longer
elligible for being reported as being touched by more than ne pacakge
anymore.

And finally, now that we have a dependable check for uniqueness, we can
add an option in the menuconfig to turn the current warning into a hard
error when uniqueness is not met.

Since this is a time-sensitive topic, here are a few timings before and
after this series, over 6 runs on an idle machine, with a configuration:

  - prebuilt glibc toolchain
  - 233 packages, most pretty small and building fast
  - target/:  215MiB, 14922 files, directories, symlinks...
  - staging/: 625MiB, 29029 files, directories, symlinks...
  - host/:    2.1GiB, 44129 files, directories, symlinks...

                best           minutes:seconds          worst   mean
    before:     36:20   36:22   36:23   36:24   36:27   36:28   36:24
    after:      36:29   36:31   36:32   36:33   36:35   36:37   36:33

So, this is a 9s overhead over 2184s (36:24, before), i.e. a mere 0.4%
increase in time over the full build, or just about a 38ms overhead per
package on average. This overhead is real, but is still very far from
the huge one that was choped off by 7fb6e782542f.

Additionally, the time for re-installing the last package does not
suffer from an already large number or size of files already present.
Best result of three builds (to be cache-hot), for one target package
with a staging install, and one for host package:

            skeleton-init-common-reinstall    host-patchelf-reinstall
    before:            8.258s                       4.951s
    after:             4.514s                       5.034s
    delta:             -3.744s                     +0.083s

So, basically, what this means is that, during development, reinstalling
a previous package is faster. This is because, even though we spend (a
little tiny wee bit) more time when lisitings files due to the md5sum
(and really, thats really just a few additional millieconds per package),
we get repaid hundreths-fold because the list is now accurate, and we
can limit ourselves to tweaking only the corresponding .la file, but
also limit the check-bin-arch to only those files actually interesting.

The host packages are still slightly impacted as we can see for
host-patchelf, because the check-bin-arch does not apply to them, so the
gain from running check-bin-arch only on just-installed files can't
apply to host packages. Still, the impact is minor.

I'd like to particularly thank Nicolas Cavallari for their valuable
input about the issues they encountered with the previous and current
situations. Many thanks! :-)


Regards,
Yann E. MORIN.


The following changes since commit 8e928a8389d88e0f64f04ee1b3aa4985dcfd373f

  Makefile, manual, website: Bump copyright year (2019-01-06 21:30:34 +0100)


are available in the git repository at:

  git://git.buildroot.org/~ymorin/git/buildroot.git

for you to fetch changes up to c7478b1fd1c92508f346f1a8626374d742c9c327

  core: add optional failure when 2+ packages touch the same file (2019-01-07 23:04:09 +0100)


----------------------------------------------------------------
Yann E. MORIN (19):
      infra/pkg-generic: display MESSAGE before running PRE_HOOKS
      infra/pkg-generic: create $(@D) before running PRE_HOOKS
      infra/pkg-generic: introduce new stampfile at the beginning of all steps
      infra/pkg-generic: use \0 to separate .la files as they are found
      infra/pkg-generic: tweak only .la files installed by the current package
      infra/pkg-generic: only list files installed by the current package
      infra/pkg-generic: offload same-package filtering to check-uniq-file
      support/check-uniq-files: decode as many strings as possible
      support: add parser in python for packages-file-list files
      support: rewrite check-bin-arch in python
      support: introduce new format for packages-file-list files
      infra/pkg-generic: store md5 of just-installed files
      support/check-uniq-file: invert condition logic
      support/check-uniq-files: don't report files of the same content
      support/check-uniq-files: use argparse to enfore required options
      core: check unique files in the corresponding finalize step
      core: check for unique target files after all our cleanups
      core: ignore non-unique files that have disapeared
      core: add optional failure when 2+ packages touch the same file

 Config.in                        |   8 ++
 Makefile                         |  22 ++++-
 package/pkg-generic.mk           |  41 +++++---
 support/scripts/brpkgutil.py     |  38 ++++++++
 support/scripts/check-bin-arch   | 205 +++++++++++++++++++++------------------
 support/scripts/check-uniq-files |  69 +++++++------
 support/scripts/size-stats       |  14 +--
 7 files changed, 255 insertions(+), 142 deletions(-)

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 13/19] support/check-uniq-file: invert condition logic
  2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
                   ` (11 preceding siblings ...)
  2019-01-07 22:05 ` [Buildroot] [PATCH 12/19] infra/pkg-generic: store md5 of just-installed files Yann E. MORIN
@ 2019-01-07 22:05 ` Yann E. MORIN
  2019-01-07 22:05 ` [Buildroot] [PATCH 14/19] support/check-uniq-files: don't report files of the same content Yann E. MORIN
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-07 22:05 UTC (permalink / raw)
  To: buildroot

We currently have a single condition to test for to decide whether a
file should be reported or not, but in the future we're going to add
more conditions.

Invert the logic so that we can easily add such checks.

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 support/scripts/check-uniq-files | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/support/scripts/check-uniq-files b/support/scripts/check-uniq-files
index 7020462981..cf9ea292bc 100755
--- a/support/scripts/check-uniq-files
+++ b/support/scripts/check-uniq-files
@@ -40,10 +40,12 @@ def main():
         file_to_pkg[record['file']].add(record['pkg'])
 
     for file in file_to_pkg:
-        if len(file_to_pkg[file]) > 1:
-            sys.stderr.write(warn.format(args.type, str_decode(file),
-                                         ", ".join([str_decode(p)
-                                                    for p in file_to_pkg[file]])))
+        if len(file_to_pkg[file]) == 1:
+            continue
+
+        sys.stderr.write(warn.format(args.type, str_decode(file),
+                                     ", ".join([str_decode(p)
+                                                for p in file_to_pkg[file]])))
 
 
 if __name__ == "__main__":
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 14/19] support/check-uniq-files: don't report files of the same content
  2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
                   ` (12 preceding siblings ...)
  2019-01-07 22:05 ` [Buildroot] [PATCH 13/19] support/check-uniq-file: invert condition logic Yann E. MORIN
@ 2019-01-07 22:05 ` Yann E. MORIN
  2019-01-08 15:22   ` Thomas De Schampheleire
  2019-01-07 22:05 ` [Buildroot] [PATCH 15/19] support/check-uniq-files: use argparse to enfore required options Yann E. MORIN
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-07 22:05 UTC (permalink / raw)
  To: buildroot

Currently, we check that no two packages write to the same files, as a
sanity check. We do so by checking which files were touched since the
end of the build (aka beginning of the installation).

However, when the packages do install the exact same file, i,e, the
same content, we in fact do not really care what package had provided
said file.

In the past, we avoided that situation because we were md5sum-inf every
files before and after installation. Anything that changed was new or
modified, and everything that did not change was not modified (but could
have been reinstalled).

However, since 7fb6e78254 (core/instrumentation: shave minutes off the
build time), we're now using mtimes, and we're in the situation that the
exact same file installed by two-or-more packages is reported.

In such a situation, it is not very interesting to know what package
installed the file, because whatever the ordering, or whatever the
subset of said packages, we'd have ended up with the same file anyway.
One prominent case where this happens, is the fftw family of packages,
that all install different libraries, but install the same set of
headers and some common utilities, and they are all identical across the
family.

As such, when the packages all installed the same content (same md5), do
not report the file. Only report it if at least two packages installed a
different content.

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: John Keeping <john@metanate.com>
Cc: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 support/scripts/check-uniq-files | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/support/scripts/check-uniq-files b/support/scripts/check-uniq-files
index cf9ea292bc..f42edeb534 100755
--- a/support/scripts/check-uniq-files
+++ b/support/scripts/check-uniq-files
@@ -36,13 +36,18 @@ def main():
         return False
 
     file_to_pkg = defaultdict(set)
+    file_md5 = defaultdict(set)
     for record in parse_pkg_file_list(args.packages_file_list[0]):
         file_to_pkg[record['file']].add(record['pkg'])
+        file_md5[record['file']].add(record['md5'])
 
     for file in file_to_pkg:
         if len(file_to_pkg[file]) == 1:
             continue
 
+        if len(file_md5[file]) == 1:
+            continue
+
         sys.stderr.write(warn.format(args.type, str_decode(file),
                                      ", ".join([str_decode(p)
                                                 for p in file_to_pkg[file]])))
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 15/19] support/check-uniq-files: use argparse to enfore required options
  2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
                   ` (13 preceding siblings ...)
  2019-01-07 22:05 ` [Buildroot] [PATCH 14/19] support/check-uniq-files: don't report files of the same content Yann E. MORIN
@ 2019-01-07 22:05 ` Yann E. MORIN
  2019-01-07 22:05 ` [Buildroot] [PATCH 16/19] core: check unique files in the corresponding finalize step Yann E. MORIN
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-07 22:05 UTC (permalink / raw)
  To: buildroot

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
---
 support/scripts/check-uniq-files | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/support/scripts/check-uniq-files b/support/scripts/check-uniq-files
index f42edeb534..2cee95d048 100755
--- a/support/scripts/check-uniq-files
+++ b/support/scripts/check-uniq-files
@@ -20,21 +20,13 @@ def str_decode(s):
 
 def main():
     parser = argparse.ArgumentParser()
-    parser.add_argument('packages_file_list', nargs='*',
+    parser.add_argument('packages_file_list', nargs=1,
                         help='The packages-file-list to check from')
-    parser.add_argument('-t', '--type', metavar="TYPE",
+    parser.add_argument('-t', '--type', metavar="TYPE", required=True,
                         help='Report as a TYPE file (TYPE is either target, staging, or host)')
 
     args = parser.parse_args()
 
-    if not len(args.packages_file_list) == 1:
-        sys.stderr.write('No packages-file-list was provided.\n')
-        return False
-
-    if args.type is None:
-        sys.stderr.write('No type was provided\n')
-        return False
-
     file_to_pkg = defaultdict(set)
     file_md5 = defaultdict(set)
     for record in parse_pkg_file_list(args.packages_file_list[0]):
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 16/19] core: check unique files in the corresponding finalize step
  2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
                   ` (14 preceding siblings ...)
  2019-01-07 22:05 ` [Buildroot] [PATCH 15/19] support/check-uniq-files: use argparse to enfore required options Yann E. MORIN
@ 2019-01-07 22:05 ` Yann E. MORIN
  2019-01-08 15:24   ` Thomas De Schampheleire
  2019-01-07 22:05 ` [Buildroot] [PATCH 17/19] core: check for unique target files after all our cleanups Yann E. MORIN
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-07 22:05 UTC (permalink / raw)
  To: buildroot

Currently, the check for uniq files is done all in one location, during
target-finalize. This is historical, back from when we had a single
finalize step.

But we now have one for each of target, staging, and host.

So, mnove the check to the corresponding finalize steps.

This will help, later, if/when we introduce actions specific to each
step, that would have an impact on the result of check-uniq-files.

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
---
 Makefile | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index 9f4d037cca..dcc738a1b6 100644
--- a/Makefile
+++ b/Makefile
@@ -708,19 +708,22 @@ $(TARGETS_ROOTFS): target-finalize
 # Avoid the rootfs name leaking down the dependency chain
 target-finalize: ROOTFS=
 
+.PHONY: staging-finalize
 host-finalize: $(HOST_DIR_SYMLINK)
+	# Check files that are touched by more than one package
+	./support/scripts/check-uniq-files -t host $(BUILD_DIR)/packages-file-list-host.txt
 
 .PHONY: staging-finalize
 staging-finalize:
 	@ln -snf $(STAGING_DIR) $(BASE_DIR)/staging
+	# Check files that are touched by more than one package
+	./support/scripts/check-uniq-files -t staging $(BUILD_DIR)/packages-file-list-staging.txt
 
 .PHONY: target-finalize
 target-finalize: $(PACKAGES) host-finalize
 	@$(call MESSAGE,"Finalizing target directory")
 	# Check files that are touched by more than one package
 	./support/scripts/check-uniq-files -t target $(BUILD_DIR)/packages-file-list.txt
-	./support/scripts/check-uniq-files -t staging $(BUILD_DIR)/packages-file-list-staging.txt
-	./support/scripts/check-uniq-files -t host $(BUILD_DIR)/packages-file-list-host.txt
 	$(foreach hook,$(TARGET_FINALIZE_HOOKS),$($(hook))$(sep))
 	rm -rf $(TARGET_DIR)/usr/include $(TARGET_DIR)/usr/share/aclocal \
 		$(TARGET_DIR)/usr/lib/pkgconfig $(TARGET_DIR)/usr/share/pkgconfig \
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 17/19] core: check for unique target files after all our cleanups
  2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
                   ` (15 preceding siblings ...)
  2019-01-07 22:05 ` [Buildroot] [PATCH 16/19] core: check unique files in the corresponding finalize step Yann E. MORIN
@ 2019-01-07 22:05 ` Yann E. MORIN
  2019-01-07 22:05 ` [Buildroot] [PATCH 18/19] core: ignore non-unique files that have disapeared Yann E. MORIN
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-07 22:05 UTC (permalink / raw)
  To: buildroot

Packages that install info pages will update the info index when doing
so. But we do not have that file in the target, so we do not care what
content that file has, or whether it is modified by many packages.

This is basically also valid for any file that we remove as part of our
target-finalize cleanups: the files are not in target.

Consequently, we will want to check for that condition to decide whether
a file should, or should not, be reported as being touched by more than
one package.

For that, we'll need check-uniq-files to be run after all our cleanups.
We even do it after the user-provided post-build scripts, in case they
are explicitly doing some cleanups of their own in those scripts.

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
---
 Makefile | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index dcc738a1b6..aefbdd0e78 100644
--- a/Makefile
+++ b/Makefile
@@ -722,8 +722,6 @@ staging-finalize:
 .PHONY: target-finalize
 target-finalize: $(PACKAGES) host-finalize
 	@$(call MESSAGE,"Finalizing target directory")
-	# Check files that are touched by more than one package
-	./support/scripts/check-uniq-files -t target $(BUILD_DIR)/packages-file-list.txt
 	$(foreach hook,$(TARGET_FINALIZE_HOOKS),$($(hook))$(sep))
 	rm -rf $(TARGET_DIR)/usr/include $(TARGET_DIR)/usr/share/aclocal \
 		$(TARGET_DIR)/usr/lib/pkgconfig $(TARGET_DIR)/usr/share/pkgconfig \
@@ -786,6 +784,9 @@ endif
 		$(call MESSAGE,"Executing post-build script $(s)"); \
 		$(EXTRA_ENV) $(s) $(TARGET_DIR) $(call qstrip,$(BR2_ROOTFS_POST_SCRIPT_ARGS))$(sep))
 
+	# Check files that are touched by more than one package
+	./support/scripts/check-uniq-files -t target $(BUILD_DIR)/packages-file-list.txt
+
 	touch $(TARGET_DIR)/usr
 
 .PHONY: target-post-image
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 18/19] core: ignore non-unique files that have disapeared
  2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
                   ` (16 preceding siblings ...)
  2019-01-07 22:05 ` [Buildroot] [PATCH 17/19] core: check for unique target files after all our cleanups Yann E. MORIN
@ 2019-01-07 22:05 ` Yann E. MORIN
  2019-01-08 15:29   ` Thomas De Schampheleire
  2019-01-07 22:05 ` [Buildroot] [PATCH 19/19] core: add optional failure when 2+ packages touch the same file Yann E. MORIN
  2019-01-08 12:51 ` [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Thomas De Schampheleire
  19 siblings, 1 reply; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-07 22:05 UTC (permalink / raw)
  To: buildroot

Packages that install info pages will update the info index when doing
so. But we do not have that file in the target, so we do not care what
content that file has, or whether it is modified by many packages.

This is basically also valid for any file that we remove as part of our
target-finalize cleanups: the files are not in target.

Therefore, check that a file still exists before reporting it as being
touched by more than one package.

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
---
 Makefile                         | 12 +++++++++---
 support/scripts/check-uniq-files |  6 ++++++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index aefbdd0e78..cc9ac91647 100644
--- a/Makefile
+++ b/Makefile
@@ -711,13 +711,17 @@ target-finalize: ROOTFS=
 .PHONY: staging-finalize
 host-finalize: $(HOST_DIR_SYMLINK)
 	# Check files that are touched by more than one package
-	./support/scripts/check-uniq-files -t host $(BUILD_DIR)/packages-file-list-host.txt
+	./support/scripts/check-uniq-files \
+		-t host -d $(HOST_DIR) \
+		$(BUILD_DIR)/packages-file-list-host.txt
 
 .PHONY: staging-finalize
 staging-finalize:
 	@ln -snf $(STAGING_DIR) $(BASE_DIR)/staging
 	# Check files that are touched by more than one package
-	./support/scripts/check-uniq-files -t staging $(BUILD_DIR)/packages-file-list-staging.txt
+	./support/scripts/check-uniq-files \
+		-t staging -d $(STAGING_DIR) \
+		$(BUILD_DIR)/packages-file-list-staging.txt
 
 .PHONY: target-finalize
 target-finalize: $(PACKAGES) host-finalize
@@ -785,7 +789,9 @@ endif
 		$(EXTRA_ENV) $(s) $(TARGET_DIR) $(call qstrip,$(BR2_ROOTFS_POST_SCRIPT_ARGS))$(sep))
 
 	# Check files that are touched by more than one package
-	./support/scripts/check-uniq-files -t target $(BUILD_DIR)/packages-file-list.txt
+	./support/scripts/check-uniq-files \
+		-t target -d $(TARGET_DIR) \
+		$(BUILD_DIR)/packages-file-list.txt
 
 	touch $(TARGET_DIR)/usr
 
diff --git a/support/scripts/check-uniq-files b/support/scripts/check-uniq-files
index 2cee95d048..3b33626c73 100755
--- a/support/scripts/check-uniq-files
+++ b/support/scripts/check-uniq-files
@@ -2,6 +2,7 @@
 
 import sys
 import argparse
+import os.path
 from collections import defaultdict
 from brpkgutil import parse_pkg_file_list as parse_pkg_file_list
 
@@ -24,6 +25,8 @@ def main():
                         help='The packages-file-list to check from')
     parser.add_argument('-t', '--type', metavar="TYPE", required=True,
                         help='Report as a TYPE file (TYPE is either target, staging, or host)')
+    parser.add_argument('-d', '--dir', metavar="DIR", required=True,
+                        help='Directory used as base (target/, staging/ or host/))')
 
     args = parser.parse_args()
 
@@ -40,6 +43,9 @@ def main():
         if len(file_md5[file]) == 1:
             continue
 
+        if not os.path.exists(os.path.join(args.dir, file)):
+            continue
+
         sys.stderr.write(warn.format(args.type, str_decode(file),
                                      ", ".join([str_decode(p)
                                                 for p in file_to_pkg[file]])))
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 19/19] core: add optional failure when 2+ packages touch the same file
  2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
                   ` (17 preceding siblings ...)
  2019-01-07 22:05 ` [Buildroot] [PATCH 18/19] core: ignore non-unique files that have disapeared Yann E. MORIN
@ 2019-01-07 22:05 ` Yann E. MORIN
  2019-01-08 12:51 ` [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Thomas De Schampheleire
  19 siblings, 0 replies; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-07 22:05 UTC (permalink / raw)
  To: buildroot

For top-level parallel build, we will really want to avoid the situation
where two packages touch the same file. The two problematic situations
being:

  - different packages install different content for a file, so we would
    not know which to provide: currently build ordering guarantees the
    content to be reproducible (i.e. last wins), but with parallel
    builds, we may lose such a guarantee;

  - pacakge update an existing file, e.g. by registering new content to
    it: currently, build ordering again guarantees that the file is
    properly expanded by successive packages, but with top-level
    parallel build, that would not longer be the case.

Consequently, we can't accept that two packages touch the same file.

We currently only warn about the situation, but we have no real hard
numbers about how many packages are affected and cause the issue.

Add a user-settable option (mostly for the autobuilders for now) to turn
the warning into a hard error.

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 Config.in                        |  8 ++++++++
 Makefile                         | 10 +++++++---
 support/scripts/check-uniq-files | 12 ++++++++++--
 3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/Config.in b/Config.in
index f965e9d6d8..d424ea6b0b 100644
--- a/Config.in
+++ b/Config.in
@@ -677,6 +677,14 @@ config BR2_COMPILER_PARANOID_UNSAFE_PATH
 	  and external toolchain backends (through the toolchain
 	  wrapper).
 
+config BR2_UNIQ_FILES
+	bool "Forbid two packages from touching the same files"
+	help
+	  By default, Buildroot will detect and warn when two packages
+	  (or more) touch the same files.
+
+	  Say 'y' here to turn the warning into an error.
+
 config BR2_REPRODUCIBLE
 	bool "Make the build reproducible (experimental)"
 	# SOURCE_DATE_EPOCH support in toolchain-wrapper requires GCC 4.4
diff --git a/Makefile b/Makefile
index cc9ac91647..cc135e0ad3 100644
--- a/Makefile
+++ b/Makefile
@@ -703,6 +703,10 @@ endef
 TARGET_FINALIZE_HOOKS += PURGE_LOCALES
 endif
 
+ifeq ($(BR2_UNIQ_FILES),y)
+UNIQ_FILES_OPTS = --fail
+endif
+
 $(TARGETS_ROOTFS): target-finalize
 
 # Avoid the rootfs name leaking down the dependency chain
@@ -712,7 +716,7 @@ target-finalize: ROOTFS=
 host-finalize: $(HOST_DIR_SYMLINK)
 	# Check files that are touched by more than one package
 	./support/scripts/check-uniq-files \
-		-t host -d $(HOST_DIR) \
+		-t host -d $(HOST_DIR) $(UNIQ_FILES_OPTS) \
 		$(BUILD_DIR)/packages-file-list-host.txt
 
 .PHONY: staging-finalize
@@ -720,7 +724,7 @@ staging-finalize:
 	@ln -snf $(STAGING_DIR) $(BASE_DIR)/staging
 	# Check files that are touched by more than one package
 	./support/scripts/check-uniq-files \
-		-t staging -d $(STAGING_DIR) \
+		-t staging -d $(STAGING_DIR) $(UNIQ_FILES_OPTS) \
 		$(BUILD_DIR)/packages-file-list-staging.txt
 
 .PHONY: target-finalize
@@ -790,7 +794,7 @@ endif
 
 	# Check files that are touched by more than one package
 	./support/scripts/check-uniq-files \
-		-t target -d $(TARGET_DIR) \
+		-t target -d $(TARGET_DIR) $(UNIQ_FILES_OPTS) \
 		$(BUILD_DIR)/packages-file-list.txt
 
 	touch $(TARGET_DIR)/usr
diff --git a/support/scripts/check-uniq-files b/support/scripts/check-uniq-files
index 3b33626c73..4507cda4a8 100755
--- a/support/scripts/check-uniq-files
+++ b/support/scripts/check-uniq-files
@@ -6,7 +6,7 @@ import os.path
 from collections import defaultdict
 from brpkgutil import parse_pkg_file_list as parse_pkg_file_list
 
-warn = 'Warning: {0} file "{1}" is touched by more than one package: {2}\n'
+warn = '{0}: {1} file "{2}" is touched by more than one package: {3}\n'
 
 
 # If possible, try to decode the binary string s with the user's locale.
@@ -27,6 +27,8 @@ def main():
                         help='Report as a TYPE file (TYPE is either target, staging, or host)')
     parser.add_argument('-d', '--dir', metavar="DIR", required=True,
                         help='Directory used as base (target/, staging/ or host/))')
+    parser.add_argument('-f', '--fail', action='store_true',
+                        help='Fail if a file is touched by more than one package')
 
     args = parser.parse_args()
 
@@ -36,6 +38,7 @@ def main():
         file_to_pkg[record['file']].add(record['pkg'])
         file_md5[record['file']].add(record['md5'])
 
+    ret = 0
     for file in file_to_pkg:
         if len(file_to_pkg[file]) == 1:
             continue
@@ -46,10 +49,15 @@ def main():
         if not os.path.exists(os.path.join(args.dir, file)):
             continue
 
-        sys.stderr.write(warn.format(args.type, str_decode(file),
+        ret = 1 if args.fail else ret
+        sys.stderr.write(warn.format("Error" if args.fail else "Warning",
+                                     args.type,
+                                     str_decode(file),
                                      ", ".join([str_decode(p)
                                                 for p in file_to_pkg[file]])))
 
+    return ret
+
 
 if __name__ == "__main__":
     sys.exit(main())
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2)
  2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
                   ` (18 preceding siblings ...)
  2019-01-07 22:05 ` [Buildroot] [PATCH 19/19] core: add optional failure when 2+ packages touch the same file Yann E. MORIN
@ 2019-01-08 12:51 ` Thomas De Schampheleire
  2019-01-08 15:53   ` Thomas De Schampheleire
  19 siblings, 1 reply; 57+ messages in thread
From: Thomas De Schampheleire @ 2019-01-08 12:51 UTC (permalink / raw)
  To: buildroot

El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribi?:
>
> Hello All!
>
> Currently, the instrumentation steps, that we run after a package is
> installed, get confused about the files that package may have be
> responsible for.
>
> The first problem is that all .la files are tweaked after a package is
> installed, and thus those files are all then newer than the built
> stampfile of that package, and consequently all .la files are accounted
> to that package.
>
> The second problem is that, during development and agter a user
> requested a package reinstall (but not a rebuild!), then the built
> stampfile is much older, and thus all files that have been installed
> since the package was last built are accoutned to that package.
>
> Those two problems are caused by 7fb6e782542f, when we switched away
> from an md5 comparison between the state before and after the
> installation, to a time-based comparison against the bult stampfile.
>
> Furthermore, during development, the list of installed files can get
> out of sync with what is really installed. For example, if a user were
> to modify the source of a package, and trigger a re-configure, rebuild,
> or re-install, then we'd remove the list of previously installed files
> before generating the list of currently installed files. If files
> installed in the previous installation are no longer installed, they are
> still present in the target (or staging or host), but no longer
> accounted to the package that instaleld them.
>
> Additionally, when two or more packages install the same file and it has
> the same content, we don't care much about which actually installed it,
> as they would all have installed the exact same file. The size could be
> assigned to any of those packages, and the licensing terms of any of
> those package may be applied to that file. The case is mostly prominent
> with the fftw familly of packages (soon to come) that install the same
> headers and the same utilities.
>
> Finally, there is one prominent file that gets _updated_ (and not
> replaced) by many packages: the info page index, which packages update
> when they install their own info pages. We currently report that file,
> when in fact it does not end up in target, and thus we don't care about
> how its content came to be. And more generically, we don't care any file
> that we eventually remove as part of our target-finalize cleanups.
>
> This series is thus an attempt at fixing all those issues.
>
> First and foremost, the series addresses the limitation that causes the
> first two problems: we do not have a way to know when the install steps
> were started (or any other step, for that matters, but we're currently
> only interested in the install steps). So, the first few patches make it
> so that we can introduce an new timestamp file at the beginning of each
> step.
>
> Then, with the information about the beginning of the install step, we
> can now limit the .la files tweaking to just those files that were
> actually instaleld y a package. And then we use that same stamp file to
> limit the listing of installed files accountable to the current package.
>
> Then the series addreses the same-identical-file-from-many-packages. To
> do so, it partially restore the md5sum of the files, but this is
> limitted to only those files actually touched during the install of the
> current package (see above), and is only ran at the end of the install,
> not before. As thus, this is much faster than the original situation
> that did the md5 of all files before and after, because it now acts on
> cache-hot files only.
>
> That part is split in two: first, the formnat of the packages-file-list
> files is modified to be more resilient to weird filenames, which then
> allows us to expand it with arbitrarily more fields. A python helper is
> provided to abstract the new format, and the consumers of those files
> are updated to use the helper (with one script being rewritten in
> python). Then we make use of this new format to store the md5 of the
> files contents, which we eventually use to decide whether to report the
> file or not.
>
> Now, files that are missing from the destination directory are no longer
> elligible for being reported as being touched by more than ne pacakge
> anymore.
>
> And finally, now that we have a dependable check for uniqueness, we can
> add an option in the menuconfig to turn the current warning into a hard
> error when uniqueness is not met.
>
> Since this is a time-sensitive topic, here are a few timings before and
> after this series, over 6 runs on an idle machine, with a configuration:
>
>   - prebuilt glibc toolchain
>   - 233 packages, most pretty small and building fast
>   - target/:  215MiB, 14922 files, directories, symlinks...
>   - staging/: 625MiB, 29029 files, directories, symlinks...
>   - host/:    2.1GiB, 44129 files, directories, symlinks...
>
>                 best           minutes:seconds          worst   mean
>     before:     36:20   36:22   36:23   36:24   36:27   36:28   36:24
>     after:      36:29   36:31   36:32   36:33   36:35   36:37   36:33
>
> So, this is a 9s overhead over 2184s (36:24, before), i.e. a mere 0.4%
> increase in time over the full build, or just about a 38ms overhead per
> package on average. This overhead is real, but is still very far from
> the huge one that was choped off by 7fb6e782542f.
>
> Additionally, the time for re-installing the last package does not
> suffer from an already large number or size of files already present.
> Best result of three builds (to be cache-hot), for one target package
> with a staging install, and one for host package:
>
>             skeleton-init-common-reinstall    host-patchelf-reinstall
>     before:            8.258s                       4.951s
>     after:             4.514s                       5.034s
>     delta:             -3.744s                     +0.083s
>
> So, basically, what this means is that, during development, reinstalling
> a previous package is faster. This is because, even though we spend (a
> little tiny wee bit) more time when lisitings files due to the md5sum
> (and really, thats really just a few additional millieconds per package),
> we get repaid hundreths-fold because the list is now accurate, and we
> can limit ourselves to tweaking only the corresponding .la file, but
> also limit the check-bin-arch to only those files actually interesting.
>
> The host packages are still slightly impacted as we can see for
> host-patchelf, because the check-bin-arch does not apply to them, so the
> gain from running check-bin-arch only on just-installed files can't
> apply to host packages. Still, the impact is minor.
>
> I'd like to particularly thank Nicolas Cavallari for their valuable
> input about the issues they encountered with the previous and current
> situations. Many thanks! :-)
>
>
> Regards,
> Yann E. MORIN.
>
>
> The following changes since commit 8e928a8389d88e0f64f04ee1b3aa4985dcfd373f
>
>   Makefile, manual, website: Bump copyright year (2019-01-06 21:30:34 +0100)
>
>
> are available in the git repository at:
>
>   git://git.buildroot.org/~ymorin/git/buildroot.git
>
> for you to fetch changes up to c7478b1fd1c92508f346f1a8626374d742c9c327
>
>   core: add optional failure when 2+ packages touch the same file (2019-01-07 23:04:09 +0100)
>
>
> ----------------------------------------------------------------
> Yann E. MORIN (19):
>       infra/pkg-generic: display MESSAGE before running PRE_HOOKS
>       infra/pkg-generic: create $(@D) before running PRE_HOOKS
>       infra/pkg-generic: introduce new stampfile at the beginning of all steps
>       infra/pkg-generic: use \0 to separate .la files as they are found
>       infra/pkg-generic: tweak only .la files installed by the current package
>       infra/pkg-generic: only list files installed by the current package
>       infra/pkg-generic: offload same-package filtering to check-uniq-file
>       support/check-uniq-files: decode as many strings as possible
>       support: add parser in python for packages-file-list files
>       support: rewrite check-bin-arch in python
>       support: introduce new format for packages-file-list files
>       infra/pkg-generic: store md5 of just-installed files
>       support/check-uniq-file: invert condition logic
>       support/check-uniq-files: don't report files of the same content
>       support/check-uniq-files: use argparse to enfore required options
>       core: check unique files in the corresponding finalize step
>       core: check for unique target files after all our cleanups
>       core: ignore non-unique files that have disapeared
>       core: add optional failure when 2+ packages touch the same file
>
>  Config.in                        |   8 ++
>  Makefile                         |  22 ++++-
>  package/pkg-generic.mk           |  41 +++++---
>  support/scripts/brpkgutil.py     |  38 ++++++++
>  support/scripts/check-bin-arch   | 205 +++++++++++++++++++++------------------
>  support/scripts/check-uniq-files |  69 +++++++------
>  support/scripts/size-stats       |  14 +--
>  7 files changed, 255 insertions(+), 142 deletions(-)
>
> --
> .-----------------.--------------------.------------------.--------------------.
> |  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
> | +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
> | +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
> | http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
> '------------------------------^-------^------------------^--------------------'


For reference, the discussion thread when commit 7fb6e782542f was submitted:
http://lists.busybox.net/pipermail/buildroot/2018-March/215979.html
and my comments on moving away from the md5sum to the mtime approach:
http://lists.busybox.net/pipermail/buildroot/2018-March/216331.html

I will be cross-checking this series against these comments, as an
accurate list in packages-file-list.txt is important for me. In my
current tree, I reverted commit 7fb6e782542f and disabled
check-uniq-files on staging and host, to alleviate the time impact but
still have an accurate list.

/Thomas

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 06/19] infra/pkg-generic: only list files installed by the current package
  2019-01-07 22:05 ` [Buildroot] [PATCH 06/19] infra/pkg-generic: only list " Yann E. MORIN
@ 2019-01-08 13:07   ` Thomas De Schampheleire
  2019-01-08 15:56     ` Thomas De Schampheleire
  2019-01-08 16:08     ` Yann E. MORIN
  0 siblings, 2 replies; 57+ messages in thread
From: Thomas De Schampheleire @ 2019-01-08 13:07 UTC (permalink / raw)
  To: buildroot

El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribi?:
>
> Since 7fb6e782542f (core/instrumentation: shave minutes off the build
> time), the built stampfile is used as a reference to detect files
> installed by a package.
>
> However, this falls short during development, when a user may want to
> re-install a built-early package without rebuilding it (i.e. make
> foo-reinstall). In this case, the built stampfile is not touched, and is
> still dated from way back when the package was first built. As such,
> almost all files in target (or staging or host) are newer than that, and
> so those files are all now accounted for that package, when in fact only
> a minor subset may be accountable to it.
>
> We fix that by limiting the search for files that have been actually
> touched during the install step, now that we have the proper timestamp
> for it.
>
> Reported-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
> Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
> Cc: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
> ---
>  package/pkg-generic.mk | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> index 3de8a99675..42aebeb49d 100644
> --- a/package/pkg-generic.mk
> +++ b/package/pkg-generic.mk
> @@ -67,7 +67,7 @@ define step_pkg_size_inner
>         $(SED) '/^$(1),/d' $(BUILD_DIR)/packages-file-list$(3).txt
>         cd $(2); \
>         find . \( -type f -o -type l \) \
> -               -newer $($(PKG)_DIR)/.stamp_built \
> +               -newer $@_before \
>                 -exec printf '$(1),%s\n' {} + \
>                 >> $(BUILD_DIR)/packages-file-list$(3).txt
>  endef

While it is probably not important, just noting here that if package A
installs a file with a modification time in the future, then package B
installed after A would still get that future file in its list. In
fact, every package built after pkg A would then get that file into
'its' list.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 09/19] support: add parser in python for packages-file-list files
  2019-01-07 22:05 ` [Buildroot] [PATCH 09/19] support: add parser in python for packages-file-list files Yann E. MORIN
@ 2019-01-08 13:35   ` Thomas De Schampheleire
  2019-01-08 16:29     ` Yann E. MORIN
  0 siblings, 1 reply; 57+ messages in thread
From: Thomas De Schampheleire @ 2019-01-08 13:35 UTC (permalink / raw)
  To: buildroot

El lun., 7 ene. 2019 a las 23:06, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribi?:
>
> Currently, each script that want to parse the package-files lists has to
> invent its own parsing. So far, this was acceptable, as the format of
> those files was relatively easy (line-based records, comma-separated
> fields).
>
> However, that format is not very resilient against weird filenames (e.g.
> filenames with commas in the, or even with \n chars in them.

in the,   --> in them

>
> Furthermore, that format is not easily extensible.
>
> So, introduce a parser, in python, that abstracts the format of these
> files, and returns a dictionaries. Using dictionaries makes it easy for

returns a dictionary.

> callers to just ignore the fields they are not interested in, or even
> are not aware of. Consequently, it will make it easy for us to introduce
> new fields in the future.
>
> Convert the two existing python scripts that parse those files.
>
> Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
>
> ---
> Note: a shell-script also parses those files, it will be handled in a
> subsequent change.
> ---
>  support/scripts/brpkgutil.py     | 16 ++++++++++++++++
>  support/scripts/check-uniq-files |  7 +++----
>  support/scripts/size-stats       | 14 +++++++-------
>  3 files changed, 26 insertions(+), 11 deletions(-)
>
> diff --git a/support/scripts/brpkgutil.py b/support/scripts/brpkgutil.py
> index e70d525353..d15b18845b 100644
> --- a/support/scripts/brpkgutil.py
> +++ b/support/scripts/brpkgutil.py
> @@ -5,6 +5,22 @@ import sys
>  import subprocess
>
>
> +# Iterate on all records of the packages-file-list file passed as filename
> +# Returns an iterator over a list of dictionaries. Each dictionary contains
> +# these keys (others maybe added in the future):
> +# 'file': the path of the file,
> +# 'pkg':  the last package that installed that file
> +def parse_pkg_file_list(path):
> +    with open(path, 'rb') as f:

Why exactly do you open as binary?
IIRC this will cause the need for decoding the strings on Python 3,
which you don't seem to do. This was also the reason why the orginal
check-uniq-files needed 'b' markers in front of some strings like
b','.

> +        for rec in f.readlines():
> +            l = rec.split(',0')

This looks wrong to me, there is no text ',0' in the input.
I think you meant rec.split(',', 1), no, like in the original code?

> +            d = {
> +                  'file': l[0],
> +                  'pkg':  l[1],

This seems also wrong, 0 and 1 are swapped.
Example input is:

busybox,./usr/bin/eject
busybox,./usr/bin/basename
busybox,./usr/bin/hexedit
busybox,./usr/bin/readlink
busybox,./usr/bin/cmp

so I would expect 'pkg' to be l[0] and 'file' to be l[1].

Note that 'l' could perhaps become a more meaningful variable name. I
also think that one of the python rules is that variable names should
be minimum 3 characters.

> +                }
> +            yield d
> +
> +
>  # Execute the "make <pkg>-show-version" command to get the version of a given
>  # list of packages, and return the version formatted as a Python dictionary.
>  def get_version(pkgs):
> diff --git a/support/scripts/check-uniq-files b/support/scripts/check-uniq-files
> index e95a134168..7020462981 100755
> --- a/support/scripts/check-uniq-files
> +++ b/support/scripts/check-uniq-files
> @@ -3,6 +3,7 @@
>  import sys
>  import argparse
>  from collections import defaultdict
> +from brpkgutil import parse_pkg_file_list as parse_pkg_file_list
>
>  warn = 'Warning: {0} file "{1}" is touched by more than one package: {2}\n'
>
> @@ -35,10 +36,8 @@ def main():
>          return False
>
>      file_to_pkg = defaultdict(set)
> -    with open(args.packages_file_list[0], 'rb') as pkg_file_list:
> -        for line in pkg_file_list.readlines():
> -            pkg, _, file = line.rstrip(b'\n').partition(b',')

Note that the rstrip of newlines is no longer present in parse_pkg_file_list.

> -            file_to_pkg[file].add(pkg)
> +    for record in parse_pkg_file_list(args.packages_file_list[0]):
> +        file_to_pkg[record['file']].add(record['pkg'])
>
>      for file in file_to_pkg:
>          if len(file_to_pkg[file]) > 1:
> diff --git a/support/scripts/size-stats b/support/scripts/size-stats
> index deea92e278..48cd834ab4 100755
> --- a/support/scripts/size-stats
> +++ b/support/scripts/size-stats
> @@ -22,6 +22,7 @@ import os.path
>  import argparse
>  import csv
>  import collections
> +from brpkgutil import parse_pkg_file_list as parse_pkg_file_list
>
>  try:
>      import matplotlib
> @@ -66,13 +67,12 @@ def add_file(filesdict, relpath, abspath, pkg):
>  #
>  def build_package_dict(builddir):
>      filesdict = {}
> -    with open(os.path.join(builddir, "build", "packages-file-list.txt")) as filelistf:
> -        for l in filelistf.readlines():
> -            pkg, fpath = l.split(",", 1)
> -            # remove the initial './' in each file path
> -            fpath = fpath.strip()[2:]
> -            fullpath = os.path.join(builddir, "target", fpath)
> -            add_file(filesdict, fpath, fullpath, pkg)
> +    fname = os.path.join(builddir, "build", "packages-file-list.txt")
> +    for record in parse_pkg_file_list(fname):
> +        # remove the initial './' in each file path
> +        fpath = record['file'].strip()[2:]

The stripping here and the rstrip in check-uniq-files could perhaps
all be moved to parse_pkg_file_list.

> +        fullpath = os.path.join(builddir, "target", fpath)
> +        add_file(filesdict, fpath, fullpath, record['pkg'])
>      return filesdict
>

/Thomas

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 10/19] support: rewrite check-bin-arch in python
  2019-01-07 22:05 ` [Buildroot] [PATCH 10/19] support: rewrite check-bin-arch in python Yann E. MORIN
@ 2019-01-08 14:56   ` Thomas De Schampheleire
  2019-01-08 16:37     ` Yann E. MORIN
  0 siblings, 1 reply; 57+ messages in thread
From: Thomas De Schampheleire @ 2019-01-08 14:56 UTC (permalink / raw)
  To: buildroot

El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribi?:
>
> The existing check-bin-arch script is written in shell, so it can't make
> use of all the helpers we have in python, especially the parser for the
> package-files lists.
>
> Although this script is relatively clean as it is, it is not totally
> fool-proof either, especially against weird filenames (e.g.
> specially-crafted filenames with \n, or \b, etc...).
>
> Also, shell scripts are often frowned upon but for the most mundane
> processing, and this script is definitely not mundane.
>
> Finally, shell scripts are slow, as all the processing the have to do is
> more often than not done by spawning programs, and that is relatively
> expensive.
>
> Rewrite that script in python.
>
> This allows to do cleaner processing and reuse the package-files list
> parsuer.
>
> There's however a drawback: the script grows substantially, in part
> because of spawning the actual readelf call (there is no ELF parser in
> the standard python library), and because we want to keep backward
> compatible with old pythons that lack proper abstractions like
> subprocess.DEVNULL et al.
>
> Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
> Cc: Thomas De Schampheleire <patrickdepinguin@gmail.com>
> ---
>  support/scripts/check-bin-arch | 205 +++++++++++++++++++++++------------------
>  1 file changed, 113 insertions(+), 92 deletions(-)
>
> diff --git a/support/scripts/check-bin-arch b/support/scripts/check-bin-arch
> index 66b8d89932..d4902163e7 100755
> --- a/support/scripts/check-bin-arch
> +++ b/support/scripts/check-bin-arch
> @@ -1,92 +1,113 @@
> -#!/usr/bin/env bash
> -
> -# List of hardcoded paths that should be ignored, as they may
> -# contain binaries for an architecture different from the
> -# architecture of the target.
> -declare -a IGNORES=(
> -       # Skip firmware files, they could be ELF files for other
> -       # architectures
> -       "/lib/firmware"
> -       "/usr/lib/firmware"
> -
> -       # Skip kernel modules
> -       # When building a 32-bit userland on 64-bit architectures, the kernel
> -       # and its modules may still be 64-bit. To keep the basic
> -       # check-bin-arch logic simple, just skip this directory.
> -       "/lib/modules"
> -       "/usr/lib/modules"
> -
> -       # Skip files in /usr/share, several packages (qemu,
> -       # pru-software-support) legitimately install ELF binaries that
> -       # are not for the target architecture
> -       "/usr/share"
> -
> -       # Skip files in /lib/grub, since it is possible to have it
> -       # for a different architecture (e.g. i386 grub on x86_64).
> -       "/lib/grub"
> -)
> -
> -while getopts p:l:r:a:i: OPT ; do
> -       case "${OPT}" in
> -       p) package="${OPTARG}";;
> -       l) pkg_list="${OPTARG}";;
> -       r) readelf="${OPTARG}";;
> -       a) arch_name="${OPTARG}";;
> -       i)
> -               # Ensure we do have single '/' as separators,
> -               # and that we have a leading and a trailing one.
> -               pattern="$(sed -r -e 's:/+:/:g; s:^/*:/:; s:/*$:/:;' <<<"${OPTARG}")"
> -               IGNORES+=("${pattern}")
> -               ;;
> -       :) error "option '%s' expects a mandatory argument\n" "${OPTARG}";;
> -       \?) error "unknown option '%s'\n" "${OPTARG}";;
> -       esac
> -done
> -
> -if test -z "${package}" -o -z "${pkg_list}" -o -z "${readelf}" -o -z "${arch_name}" ; then
> -       echo "Usage: $0 -p <pkg> -l <pkg-file-list> -r <readelf> -a <arch name> [-i PATH ...]"
> -       exit 1
> -fi
> -
> -exitcode=0
> -
> -# Only split on new lines, for filenames-with-spaces
> -IFS="
> -"
> -
> -while read f; do
> -       for ignore in "${IGNORES[@]}"; do
> -               if [[ "${f}" =~ ^"${ignore}" ]]; then
> -                       continue 2
> -               fi
> -       done
> -
> -       # Skip symlinks. Some symlinks may have absolute paths as
> -       # target, pointing to host binaries while we're building.
> -       if [[ -L "${TARGET_DIR}/${f}" ]]; then
> -               continue
> -       fi
> -
> -       # Get architecture using readelf. We pipe through 'head -1' so
> -       # that when the file is a static library (.a), we only take
> -       # into account the architecture of the first object file.
> -       arch=$(LC_ALL=C ${readelf} -h "${TARGET_DIR}/${f}" 2>&1 | \
> -                      sed -r -e '/^  Machine: +(.+)/!d; s//\1/;' | head -1)
> -
> -       # If no architecture found, assume it was not an ELF file
> -       if test "${arch}" = "" ; then
> -               continue
> -       fi
> -
> -       # Architecture is correct
> -       if test "${arch}" = "${arch_name}" ; then
> -               continue
> -       fi
> -
> -       printf 'ERROR: architecture for "%s" is "%s", should be "%s"\n' \
> -              "${f}" "${arch}" "${arch_name}"
> -
> -       exitcode=1
> -done < <( sed -r -e "/^${package},\.(.+)$/!d; s//\1/;" ${pkg_list} )
> -
> -exit ${exitcode}
> +#!/usr/bin/env python
> +
> +import argparse
> +import os
> +import re
> +import subprocess
> +import sys
> +from brpkgutil import parse_pkg_file_list as parse_pkg_file_list
> +
> +
> +# List of hardcoded paths that should be ignored, as they may contain
> +# binaries for an architecture different from the architecture of the
> +# target
> +IGNORES = {
> +    # Skip firmware files
> +    # They could be ELF files for other architectures.
> +    '/lib/firmware',
> +    '/usr/lib/firmware',
> +
> +    # Skip kernel modules
> +    # When building a 32-bit userland on 64-bit architectures, the
> +    # kernel and its modules may still be 64-bit. To keep the basic
> +    # check-bin-arch logic simple, just skip these directories
> +    '/lib/modules',
> +    '/usr/lib/modules',
> +
> +    # Skip files in /usr/share
> +    # Several packages (qemu, pru-software-support) legitimately install
> +    # ELF binaries that are not for the target architecture
> +    '/usr/share',
> +
> +    # Skip files in /lib/grub
> +    # It is possible to have it for a different architecture (e.g. i386
> +    # grub on x86_64)
> +    '/lib/grub',
> +}
> +
> +
> +ERROR = 'ERROR: architecture for "%s" is "%s", should be "%s"\n'
> +
> +
> +def main():
> +    parser = argparse.ArgumentParser()
> +    parser.add_argument('--package', '-p', metavar='PACKAGE', required=True)
> +    parser.add_argument('--pkg-list', '-l', metavar='PKG_LIST', required=True)
> +    parser.add_argument('--readelf', '-r', metavar='READELF', required=True)
> +    parser.add_argument('--arch', '-a', metavar='ARCH', required=True)
> +    parser.add_argument('--ignore', '-i', metavar='PATH', action='append')

I think the above 'metavar' options are not necessary, as they are the
default value.

> +    args = parser.parse_args()
> +
> +    if args.ignore is not None:
> +        # Ensure we do have single '/' as separators, and that we have a
> +        # leading and a trailing one, then append to the global list.
> +        for pattern in args.ignore:
> +            IGNORES.add(re.sub('/+', '/', '/{0}/'.format(pattern)))
> +

Note that the global list itself does not have trailing slashes. This
seems inconsistent to me.

> +    ignores_re = set()
> +    for i in IGNORES:
> +        ignores_re.add(re.compile(i))
> +
> +    arch_re = re.compile('^  Machine: +(.+)')
> +
> +    target_dir = os.environ['TARGET_DIR']
> +
> +    exit_code = 0
> +    for record in parse_pkg_file_list(args.pkg_list):
> +        if record['pkg'] != args.package:
> +            continue
> +
> +        fpath = record['file']
> +
> +        ignored = False
> +        for i in ignores_re:
> +            if i.match(fpath):
> +                ignored = True
> +                break
> +        if ignored:
> +            continue

We can never get into this if, right?, because there is already a
break whenever ignored is set to True.

Note that I think that performance will be better with a list
comprehension instead of explicit for's. Something like (untested):

valid_records = [ record for record in parse_pkg_file_list(args.pkg_list)
            if record['pkg'] == args.package
            and not any(ignore_re.match(record['file']) for ignore_re
in ignores_re) ]
for record in valid_records:
    ...


You can normally swap the [ ] for ( ) to reduce memory consumption
(generator iso list comprehension, yielding one entry at a time).

You can argue about readability, but for me this is not less readable
than the expanded version.

/Thomas

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 11/19] support: introduce new format for packages-file-list files
  2019-01-07 22:05 ` [Buildroot] [PATCH 11/19] support: introduce new format for packages-file-list files Yann E. MORIN
@ 2019-01-08 15:07   ` Thomas De Schampheleire
  2019-01-08 19:27     ` Yann E. MORIN
  0 siblings, 1 reply; 57+ messages in thread
From: Thomas De Schampheleire @ 2019-01-08 15:07 UTC (permalink / raw)
  To: buildroot

El lun., 7 ene. 2019 a las 23:06, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribi?:
>
> The existing format for the packages-files lists has two drawbacks:
>
>   - it is not very resilient against filenames with weird characters,
>     like \n or \b or spaces...
>
>   - it is not easily expandable, partly because of the above.
>
> AS such, introduce a new format for those files, that solves both
> issues.
>
> First, we must find a way that unambiguously separate fields. There is
> one single byte that can never ever occur in filenames or package names,
> i.e. the NULL character. So, we use that as a field separator.
>
> Second, we must find a way to unambiguously separate records. Except for
> \0, any character may occur in filenames, but the other existing field
> we currently have is the package name, which we do know does not contain
> any weird byte (e.g. it's basically limited to [[:alnum:]_-]). Thus, we
> can't use a single character as record separator. A solution is to use
> \0\n as the record separator.
>
> Thirdly, we must ensure that filenames never mess up with our
> separators. By making the filename the first field, we can be sure that
> it is properly terminated by a field separator, and that any leading \n
> does not interfere with a previous field separator to form a spurious
> record separator.
>
> So, the new format is now (without spaces):
>
>   filename \0 package-name \0\n
>
> Update the parser accordingly.
>
> Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
> ---
>  package/pkg-generic.mk       |  8 ++++++--
>  support/scripts/brpkgutil.py | 28 ++++++++++++++++++++++++----
>  2 files changed, 30 insertions(+), 6 deletions(-)
>
> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> index 7daea190a6..d261b5bf76 100644
> --- a/package/pkg-generic.mk
> +++ b/package/pkg-generic.mk
> @@ -59,6 +59,10 @@ GLOBAL_INSTRUMENTATION_HOOKS += step_time
>
>  # The suffix is typically empty for the target variant, for legacy backward
>  # compatibility.
> +# Files are record-formatted, with \0\n as record separator, and \0 as
> +# field separator. A record is made of these fields:
> +#  - file path
> +#  - package name
>  # $(1): package name
>  # $(2): base directory to search in
>  # $(3): suffix of file  (optional)
> @@ -66,8 +70,8 @@ define step_pkg_size_inner
>         cd $(2); \
>         find . \( -type f -o -type l \) \
>                 -newer $@_before \
> -               -exec printf '$(1),%s\n' {} + \
> -               >> $(BUILD_DIR)/packages-file-list$(3).txt
> +       |sed -r -e 's/$$/\x00$(1)\x00/' \
> +       >> $(BUILD_DIR)/packages-file-list$(3).txt
>  endef
>
>  define step_pkg_size
> diff --git a/support/scripts/brpkgutil.py b/support/scripts/brpkgutil.py
> index d15b18845b..f6ef4b3dca 100644
> --- a/support/scripts/brpkgutil.py
> +++ b/support/scripts/brpkgutil.py
> @@ -5,6 +5,26 @@ import sys
>  import subprocess
>
>
> +# Read the binary-opened file object f with \0\n separated records (aka lines).
> +# Highly inspired by:
> +# https://stackoverflow.com/questions/19600475/how-to-read-records-terminated-by-custom-separator-from-file-in-python
> +def _readlines0n(f):
> +    buf = b''
> +    while True:
> +        newbuf = f.read(1048576)

I would find 1024 * 1024 more readable.

> +        if not newbuf:
> +            if buf:
> +                yield buf
> +            return
> +        if buf is None:
> +            buf = b''
> +        buf += newbuf
> +        lines = buf.split(b'\x00\n')
> +        for line in lines[:-1]:
> +            yield line
> +        buf = lines[-1]
> +
> +
>  # Iterate on all records of the packages-file-list file passed as filename
>  # Returns an iterator over a list of dictionaries. Each dictionary contains
>  # these keys (others maybe added in the future):
> @@ -12,11 +32,11 @@ import subprocess
>  # 'pkg':  the last package that installed that file
>  def parse_pkg_file_list(path):
>      with open(path, 'rb') as f:

I now understand why you read as binary.


> -        for rec in f.readlines():
> -            l = rec.split(',0')

I still think this ',0' was wrong.


> +        for rec in _readlines0n(f):
> +            srec = rec.split(b'\x00')
>              d = {
> -                  'file': l[0],
> -                  'pkg':  l[1],
> +                  'file': srec[0],
> +                  'pkg':  srec[1],

and I now see how the swap in a previous commit could go unnoticed in
your testing :-)


/Thomas

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 12/19] infra/pkg-generic: store md5 of just-installed files
  2019-01-07 22:05 ` [Buildroot] [PATCH 12/19] infra/pkg-generic: store md5 of just-installed files Yann E. MORIN
@ 2019-01-08 15:13   ` Thomas De Schampheleire
  2019-01-08 19:31     ` Yann E. MORIN
  0 siblings, 1 reply; 57+ messages in thread
From: Thomas De Schampheleire @ 2019-01-08 15:13 UTC (permalink / raw)
  To: buildroot

El lun., 7 ene. 2019 a las 23:06, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribi?:
>
> Now that we can accurately list files touched during a package
> installation, we can restore the md5sum to those files with a minor
> impact on the build time, generally in the order of a few tens of
> milliseconds per package at worst, but that his regained later by

that is

> providing even more accurate accountability of files to packages, and
> will even also allows ignoring packages that install the same file with
> the same content.
>
> Update parser accordingly.
>
> Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
> Cc: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
> Cc: John Keeping <john@metanate.com>
> Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
> ---
>  package/pkg-generic.mk       | 10 +++++++---
>  support/scripts/brpkgutil.py |  2 ++
>  2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> index d261b5bf76..dd6650db7f 100644
> --- a/package/pkg-generic.mk
> +++ b/package/pkg-generic.mk
> @@ -63,14 +63,18 @@ GLOBAL_INSTRUMENTATION_HOOKS += step_time
>  # field separator. A record is made of these fields:
>  #  - file path
>  #  - package name
> +#  - md5sum of file content, as installed by this package
>  # $(1): package name
>  # $(2): base directory to search in
>  # $(3): suffix of file  (optional)
>  define step_pkg_size_inner
>         cd $(2); \
> -       find . \( -type f -o -type l \) \
> -               -newer $@_before \
> -       |sed -r -e 's/$$/\x00$(1)\x00/' \
> +       { \
> +               find . -type d -newer $@_before -printf 'directory  %p\n'; \
> +               find . -xtype f -newer $@_before -print0 \

Note that the original implementation also used '-xtype' on the
directory case, I think it's important.


/Thomas

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 14/19] support/check-uniq-files: don't report files of the same content
  2019-01-07 22:05 ` [Buildroot] [PATCH 14/19] support/check-uniq-files: don't report files of the same content Yann E. MORIN
@ 2019-01-08 15:22   ` Thomas De Schampheleire
  0 siblings, 0 replies; 57+ messages in thread
From: Thomas De Schampheleire @ 2019-01-08 15:22 UTC (permalink / raw)
  To: buildroot

El lun., 7 ene. 2019 a las 23:07, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribi?:
>
> Currently, we check that no two packages write to the same files, as a
> sanity check. We do so by checking which files were touched since the
> end of the build (aka beginning of the installation).
>
> However, when the packages do install the exact same file, i,e, the
> same content, we in fact do not really care what package had provided
> said file.
>
> In the past, we avoided that situation because we were md5sum-inf every
> files before and after installation. Anything that changed was new or

md5sum-ing
'all files' or 'every file'

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 16/19] core: check unique files in the corresponding finalize step
  2019-01-07 22:05 ` [Buildroot] [PATCH 16/19] core: check unique files in the corresponding finalize step Yann E. MORIN
@ 2019-01-08 15:24   ` Thomas De Schampheleire
  2019-01-08 19:36     ` Yann E. MORIN
  0 siblings, 1 reply; 57+ messages in thread
From: Thomas De Schampheleire @ 2019-01-08 15:24 UTC (permalink / raw)
  To: buildroot

El lun., 7 ene. 2019 a las 23:07, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribi?:
>
> Currently, the check for uniq files is done all in one location, during
> target-finalize. This is historical, back from when we had a single
> finalize step.
>
> But we now have one for each of target, staging, and host.
>
> So, mnove the check to the corresponding finalize steps.
>
> This will help, later, if/when we introduce actions specific to each
> step, that would have an impact on the result of check-uniq-files.
>
> Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
> ---
>  Makefile | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index 9f4d037cca..dcc738a1b6 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -708,19 +708,22 @@ $(TARGETS_ROOTFS): target-finalize
>  # Avoid the rootfs name leaking down the dependency chain
>  target-finalize: ROOTFS=
>
> +.PHONY: staging-finalize

This should be host-finalize.

/Thomas

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 18/19] core: ignore non-unique files that have disapeared
  2019-01-07 22:05 ` [Buildroot] [PATCH 18/19] core: ignore non-unique files that have disapeared Yann E. MORIN
@ 2019-01-08 15:29   ` Thomas De Schampheleire
  2019-01-08 19:44     ` Yann E. MORIN
  0 siblings, 1 reply; 57+ messages in thread
From: Thomas De Schampheleire @ 2019-01-08 15:29 UTC (permalink / raw)
  To: buildroot

El lun., 7 ene. 2019 a las 23:07, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribi?:
>
> Packages that install info pages will update the info index when doing
> so. But we do not have that file in the target, so we do not care what
> content that file has, or whether it is modified by many packages.
>
> This is basically also valid for any file that we remove as part of our
> target-finalize cleanups: the files are not in target.
>
> Therefore, check that a file still exists before reporting it as being
> touched by more than one package.
>
> Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
> ---
>  Makefile                         | 12 +++++++++---
>  support/scripts/check-uniq-files |  6 ++++++
>  2 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index aefbdd0e78..cc9ac91647 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -711,13 +711,17 @@ target-finalize: ROOTFS=
>  .PHONY: staging-finalize
>  host-finalize: $(HOST_DIR_SYMLINK)
>         # Check files that are touched by more than one package
> -       ./support/scripts/check-uniq-files -t host $(BUILD_DIR)/packages-file-list-host.txt
> +       ./support/scripts/check-uniq-files \
> +               -t host -d $(HOST_DIR) \
> +               $(BUILD_DIR)/packages-file-list-host.txt
>
>  .PHONY: staging-finalize
>  staging-finalize:
>         @ln -snf $(STAGING_DIR) $(BASE_DIR)/staging
>         # Check files that are touched by more than one package
> -       ./support/scripts/check-uniq-files -t staging $(BUILD_DIR)/packages-file-list-staging.txt
> +       ./support/scripts/check-uniq-files \
> +               -t staging -d $(STAGING_DIR) \
> +               $(BUILD_DIR)/packages-file-list-staging.txt
>
>  .PHONY: target-finalize
>  target-finalize: $(PACKAGES) host-finalize
> @@ -785,7 +789,9 @@ endif
>                 $(EXTRA_ENV) $(s) $(TARGET_DIR) $(call qstrip,$(BR2_ROOTFS_POST_SCRIPT_ARGS))$(sep))
>
>         # Check files that are touched by more than one package
> -       ./support/scripts/check-uniq-files -t target $(BUILD_DIR)/packages-file-list.txt
> +       ./support/scripts/check-uniq-files \
> +               -t target -d $(TARGET_DIR) \
> +               $(BUILD_DIR)/packages-file-list.txt
>
>         touch $(TARGET_DIR)/usr
>
> diff --git a/support/scripts/check-uniq-files b/support/scripts/check-uniq-files
> index 2cee95d048..3b33626c73 100755
> --- a/support/scripts/check-uniq-files
> +++ b/support/scripts/check-uniq-files
> @@ -2,6 +2,7 @@
>
>  import sys
>  import argparse
> +import os.path
>  from collections import defaultdict
>  from brpkgutil import parse_pkg_file_list as parse_pkg_file_list
>
> @@ -24,6 +25,8 @@ def main():
>                          help='The packages-file-list to check from')
>      parser.add_argument('-t', '--type', metavar="TYPE", required=True,
>                          help='Report as a TYPE file (TYPE is either target, staging, or host)')
> +    parser.add_argument('-d', '--dir', metavar="DIR", required=True,
> +                        help='Directory used as base (target/, staging/ or host/))')

Like commented elsewhere, metavar should be unnecessary.

In the help text there is a double closing parenthesis.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2)
  2019-01-08 12:51 ` [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Thomas De Schampheleire
@ 2019-01-08 15:53   ` Thomas De Schampheleire
  2019-01-08 21:30     ` Yann E. MORIN
  0 siblings, 1 reply; 57+ messages in thread
From: Thomas De Schampheleire @ 2019-01-08 15:53 UTC (permalink / raw)
  To: buildroot

El mar., 8 ene. 2019 a las 13:51, Thomas De Schampheleire
(<patrickdepinguin@gmail.com>) escribi?:
>
> El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribi?:
> >
> > Hello All!
> >
> > Currently, the instrumentation steps, that we run after a package is
> > installed, get confused about the files that package may have be
> > responsible for.
> >
> > The first problem is that all .la files are tweaked after a package is
> > installed, and thus those files are all then newer than the built
> > stampfile of that package, and consequently all .la files are accounted
> > to that package.
> >
> > The second problem is that, during development and agter a user
> > requested a package reinstall (but not a rebuild!), then the built
> > stampfile is much older, and thus all files that have been installed
> > since the package was last built are accoutned to that package.
> >
> > Those two problems are caused by 7fb6e782542f, when we switched away
> > from an md5 comparison between the state before and after the
> > installation, to a time-based comparison against the bult stampfile.
> >
> > Furthermore, during development, the list of installed files can get
> > out of sync with what is really installed. For example, if a user were
> > to modify the source of a package, and trigger a re-configure, rebuild,
> > or re-install, then we'd remove the list of previously installed files
> > before generating the list of currently installed files. If files
> > installed in the previous installation are no longer installed, they are
> > still present in the target (or staging or host), but no longer
> > accounted to the package that instaleld them.
> >
> > Additionally, when two or more packages install the same file and it has
> > the same content, we don't care much about which actually installed it,
> > as they would all have installed the exact same file. The size could be
> > assigned to any of those packages, and the licensing terms of any of
> > those package may be applied to that file. The case is mostly prominent
> > with the fftw familly of packages (soon to come) that install the same
> > headers and the same utilities.
> >
> > Finally, there is one prominent file that gets _updated_ (and not
> > replaced) by many packages: the info page index, which packages update
> > when they install their own info pages. We currently report that file,
> > when in fact it does not end up in target, and thus we don't care about
> > how its content came to be. And more generically, we don't care any file
> > that we eventually remove as part of our target-finalize cleanups.
> >
> > This series is thus an attempt at fixing all those issues.
> >
> > First and foremost, the series addresses the limitation that causes the
> > first two problems: we do not have a way to know when the install steps
> > were started (or any other step, for that matters, but we're currently
> > only interested in the install steps). So, the first few patches make it
> > so that we can introduce an new timestamp file at the beginning of each
> > step.
> >
> > Then, with the information about the beginning of the install step, we
> > can now limit the .la files tweaking to just those files that were
> > actually instaleld y a package. And then we use that same stamp file to
> > limit the listing of installed files accountable to the current package.
> >
> > Then the series addreses the same-identical-file-from-many-packages. To
> > do so, it partially restore the md5sum of the files, but this is
> > limitted to only those files actually touched during the install of the
> > current package (see above), and is only ran at the end of the install,
> > not before. As thus, this is much faster than the original situation
> > that did the md5 of all files before and after, because it now acts on
> > cache-hot files only.
> >
> > That part is split in two: first, the formnat of the packages-file-list
> > files is modified to be more resilient to weird filenames, which then
> > allows us to expand it with arbitrarily more fields. A python helper is
> > provided to abstract the new format, and the consumers of those files
> > are updated to use the helper (with one script being rewritten in
> > python). Then we make use of this new format to store the md5 of the
> > files contents, which we eventually use to decide whether to report the
> > file or not.
> >
> > Now, files that are missing from the destination directory are no longer
> > elligible for being reported as being touched by more than ne pacakge
> > anymore.
> >
> > And finally, now that we have a dependable check for uniqueness, we can
> > add an option in the menuconfig to turn the current warning into a hard
> > error when uniqueness is not met.
> >
> > Since this is a time-sensitive topic, here are a few timings before and
> > after this series, over 6 runs on an idle machine, with a configuration:
> >
> >   - prebuilt glibc toolchain
> >   - 233 packages, most pretty small and building fast
> >   - target/:  215MiB, 14922 files, directories, symlinks...
> >   - staging/: 625MiB, 29029 files, directories, symlinks...
> >   - host/:    2.1GiB, 44129 files, directories, symlinks...
> >
> >                 best           minutes:seconds          worst   mean
> >     before:     36:20   36:22   36:23   36:24   36:27   36:28   36:24
> >     after:      36:29   36:31   36:32   36:33   36:35   36:37   36:33
> >
> > So, this is a 9s overhead over 2184s (36:24, before), i.e. a mere 0.4%
> > increase in time over the full build, or just about a 38ms overhead per
> > package on average. This overhead is real, but is still very far from
> > the huge one that was choped off by 7fb6e782542f.
> >
> > Additionally, the time for re-installing the last package does not
> > suffer from an already large number or size of files already present.
> > Best result of three builds (to be cache-hot), for one target package
> > with a staging install, and one for host package:
> >
> >             skeleton-init-common-reinstall    host-patchelf-reinstall
> >     before:            8.258s                       4.951s
> >     after:             4.514s                       5.034s
> >     delta:             -3.744s                     +0.083s
> >
> > So, basically, what this means is that, during development, reinstalling
> > a previous package is faster. This is because, even though we spend (a
> > little tiny wee bit) more time when lisitings files due to the md5sum
> > (and really, thats really just a few additional millieconds per package),
> > we get repaid hundreths-fold because the list is now accurate, and we
> > can limit ourselves to tweaking only the corresponding .la file, but
> > also limit the check-bin-arch to only those files actually interesting.
> >
> > The host packages are still slightly impacted as we can see for
> > host-patchelf, because the check-bin-arch does not apply to them, so the
> > gain from running check-bin-arch only on just-installed files can't
> > apply to host packages. Still, the impact is minor.
> >
> > I'd like to particularly thank Nicolas Cavallari for their valuable
> > input about the issues they encountered with the previous and current
> > situations. Many thanks! :-)
> >
> >
[..]
>
>
> For reference, the discussion thread when commit 7fb6e782542f was submitted:
> http://lists.busybox.net/pipermail/buildroot/2018-March/215979.html
> and my comments on moving away from the md5sum to the mtime approach:
> http://lists.busybox.net/pipermail/buildroot/2018-March/216331.html
>
> I will be cross-checking this series against these comments, as an
> accurate list in packages-file-list.txt is important for me. In my
> current tree, I reverted commit 7fb6e782542f and disabled
> check-uniq-files on staging and host, to alleviate the time impact but
> still have an accurate list.

I finished reviewing, thanks for this interesting series and clear commits.

Regarding my old comments linked above, I think the packages-file-list
will still be off if a package installs a file with a fixed timestamp,
typically a timestamp in the past. I should test this series against
our use case to see if we actually have such a case, but I'd need to
do some preparatory work first so it may take some time.

Best regards,
Thomas

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 06/19] infra/pkg-generic: only list files installed by the current package
  2019-01-08 13:07   ` Thomas De Schampheleire
@ 2019-01-08 15:56     ` Thomas De Schampheleire
  2019-01-08 19:51       ` Yann E. MORIN
  2019-01-08 16:08     ` Yann E. MORIN
  1 sibling, 1 reply; 57+ messages in thread
From: Thomas De Schampheleire @ 2019-01-08 15:56 UTC (permalink / raw)
  To: buildroot

El mar., 8 ene. 2019 a las 14:07, Thomas De Schampheleire
(<patrickdepinguin@gmail.com>) escribi?:
>
> El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribi?:
> >
> > Since 7fb6e782542f (core/instrumentation: shave minutes off the build
> > time), the built stampfile is used as a reference to detect files
> > installed by a package.
> >
> > However, this falls short during development, when a user may want to
> > re-install a built-early package without rebuilding it (i.e. make
> > foo-reinstall). In this case, the built stampfile is not touched, and is
> > still dated from way back when the package was first built. As such,
> > almost all files in target (or staging or host) are newer than that, and
> > so those files are all now accounted for that package, when in fact only
> > a minor subset may be accountable to it.
> >
> > We fix that by limiting the search for files that have been actually
> > touched during the install step, now that we have the proper timestamp
> > for it.
> >
> > Reported-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
> > Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
> > Cc: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
> > ---
> >  package/pkg-generic.mk | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> > index 3de8a99675..42aebeb49d 100644
> > --- a/package/pkg-generic.mk
> > +++ b/package/pkg-generic.mk
> > @@ -67,7 +67,7 @@ define step_pkg_size_inner
> >         $(SED) '/^$(1),/d' $(BUILD_DIR)/packages-file-list$(3).txt
> >         cd $(2); \
> >         find . \( -type f -o -type l \) \
> > -               -newer $($(PKG)_DIR)/.stamp_built \
> > +               -newer $@_before \
> >                 -exec printf '$(1),%s\n' {} + \
> >                 >> $(BUILD_DIR)/packages-file-list$(3).txt
> >  endef
>
> While it is probably not important, just noting here that if package A
> installs a file with a modification time in the future, then package B
> installed after A would still get that future file in its list. In
> fact, every package built after pkg A would then get that file into
> 'its' list.

Possibly more realistically, a package installing a file with a fixed
timestamp in the past, perhaps a timestamp of the build time preserved
via 'cp -a'.

For this reason I retract the 'probably not important' part.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 06/19] infra/pkg-generic: only list files installed by the current package
  2019-01-08 13:07   ` Thomas De Schampheleire
  2019-01-08 15:56     ` Thomas De Schampheleire
@ 2019-01-08 16:08     ` Yann E. MORIN
  2019-01-08 16:55       ` Thomas De Schampheleire
  1 sibling, 1 reply; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-08 16:08 UTC (permalink / raw)
  To: buildroot

Thomas DS, All,

Thanks for the review! :-)

On 2019-01-08 14:07 +0100, Thomas De Schampheleire spake thusly:
> El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribi?:
snip
> > We fix that by limiting the search for files that have been actually
> > touched during the install step, now that we have the proper timestamp
> > for it.
[--SNIP--]
> > diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> > index 3de8a99675..42aebeb49d 100644
> > --- a/package/pkg-generic.mk
> > +++ b/package/pkg-generic.mk
> > @@ -67,7 +67,7 @@ define step_pkg_size_inner
> >         $(SED) '/^$(1),/d' $(BUILD_DIR)/packages-file-list$(3).txt
> >         cd $(2); \
> >         find . \( -type f -o -type l \) \
> > -               -newer $($(PKG)_DIR)/.stamp_built \
> > +               -newer $@_before \
> While it is probably not important, just noting here that if package A
> installs a file with a modification time in the future, then package B
> installed after A would still get that future file in its list. In
> fact, every package built after pkg A would then get that file into
> 'its' list.

How is this different from the current situation?

Maybe in that case, it would make sense that we touch all files after a
package installation (limited to that package's files, of course)?

My position is that, if a package voluntarily install stuff in the
future, it is borked. I.e. either it uses an absolute time, in which
case that time will be in the past some time in the future, or it uses
a delta, in which case it is not reproducible.

Furthermore when we do an actual reproducible build, we actually touch
all files at the end of the build, so a package that relies on file's
timestamp is not reproducible.

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 09/19] support: add parser in python for packages-file-list files
  2019-01-08 13:35   ` Thomas De Schampheleire
@ 2019-01-08 16:29     ` Yann E. MORIN
  2019-01-08 17:30       ` Thomas De Schampheleire
  0 siblings, 1 reply; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-08 16:29 UTC (permalink / raw)
  To: buildroot

Thomas DS, All,

On 2019-01-08 14:35 +0100, Thomas De Schampheleire spake thusly:
> El lun., 7 ene. 2019 a las 23:06, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribi?:
[--SNIP--]
> > Furthermore, that format is not easily extensible.
> >
> > So, introduce a parser, in python, that abstracts the format of these
> > files, and returns a dictionaries. Using dictionaries makes it easy for
> returns a dictionary.

Actually, a call to the function will return a list of dictionarries,
one for each file in the list. They are yielded, so returned one by one
to iterate over easily, but many dictionaries are returned.

So:   s/a /an iterator to a list of /

(I believe this to be an iterator, right?)

> > +def parse_pkg_file_list(path):
> > +    with open(path, 'rb') as f:
> 
> Why exactly do you open as binary?

Because it contains filenames, and filenames are a sequence of bytes,
not encioded in any way. Filenames can contain 0xbd (? in iso8859-15)
or it may contain 0x6f65 which is U+0153 encoded in UTF-8. It may even
contain both. It is not a hypotetical situation, as already encountered
it more than once.

There is no way we can know the encoding of a filename, so we treat them
for what they are: sequence of bytes.

> IIRC this will cause the need for decoding the strings on Python 3,
> which you don't seem to do. This was also the reason why the orginal
> check-uniq-files needed 'b' markers in front of some strings like
> b','.

Exactly, and hence the reason that check-uniq-files will try to decide
those strings.

There might be a gap in the transition, though, as size-stat may need to
represent those strings when generating the graphs, or generating the
CVS files... Damned...

> > +        for rec in f.readlines():
> > +            l = rec.split(',0')
> 
> This looks wrong to me, there is no text ',0' in the input.
> I think you meant rec.split(',', 1), no, like in the original code?

Yeah, I borked a rebase at some point...

> > +            d = {
> > +                  'file': l[0],
> > +                  'pkg':  l[1],
> 
> This seems also wrong, 0 and 1 are swapped.

Yeah, I borked a rebase at some point.

> Example input is:
> 
> busybox,./usr/bin/eject
> busybox,./usr/bin/basename
> busybox,./usr/bin/hexedit
> busybox,./usr/bin/readlink
> busybox,./usr/bin/cmp
> 
> so I would expect 'pkg' to be l[0] and 'file' to be l[1].
> 
> Note that 'l' could perhaps become a more meaningful variable name. I
> also think that one of the python rules is that variable names should
> be minimum 3 characters.

So, flake8 does whine about 'l' but not about 'd'. And I borked a rebase
at some point, where I renamed 'l' and it ended up in a later commit.

I'll fix that here.

> > -    with open(args.packages_file_list[0], 'rb') as pkg_file_list:
> > -        for line in pkg_file_list.readlines():
> > -            pkg, _, file = line.rstrip(b'\n').partition(b',')
> Note that the rstrip of newlines is no longer present in parse_pkg_file_list.

Good catch. I'll fix.

> > +    fname = os.path.join(builddir, "build", "packages-file-list.txt")
> > +    for record in parse_pkg_file_list(fname):
> > +        # remove the initial './' in each file path
> > +        fpath = record['file'].strip()[2:]
> 
> The stripping here and the rstrip in check-uniq-files could perhaps
> all be moved to parse_pkg_file_list.

So, I am not sure about this one. I'm not even sure why we don't keep
the '/' either. After all, they are target files, so they should be
representable as they appear on the target, i.e. with a leading '/'

In any case, I wanted the introduction of the parser to be a drop-in
replacement for the existing ad-hoc parsers, as much as possible.

The seamantic of stripping the leading './' can be commonalised (or
fixed, or dropped) in a later patch, when this series is already big
enough as it is, and already touching (pun!) touchy (re-pun!) topics.

Regards,
Yann E. MORIN.

> > +        fullpath = os.path.join(builddir, "target", fpath)
> > +        add_file(filesdict, fpath, fullpath, record['pkg'])
> >      return filesdict
> >
> 
> /Thomas

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 10/19] support: rewrite check-bin-arch in python
  2019-01-08 14:56   ` Thomas De Schampheleire
@ 2019-01-08 16:37     ` Yann E. MORIN
  2019-01-08 17:22       ` Thomas De Schampheleire
  0 siblings, 1 reply; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-08 16:37 UTC (permalink / raw)
  To: buildroot

On 2019-01-08 15:56 +0100, Thomas De Schampheleire spake thusly:
> El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribi?:
[--SNIP--]
> > Rewrite that script in python.
[--SNIP--]
> > +def main():
> > +    parser = argparse.ArgumentParser()
> > +    parser.add_argument('--package', '-p', metavar='PACKAGE', required=True)
> > +    parser.add_argument('--pkg-list', '-l', metavar='PKG_LIST', required=True)
> > +    parser.add_argument('--readelf', '-r', metavar='READELF', required=True)
> > +    parser.add_argument('--arch', '-a', metavar='ARCH', required=True)
> > +    parser.add_argument('--ignore', '-i', metavar='PATH', action='append')
> I think the above 'metavar' options are not necessary, as they are the
> default value.

I'll check and drop if they indeed are not.

> > +    args = parser.parse_args()
> > +
> > +    if args.ignore is not None:
> > +        # Ensure we do have single '/' as separators, and that we have a
> > +        # leading and a trailing one, then append to the global list.
> > +        for pattern in args.ignore:
> > +            IGNORES.add(re.sub('/+', '/', '/{0}/'.format(pattern)))
> Note that the global list itself does not have trailing slashes. This
> seems inconsistent to me.

It is inconsistent, but I kept the existing behaviour intact as much as
possible, so the python script is a drop-in replacement for the shjell
script, with the same semantics.

> > +    ignores_re = set()
> > +    for i in IGNORES:
> > +        ignores_re.add(re.compile(i))
> > +
> > +    arch_re = re.compile('^  Machine: +(.+)')
> > +
> > +    target_dir = os.environ['TARGET_DIR']
> > +
> > +    exit_code = 0
> > +    for record in parse_pkg_file_list(args.pkg_list):
> > +        if record['pkg'] != args.package:
> > +            continue
> > +
> > +        fpath = record['file']
> > +
> > +        ignored = False
> > +        for i in ignores_re:
> > +            if i.match(fpath):
> > +                ignored = True
> > +                break
> > +        if ignored:
> > +            continue
> 
> We can never get into this if, right?, because there is already a
> break whenever ignored is set to True.

Yes we can, because the break only applied to the inner-most loop, which
is the one that iterates over the ignores regexps.

> Note that I think that performance will be better with a list
> comprehension instead of explicit for's. Something like (untested):
> 
> valid_records = [ record for record in parse_pkg_file_list(args.pkg_list)
>             if record['pkg'] == args.package
>             and not any(ignore_re.match(record['file']) for ignore_re
> in ignores_re) ]

Sorry, but this is totally illegible to me.

> for record in valid_records:
>     ...
> 
> 
> You can normally swap the [ ] for ( ) to reduce memory consumption
> (generator iso list comprehension, yielding one entry at a time).
> 
> You can argue about readability, but for me this is not less readable
> than the expanded version.

Well, for me your version is totally unredable, sorry... :-/

The one I wrote at least clearly describes what it is doing:

  - for each record in the list:

    - check if the package is the one we're looking at, if not continue to
      next record

    - for each regecp to ignore, check if the filename matches it, and
      if one is matched, memorizre the fact and break the loop (fast-path).

    - if the file is to be ignored, continue to the next record

and so on...

The size argument is moot IMO, because we do not have thousands of
regexps to ignore, and the parse_pkg_file_list() already yields its
results.

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 06/19] infra/pkg-generic: only list files installed by the current package
  2019-01-08 16:08     ` Yann E. MORIN
@ 2019-01-08 16:55       ` Thomas De Schampheleire
  2019-01-08 20:02         ` Yann E. MORIN
  0 siblings, 1 reply; 57+ messages in thread
From: Thomas De Schampheleire @ 2019-01-08 16:55 UTC (permalink / raw)
  To: buildroot

El mar., 8 ene. 2019 a las 17:08, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribi?:
>
> Thomas DS, All,
>
> Thanks for the review! :-)
>
> On 2019-01-08 14:07 +0100, Thomas De Schampheleire spake thusly:
> > El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
> > (<yann.morin.1998@free.fr>) escribi?:
> snip
> > > We fix that by limiting the search for files that have been actually
> > > touched during the install step, now that we have the proper timestamp
> > > for it.
> [--SNIP--]
> > > diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> > > index 3de8a99675..42aebeb49d 100644
> > > --- a/package/pkg-generic.mk
> > > +++ b/package/pkg-generic.mk
> > > @@ -67,7 +67,7 @@ define step_pkg_size_inner
> > >         $(SED) '/^$(1),/d' $(BUILD_DIR)/packages-file-list$(3).txt
> > >         cd $(2); \
> > >         find . \( -type f -o -type l \) \
> > > -               -newer $($(PKG)_DIR)/.stamp_built \
> > > +               -newer $@_before \
> > While it is probably not important, just noting here that if package A
> > installs a file with a modification time in the future, then package B
> > installed after A would still get that future file in its list. In
> > fact, every package built after pkg A would then get that file into
> > 'its' list.
>
> How is this different from the current situation?

Yes, it's the same as in the current situation, but not in 'my'
current situation where I'm still relying on md5sum :-)
Part of my interest in reviewing these changes is the hope that they
could remove the need for my revert of the 'shave off' commit that
switches to mtime instead of md5sum, and thus following upstream
Buildroot in this respect.

>
> Maybe in that case, it would make sense that we touch all files after a
> package installation (limited to that package's files, of course)?

But it could be that the package has good reasons to expect a specific
timestamp, or more probably a relative time order between certain
files.
I'm not sure if it is common, like I said I'd need to check in my own case.

>
> My position is that, if a package voluntarily install stuff in the
> future, it is borked. I.e. either it uses an absolute time, in which
> case that time will be in the past some time in the future, or it uses
> a delta, in which case it is not reproducible.

A time delta expected between two files is not irreproducible if both
files come from the same package.

>
> Furthermore when we do an actual reproducible build, we actually touch
> all files at the end of the build, so a package that relies on file's
> timestamp is not reproducible.

I was not aware of this. That changes things, of course.
Can you point me to the exact place in the code?

Thanks,
Thomas

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 10/19] support: rewrite check-bin-arch in python
  2019-01-08 16:37     ` Yann E. MORIN
@ 2019-01-08 17:22       ` Thomas De Schampheleire
  2019-01-08 20:33         ` Yann E. MORIN
  0 siblings, 1 reply; 57+ messages in thread
From: Thomas De Schampheleire @ 2019-01-08 17:22 UTC (permalink / raw)
  To: buildroot

El mar., 8 ene. 2019 a las 17:37, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribi?:
>
> On 2019-01-08 15:56 +0100, Thomas De Schampheleire spake thusly:
> > El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
> > (<yann.morin.1998@free.fr>) escribi?:
> [--SNIP--]
> > > Rewrite that script in python.
> [--SNIP--]
> > > +def main():
> > > +    parser = argparse.ArgumentParser()
> > > +    parser.add_argument('--package', '-p', metavar='PACKAGE', required=True)
> > > +    parser.add_argument('--pkg-list', '-l', metavar='PKG_LIST', required=True)
> > > +    parser.add_argument('--readelf', '-r', metavar='READELF', required=True)
> > > +    parser.add_argument('--arch', '-a', metavar='ARCH', required=True)
> > > +    parser.add_argument('--ignore', '-i', metavar='PATH', action='append')
> > I think the above 'metavar' options are not necessary, as they are the
> > default value.
>
> I'll check and drop if they indeed are not.
>
> > > +    args = parser.parse_args()
> > > +
> > > +    if args.ignore is not None:
> > > +        # Ensure we do have single '/' as separators, and that we have a
> > > +        # leading and a trailing one, then append to the global list.
> > > +        for pattern in args.ignore:
> > > +            IGNORES.add(re.sub('/+', '/', '/{0}/'.format(pattern)))
> > Note that the global list itself does not have trailing slashes. This
> > seems inconsistent to me.
>
> It is inconsistent, but I kept the existing behaviour intact as much as
> possible, so the python script is a drop-in replacement for the shjell
> script, with the same semantics.
>
> > > +    ignores_re = set()
> > > +    for i in IGNORES:
> > > +        ignores_re.add(re.compile(i))
> > > +
> > > +    arch_re = re.compile('^  Machine: +(.+)')
> > > +
> > > +    target_dir = os.environ['TARGET_DIR']
> > > +
> > > +    exit_code = 0
> > > +    for record in parse_pkg_file_list(args.pkg_list):
> > > +        if record['pkg'] != args.package:
> > > +            continue
> > > +
> > > +        fpath = record['file']
> > > +
> > > +        ignored = False
> > > +        for i in ignores_re:
> > > +            if i.match(fpath):
> > > +                ignored = True
> > > +                break
> > > +        if ignored:
> > > +            continue
> >
> > We can never get into this if, right?, because there is already a
> > break whenever ignored is set to True.
>
> Yes we can, because the break only applied to the inner-most loop, which
> is the one that iterates over the ignores regexps.

Ok, I see.

>
> > Note that I think that performance will be better with a list
> > comprehension instead of explicit for's. Something like (untested):
> >
> > valid_records = [ record for record in parse_pkg_file_list(args.pkg_list)
> >             if record['pkg'] == args.package
> >             and not any(ignore_re.match(record['file']) for ignore_re
> > in ignores_re) ]
>
> Sorry, but this is totally illegible to me.

:-D

Ok, I won't argue about the fact itself.
Just, as a reference, small decomposition:

list = [ f(x) for x in otherlist ]

is a 'list comprehension' and is a performant way to generate a list
without requiring a for loop.
f(x) can be any action on x, e.g. x.strip() or more complex
expressions involving x.

The 'for x in otherlist' can be restricted further with 'if' clauses.
In the code I gave above, the base list is the return value of
'parse_pkg_file_list(..)' and it is filtered by two clauses.
The first one:
    record['pkg'] == args.package
only preserves entries in the base list that match the package we are
interested in.
This is the equivalent of:
       if record['pkg'] != args.package:
           continue

and the second clause:
    not any(ignore_re.match(record['file']) for ignore_re in ignores_re)

Here, any(x) is a function that takes an iterable (x) and returns True
if any of the items in 'x' evaluate to True.
Similar to any(x) there is also all(x) which only returns True if
_all_ items in x evaluate to True.

The value passed to 'any' is '(ignore_re.match(record['file']) for
ignore_re in ignores_re)'
which itself is a type of list comprehension, except that it actually
is a generator comprehension. 'generator' is what you called an
'iterator' in another patch: a thing that 'yields' values one by one.
The difference does not really matter for this discussion, so the base
format is still:

f(x) for x in some_iterable

Here f(x) is ignore_re.match(..) and x is 'ignore_re' itself.

So the following code:
    valid_records = [ record for record in parse_pkg_file_list(args.pkg_list)
                      if record['pkg'] == args.package
                      and not any(ignore_re.match(record['file']) for
ignore_re in ignores_re) ]

naturally reads as:

valid_records  is the list of records returned by parse_pkg_file_list
for which the 'pkg' field is the one we want, and not any of our
ignore expression matches the 'file' field of the record.

I hope it makes things a bit more clear :)

/Thomas

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 09/19] support: add parser in python for packages-file-list files
  2019-01-08 16:29     ` Yann E. MORIN
@ 2019-01-08 17:30       ` Thomas De Schampheleire
  2019-01-08 20:52         ` Yann E. MORIN
  0 siblings, 1 reply; 57+ messages in thread
From: Thomas De Schampheleire @ 2019-01-08 17:30 UTC (permalink / raw)
  To: buildroot

El mar., 8 ene. 2019 a las 17:29, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribi?:
>
> Thomas DS, All,
>
> On 2019-01-08 14:35 +0100, Thomas De Schampheleire spake thusly:
> > El lun., 7 ene. 2019 a las 23:06, Yann E. MORIN
> > (<yann.morin.1998@free.fr>) escribi?:
> [--SNIP--]
> > > Furthermore, that format is not easily extensible.
> > >
> > > So, introduce a parser, in python, that abstracts the format of these
> > > files, and returns a dictionaries. Using dictionaries makes it easy for
> > returns a dictionary.
>
> Actually, a call to the function will return a list of dictionarries,
> one for each file in the list. They are yielded, so returned one by one
> to iterate over easily, but many dictionaries are returned.
>
> So:   s/a /an iterator to a list of /
>
> (I believe this to be an iterator, right?)

I think it should be 'generator' in Python-terms.

>
> > > +def parse_pkg_file_list(path):
> > > +    with open(path, 'rb') as f:
> >
> > Why exactly do you open as binary?
>
> Because it contains filenames, and filenames are a sequence of bytes,
> not encioded in any way. Filenames can contain 0xbd (? in iso8859-15)
> or it may contain 0x6f65 which is U+0153 encoded in UTF-8. It may even
> contain both. It is not a hypotetical situation, as already encountered
> it more than once.
>
> There is no way we can know the encoding of a filename, so we treat them
> for what they are: sequence of bytes.
>
> > IIRC this will cause the need for decoding the strings on Python 3,
> > which you don't seem to do. This was also the reason why the orginal
> > check-uniq-files needed 'b' markers in front of some strings like
> > b','.
>
> Exactly, and hence the reason that check-uniq-files will try to decide
> those strings.
>
> There might be a gap in the transition, though, as size-stat may need to
> represent those strings when generating the graphs, or generating the
> CVS files... Damned...
>
> > > +        for rec in f.readlines():
> > > +            l = rec.split(',0')
> >
> > This looks wrong to me, there is no text ',0' in the input.
> > I think you meant rec.split(',', 1), no, like in the original code?
>
> Yeah, I borked a rebase at some point...
>
> > > +            d = {
> > > +                  'file': l[0],
> > > +                  'pkg':  l[1],
> >
> > This seems also wrong, 0 and 1 are swapped.
>
> Yeah, I borked a rebase at some point.
>
> > Example input is:
> >
> > busybox,./usr/bin/eject
> > busybox,./usr/bin/basename
> > busybox,./usr/bin/hexedit
> > busybox,./usr/bin/readlink
> > busybox,./usr/bin/cmp
> >
> > so I would expect 'pkg' to be l[0] and 'file' to be l[1].
> >
> > Note that 'l' could perhaps become a more meaningful variable name. I
> > also think that one of the python rules is that variable names should
> > be minimum 3 characters.
>
> So, flake8 does whine about 'l' but not about 'd'. And I borked a rebase
> at some point, where I renamed 'l' and it ended up in a later commit.
>
> I'll fix that here.
>
> > > -    with open(args.packages_file_list[0], 'rb') as pkg_file_list:
> > > -        for line in pkg_file_list.readlines():
> > > -            pkg, _, file = line.rstrip(b'\n').partition(b',')
> > Note that the rstrip of newlines is no longer present in parse_pkg_file_list.
>
> Good catch. I'll fix.
>
> > > +    fname = os.path.join(builddir, "build", "packages-file-list.txt")
> > > +    for record in parse_pkg_file_list(fname):
> > > +        # remove the initial './' in each file path
> > > +        fpath = record['file'].strip()[2:]
> >
> > The stripping here and the rstrip in check-uniq-files could perhaps
> > all be moved to parse_pkg_file_list.
>
> So, I am not sure about this one. I'm not even sure why we don't keep
> the '/' either. After all, they are target files, so they should be
> representable as they appear on the target, i.e. with a leading '/'
>
> In any case, I wanted the introduction of the parser to be a drop-in
> replacement for the existing ad-hoc parsers, as much as possible.
>
> The seamantic of stripping the leading './' can be commonalised (or
> fixed, or dropped) in a later patch, when this series is already big
> enough as it is, and already touching (pun!) touchy (re-pun!) topics.

Note that I only wanted to refer to the strip() call which only strips
whitespace.
The removal of './' is done with '[2:]' which is called 'slicing'.

But, if you are concerned about special filenames, including those
with spaces, then we should not do any stripping on them.

Best regards,
Thomas

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 11/19] support: introduce new format for packages-file-list files
  2019-01-08 15:07   ` Thomas De Schampheleire
@ 2019-01-08 19:27     ` Yann E. MORIN
  0 siblings, 0 replies; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-08 19:27 UTC (permalink / raw)
  To: buildroot

Thomas, All,

On 2019-01-08 16:07 +0100, Thomas De Schampheleire spake thusly:
> El lun., 7 ene. 2019 a las 23:06, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribi?:
> > The existing format for the packages-files lists has two drawbacks:
[--SNIP--]
> > AS such, introduce a new format for those files, that solves both
> > issues.
[--SNIP--]
> > +# Read the binary-opened file object f with \0\n separated records (aka lines).
> > +# Highly inspired by:
> > +# https://stackoverflow.com/questions/19600475/how-to-read-records-terminated-by-custom-separator-from-file-in-python
> > +def _readlines0n(f):
> > +    buf = b''
> > +    while True:
> > +        newbuf = f.read(1048576)
> I would find 1024 * 1024 more readable.

We definitely have a different eyesight, as I find this easier to grok! ;-)

I should note that we absolutely do not care about the size we buffer.
We could very well read it whole, or use smaller chunks (the original
inspiration read 4kiB blocks).

But I can switch to 1024*1024.

[--SNIP--]
> >  # Iterate on all records of the packages-file-list file passed as filename
> >  # Returns an iterator over a list of dictionaries. Each dictionary contains
> >  # these keys (others maybe added in the future):
> > @@ -12,11 +32,11 @@ import subprocess
> >  # 'pkg':  the last package that installed that file
> >  def parse_pkg_file_list(path):
> >      with open(path, 'rb') as f:
> I now understand why you read as binary.

Even though here we really want to read a binary file, we already wanted
to in the previous patch, because filenames can be any binary sequence.

> > -        for rec in f.readlines():
> > -            l = rec.split(',0')
> I still think this ',0' was wrong.

Yes it was, I'll fix it (well, I already fixed it now).

> > +        for rec in _readlines0n(f):
> > +            srec = rec.split(b'\x00')
> >              d = {
> > -                  'file': l[0],
> > -                  'pkg':  l[1],
> > +                  'file': srec[0],
> > +                  'pkg':  srec[1],
> and I now see how the swap in a previous commit could go unnoticed in
> your testing :-)

Yeah. Thanks for spotting it. I'll still fix the interim patches to be
consistent.

Thank you! :-)

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 12/19] infra/pkg-generic: store md5 of just-installed files
  2019-01-08 15:13   ` Thomas De Schampheleire
@ 2019-01-08 19:31     ` Yann E. MORIN
  0 siblings, 0 replies; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-08 19:31 UTC (permalink / raw)
  To: buildroot

Thomas DS, All,

On 2019-01-08 16:13 +0100, Thomas De Schampheleire spake thusly:
> El lun., 7 ene. 2019 a las 23:06, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribi?:
[--SNIP--]
> >  define step_pkg_size_inner
> >         cd $(2); \
> > -       find . \( -type f -o -type l \) \
> > -               -newer $@_before \
> > -       |sed -r -e 's/$$/\x00$(1)\x00/' \
> > +       { \
> > +               find . -type d -newer $@_before -printf 'directory  %p\n'; \
> > +               find . -xtype f -newer $@_before -print0 \
> Note that the original implementation also used '-xtype' on the
> directory case, I think it's important.

I pondered whether we wanted to restore it for directories too, but
completely messed the conditions in my head, and I missed the whole
picture. If I can recall why I did not do so, I shall write it in the
commit log.

Otherwise, yes, we also want to use -xtype for directories too.

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 16/19] core: check unique files in the corresponding finalize step
  2019-01-08 15:24   ` Thomas De Schampheleire
@ 2019-01-08 19:36     ` Yann E. MORIN
  0 siblings, 0 replies; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-08 19:36 UTC (permalink / raw)
  To: buildroot

Thomas DS, All,

On 2019-01-08 16:24 +0100, Thomas De Schampheleire spake thusly:
> El lun., 7 ene. 2019 a las 23:07, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribi?:
> > So, mnove the check to the corresponding finalize steps.
[--SNIP--]
> > +.PHONY: staging-finalize
> This should be host-finalize.

Fixed.

I also did not reply to some other typoes or minor fixes, in which case
consider I agree with your comments and fixed those here. Otherwise, I'd
have bikeshedded on the list. ;-)

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 18/19] core: ignore non-unique files that have disapeared
  2019-01-08 15:29   ` Thomas De Schampheleire
@ 2019-01-08 19:44     ` Yann E. MORIN
  0 siblings, 0 replies; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-08 19:44 UTC (permalink / raw)
  To: buildroot

Thomas DS, All,

On 2019-01-08 16:29 +0100, Thomas De Schampheleire spake thusly:
> El lun., 7 ene. 2019 a las 23:07, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribi?:
[--SNIP--]
> > +    parser.add_argument('-d', '--dir', metavar="DIR", required=True,
> > +                        help='Directory used as base (target/, staging/ or host/))')
> Like commented elsewhere, metavar should be unnecessary.

And as I said elsewhere, I tested and indeed the default for metavar is
already correct for our usecase, so I dropped it everywhere. Thanks!

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 06/19] infra/pkg-generic: only list files installed by the current package
  2019-01-08 15:56     ` Thomas De Schampheleire
@ 2019-01-08 19:51       ` Yann E. MORIN
  0 siblings, 0 replies; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-08 19:51 UTC (permalink / raw)
  To: buildroot

Thomas DS, All,

On 2019-01-08 16:56 +0100, Thomas De Schampheleire spake thusly:
> El mar., 8 ene. 2019 a las 14:07, Thomas De Schampheleire
> (<patrickdepinguin@gmail.com>) escribi?:
[--SNIP--]
> > > -               -newer $($(PKG)_DIR)/.stamp_built \
> > > +               -newer $@_before \
> > While it is probably not important, just noting here that if package A
> > installs a file with a modification time in the future, then package B
> > installed after A would still get that future file in its list. In
> > fact, every package built after pkg A would then get that file into
> > 'its' list.
> Possibly more realistically, a package installing a file with a fixed
> timestamp in the past, perhaps a timestamp of the build time preserved
> via 'cp -a'.

Actually, we already have this very issue with our skeleton: its files
are copied and their timestamps are kept as-is, so the files are almost
certainly older than the built timestamp, and thus are not accounted for
againt the skeleton.

I already identified this problem a few days ago while working on this
series, and I had considered fixing it later, or the series would get
even bigger...

And yes, the problem already happens with the current situation.

If we wanted to be exhaustive, we'd also list files that are not already
in the list (and warn loudly). And since we introduce a new format, we
could more easily add a "metadata" field that specifies why the entry
was added (mtime, new-but-old, etc...)

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 06/19] infra/pkg-generic: only list files installed by the current package
  2019-01-08 16:55       ` Thomas De Schampheleire
@ 2019-01-08 20:02         ` Yann E. MORIN
  0 siblings, 0 replies; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-08 20:02 UTC (permalink / raw)
  To: buildroot

Thomas DS, All,

On 2019-01-08 17:55 +0100, Thomas De Schampheleire spake thusly:
> El mar., 8 ene. 2019 a las 17:08, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribi?:
> > On 2019-01-08 14:07 +0100, Thomas De Schampheleire spake thusly:
> > > El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
> > > (<yann.morin.1998@free.fr>) escribi?:
[--SNIP--]
> > > > -               -newer $($(PKG)_DIR)/.stamp_built \
> > > > +               -newer $@_before \
> > > While it is probably not important, just noting here that if package A
> > > installs a file with a modification time in the future, then package B
> > > installed after A would still get that future file in its list. In
> > > fact, every package built after pkg A would then get that file into
> > > 'its' list.
> > How is this different from the current situation?
> 
> Yes, it's the same as in the current situation, but not in 'my'
> current situation where I'm still relying on md5sum :-)
> Part of my interest in reviewing these changes is the hope that they
> could remove the need for my revert of the 'shave off' commit that
> switches to mtime instead of md5sum, and thus following upstream
> Buildroot in this respect.

Yes, and I thank you a lot for this thorough review! :-)

> > Maybe in that case, it would make sense that we touch all files after a
> > package installation (limited to that package's files, of course)?
> But it could be that the package has good reasons to expect a specific
> timestamp, or more probably a relative time order between certain
> files.
> I'm not sure if it is common, like I said I'd need to check in my own case.

The only case I'm aware of, is that systemd can detect the timestamp of
/usr and decide to run upgrade scripts if /usr is more recent than a
file in /etc, hence the reason we have:

    https://git.buildroot.org/buildroot/tree/Makefile#n786

But see below...

> > My position is that, if a package voluntarily install stuff in the
> > future, it is borked. I.e. either it uses an absolute time, in which
> > case that time will be in the past some time in the future, or it uses
> > a delta, in which case it is not reproducible.
> 
> A time delta expected between two files is not irreproducible if both
> files come from the same package.
> 
> >
> > Furthermore when we do an actual reproducible build, we actually touch
> > all files at the end of the build, so a package that relies on file's
> > timestamp is not reproducible.
> 
> I was not aware of this. That changes things, of course.
> Can you point me to the exact place in the code?

    https://git.buildroot.org/buildroot/tree/fs/common.mk#n39

And now I notice that the systemd stuff is borked in this case, because
/usr must really be the time of build, not the time of the commit date.

I now realise that there is a whole slew of incorrect assumptions we've
made about SOURCE_DATE_EPOCH in the context of Buildroot, but I shall
write about this in another thread.

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 10/19] support: rewrite check-bin-arch in python
  2019-01-08 17:22       ` Thomas De Schampheleire
@ 2019-01-08 20:33         ` Yann E. MORIN
  2019-01-08 20:46           ` Thomas De Schampheleire
  0 siblings, 1 reply; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-08 20:33 UTC (permalink / raw)
  To: buildroot

Thomas DS, All,

On 2019-01-08 18:22 +0100, Thomas De Schampheleire spake thusly:
> El mar., 8 ene. 2019 a las 17:37, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribi?:
[--SNIP--]
> > > Note that I think that performance will be better with a list
> > > comprehension instead of explicit for's. Something like (untested):
> > >
> > > valid_records = [ record for record in parse_pkg_file_list(args.pkg_list)
> > >             if record['pkg'] == args.package
> > >             and not any(ignore_re.match(record['file']) for ignore_re
> > > in ignores_re) ]
> >
> > Sorry, but this is totally illegible to me.
> 
> :-D
> 
> Ok, I won't argue about the fact itself.
> Just, as a reference, small decomposition:
> 
> list = [ f(x) for x in otherlist ]
> 
> is a 'list comprehension' and is a performant way to generate a list
> without requiring a for loop.
> f(x) can be any action on x, e.g. x.strip() or more complex
> expressions involving x.

Yeah, I know what it is [0], and I can actually decipher the code above
with quite a bit of mental processing. But that is it: it's deciphering,
not reading.

[0] I even use them: https://patchwork.ozlabs.org/patch/1021622/

[--SNIP--]
> Here, any(x) is a function that takes an iterable (x) and returns True
> if any of the items in 'x' evaluate to True.

Ah, I did not know about any() (although I did infer its behaviour from
its name!) Thanks! :-)

> Similar to any(x) there is also all(x) which only returns True if
> _all_ items in x evaluate to True.

And is there none() ? ;-) I guess it's "not all()"...

> The value passed to 'any' is '(ignore_re.match(record['file']) for
> ignore_re in ignores_re)'
> which itself is a type of list comprehension, except that it actually
> is a generator comprehension. 'generator' is what you called an
> 'iterator' in another patch: a thing that 'yields' values one by one.

Yep, generator, not iterator.

[--SNIP--]
> I hope it makes things a bit more clear :)

As I said, I can actually decipher it (well, once the indentation if
fixed anyway! ;-) ), but still it is less readable for me...

And since I would have hard a time justifying it, or would have a hard
time maintaining it in the future, I prefer writing (and thus keeping)
code that I can work with later.

If others find the comprehensions easier to comprehend (pun), then fine,
but I would be much less at ease with that.

Thanks! :-)

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 10/19] support: rewrite check-bin-arch in python
  2019-01-08 20:33         ` Yann E. MORIN
@ 2019-01-08 20:46           ` Thomas De Schampheleire
  2019-01-08 21:16             ` Yann E. MORIN
  0 siblings, 1 reply; 57+ messages in thread
From: Thomas De Schampheleire @ 2019-01-08 20:46 UTC (permalink / raw)
  To: buildroot

El mar., 8 ene. 2019 a las 21:33, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribi?:
>
> Thomas DS, All,
>
> On 2019-01-08 18:22 +0100, Thomas De Schampheleire spake thusly:
> > El mar., 8 ene. 2019 a las 17:37, Yann E. MORIN
> > (<yann.morin.1998@free.fr>) escribi?:
> [--SNIP--]
> > > > Note that I think that performance will be better with a list
> > > > comprehension instead of explicit for's. Something like (untested):
> > > >
> > > > valid_records = [ record for record in parse_pkg_file_list(args.pkg_list)
> > > >             if record['pkg'] == args.package
> > > >             and not any(ignore_re.match(record['file']) for ignore_re
> > > > in ignores_re) ]
> > >
> > > Sorry, but this is totally illegible to me.
> >
> > :-D
> >
> > Ok, I won't argue about the fact itself.
> > Just, as a reference, small decomposition:
> >
> > list = [ f(x) for x in otherlist ]
> >
> > is a 'list comprehension' and is a performant way to generate a list
> > without requiring a for loop.
> > f(x) can be any action on x, e.g. x.strip() or more complex
> > expressions involving x.
>
> Yeah, I know what it is [0], and I can actually decipher the code above
> with quite a bit of mental processing. But that is it: it's deciphering,
> not reading.
>
> [0] I even use them: https://patchwork.ozlabs.org/patch/1021622/
>
> [--SNIP--]
> > Here, any(x) is a function that takes an iterable (x) and returns True
> > if any of the items in 'x' evaluate to True.
>
> Ah, I did not know about any() (although I did infer its behaviour from
> its name!) Thanks! :-)
>
> > Similar to any(x) there is also all(x) which only returns True if
> > _all_ items in x evaluate to True.
>
> And is there none() ? ;-) I guess it's "not all()"...

none() does not exist but if you intend what I understand by its name
then it is:
'not any()'

>
> > The value passed to 'any' is '(ignore_re.match(record['file']) for
> > ignore_re in ignores_re)'
> > which itself is a type of list comprehension, except that it actually
> > is a generator comprehension. 'generator' is what you called an
> > 'iterator' in another patch: a thing that 'yields' values one by one.
>
> Yep, generator, not iterator.
>
> [--SNIP--]
> > I hope it makes things a bit more clear :)
>
> As I said, I can actually decipher it (well, once the indentation if
> fixed anyway! ;-) ), but still it is less readable for me...
>
> And since I would have hard a time justifying it, or would have a hard
> time maintaining it in the future, I prefer writing (and thus keeping)
> code that I can work with later.
>
> If others find the comprehensions easier to comprehend (pun), then fine,
> but I would be much less at ease with that.

By all means, keep your original code.

At least by talking about it, we have learnt each other's views and
shared some knowledge :-)

Best regards,
Thomas

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 09/19] support: add parser in python for packages-file-list files
  2019-01-08 17:30       ` Thomas De Schampheleire
@ 2019-01-08 20:52         ` Yann E. MORIN
  0 siblings, 0 replies; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-08 20:52 UTC (permalink / raw)
  To: buildroot

Thomas DS, All,

On 2019-01-08 18:30 +0100, Thomas De Schampheleire spake thusly:
> El mar., 8 ene. 2019 a las 17:29, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribi?:
[--SNIP--]
> > Actually, a call to the function will return a list of dictionarries,
> > one for each file in the list. They are yielded, so returned one by one
> > to iterate over easily, but many dictionaries are returned.
> > So:   s/a /an iterator to a list of /
> > (I believe this to be an iterator, right?)
> I think it should be 'generator' in Python-terms.

Generator that is, then. Thanks.

[--SNIP--]
> > > > +    fname = os.path.join(builddir, "build", "packages-file-list.txt")
> > > > +    for record in parse_pkg_file_list(fname):
> > > > +        # remove the initial './' in each file path
> > > > +        fpath = record['file'].strip()[2:]
> > > The stripping here and the rstrip in check-uniq-files could perhaps
> > > all be moved to parse_pkg_file_list.
[--SNIP--]
> Note that I only wanted to refer to the strip() call which only strips
> whitespace.

Right, but then here it is useless to strip() the string, because there
is no leading spaces in file names (they are anchored with a leadin ./
anyway!), and any trailing spaces would *be* part of the filename.

But I just kpet the code as-is and just made the parser part common.
What caller do with the fields is their concerns! ;-)

> But, if you are concerned about special filenames, including those
> with spaces, then we should not do any stripping on them.

I'm not sure why size-stat needed to strip and slice to begin with...

Maybe it was because it was using f.readlines() and used strip() to get
rid of the trailing '\n' ?

Or maybe the other Thomas has a better reminiscence of why he did that
more than three years ago? ;-]

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 10/19] support: rewrite check-bin-arch in python
  2019-01-08 20:46           ` Thomas De Schampheleire
@ 2019-01-08 21:16             ` Yann E. MORIN
  2019-01-09 14:47               ` Thomas De Schampheleire
  0 siblings, 1 reply; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-08 21:16 UTC (permalink / raw)
  To: buildroot

Thomas, All,

On 2019-01-08 21:46 +0100, Thomas De Schampheleire spake thusly:
> El mar., 8 ene. 2019 a las 21:33, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribi?:
> > And is there none() ? ;-) I guess it's "not all()"...
> none() does not exist but if you intend what I understand by its name
> then it is:
> 'not any()'

Dang right. "not all()" would mean at least one is not True, i.e.
"not all(x)" ? "any(not x)"

[--SNIP--]
> > If others find the comprehensions easier to comprehend (pun), then fine,
> > but I would be much less at ease with that.
> 
> By all means, keep your original code.
> 
> At least by talking about it, we have learnt each other's views and
> shared some knowledge :-)

Yes, I guess with writing more python I'd get more used to reading it.
And I certainly did appreciate the walkthrough, thanks. :-)

BTW, do you intend to join after FOSDEM?

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2)
  2019-01-08 15:53   ` Thomas De Schampheleire
@ 2019-01-08 21:30     ` Yann E. MORIN
  2019-01-09 13:39       ` Thomas De Schampheleire
  0 siblings, 1 reply; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-08 21:30 UTC (permalink / raw)
  To: buildroot

Thomas, All,

On 2019-01-08 16:53 +0100, Thomas De Schampheleire spake thusly:
> El mar., 8 ene. 2019 a las 13:51, Thomas De Schampheleire
> (<patrickdepinguin@gmail.com>) escribi?:
> > El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
> > (<yann.morin.1998@free.fr>) escribi?:
> > > Currently, the instrumentation steps, that we run after a package is
> > > installed, get confused about the files that package may have be
> > > responsible for.
[--SNIP--]
> > > This series is thus an attempt at fixing all those issues.
[--SNIP--]
> > For reference, the discussion thread when commit 7fb6e782542f was submitted:
> > http://lists.busybox.net/pipermail/buildroot/2018-March/215979.html
> > and my comments on moving away from the md5sum to the mtime approach:
> > http://lists.busybox.net/pipermail/buildroot/2018-March/216331.html

I do realise today that I did not reply to you back at that time.
Apologies.

Still I have no idea on how we could have _easily_ fixed the issue
anyway. The solution proposed by Trent would have been way too
complicated to come up with in a timely manner, no md5sum-ing staging/
and host/ would prevent PPD, and thus TLPB, to eventually land.

> > I will be cross-checking this series against these comments, as an
> > accurate list in packages-file-list.txt is important for me. In my
> > current tree, I reverted commit 7fb6e782542f and disabled
> > check-uniq-files on staging and host, to alleviate the time impact but
> > still have an accurate list.
> 
> I finished reviewing, thanks for this interesting series and clear commits.
> 
> Regarding my old comments linked above, I think the packages-file-list
> will still be off if a package installs a file with a fixed timestamp,
> typically a timestamp in the past. I should test this series against
> our use case to see if we actually have such a case, but I'd need to
> do some preparatory work first so it may take some time.

As I already said, we already have the issue with our skeleton, anyway.

For our skeleton, we can probably fix it quite easily with some rsync
trickery (but I could not find how, so far).

But for a more generic solution, we'd need a way to detect files that
were not previously present but now are.

So we'd need to maintain a list of files from before the installation
and another for after the installation. And oh wait, isn't that
basically what we previously did (modulo the md5)? ;-)

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2)
  2019-01-08 21:30     ` Yann E. MORIN
@ 2019-01-09 13:39       ` Thomas De Schampheleire
  2019-01-09 18:10         ` Yann E. MORIN
  0 siblings, 1 reply; 57+ messages in thread
From: Thomas De Schampheleire @ 2019-01-09 13:39 UTC (permalink / raw)
  To: buildroot

El mar., 8 ene. 2019 a las 22:30, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribi?:
>
> Thomas, All,
>
> On 2019-01-08 16:53 +0100, Thomas De Schampheleire spake thusly:
> > El mar., 8 ene. 2019 a las 13:51, Thomas De Schampheleire
> > (<patrickdepinguin@gmail.com>) escribi?:
> > > El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
> > > (<yann.morin.1998@free.fr>) escribi?:
> > > > Currently, the instrumentation steps, that we run after a package is
> > > > installed, get confused about the files that package may have be
> > > > responsible for.
> [--SNIP--]
> > > > This series is thus an attempt at fixing all those issues.
> [--SNIP--]
> > > For reference, the discussion thread when commit 7fb6e782542f was submitted:
> > > http://lists.busybox.net/pipermail/buildroot/2018-March/215979.html
> > > and my comments on moving away from the md5sum to the mtime approach:
> > > http://lists.busybox.net/pipermail/buildroot/2018-March/216331.html
>
> I do realise today that I did not reply to you back at that time.
> Apologies.
>
> Still I have no idea on how we could have _easily_ fixed the issue
> anyway. The solution proposed by Trent would have been way too
> complicated to come up with in a timely manner, no md5sum-ing staging/
> and host/ would prevent PPD, and thus TLPB, to eventually land.
>
> > > I will be cross-checking this series against these comments, as an
> > > accurate list in packages-file-list.txt is important for me. In my
> > > current tree, I reverted commit 7fb6e782542f and disabled
> > > check-uniq-files on staging and host, to alleviate the time impact but
> > > still have an accurate list.
> >
> > I finished reviewing, thanks for this interesting series and clear commits.
> >
> > Regarding my old comments linked above, I think the packages-file-list
> > will still be off if a package installs a file with a fixed timestamp,
> > typically a timestamp in the past. I should test this series against
> > our use case to see if we actually have such a case, but I'd need to
> > do some preparatory work first so it may take some time.
>
> As I already said, we already have the issue with our skeleton, anyway.
>
> For our skeleton, we can probably fix it quite easily with some rsync
> trickery (but I could not find how, so far).
>
> But for a more generic solution, we'd need a way to detect files that
> were not previously present but now are.
>
> So we'd need to maintain a list of files from before the installation
> and another for after the installation. And oh wait, isn't that
> basically what we previously did (modulo the md5)? ;-)

Without md5sum, there is only a find before and after each package
installation, which shouldn't take long.

The md5 is only necessary if you care about changing existing files.
And normally, if you change existing files, their timestamp will be
updated, so your solution with '-newer-than foo_before' should be
fine. Only when you change files and still keep their old timestamps,
would something be missed.

So, a possible solution could be a combination of the original 'find
before+after' solution and your new 'newer-than foo_before' solution.
Am I talking gibberish?

/Thomas

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 10/19] support: rewrite check-bin-arch in python
  2019-01-08 21:16             ` Yann E. MORIN
@ 2019-01-09 14:47               ` Thomas De Schampheleire
  0 siblings, 0 replies; 57+ messages in thread
From: Thomas De Schampheleire @ 2019-01-09 14:47 UTC (permalink / raw)
  To: buildroot

El mar., 8 ene. 2019 a las 22:16, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribi?:
>
> Thomas, All,
>
> On 2019-01-08 21:46 +0100, Thomas De Schampheleire spake thusly:
> > El mar., 8 ene. 2019 a las 21:33, Yann E. MORIN
> > (<yann.morin.1998@free.fr>) escribi?:
> > > And is there none() ? ;-) I guess it's "not all()"...
> > none() does not exist but if you intend what I understand by its name
> > then it is:
> > 'not any()'
>
> Dang right. "not all()" would mean at least one is not True, i.e.
> "not all(x)" ? "any(not x)"
>
> [--SNIP--]
> > > If others find the comprehensions easier to comprehend (pun), then fine,
> > > but I would be much less at ease with that.
> >
> > By all means, keep your original code.
> >
> > At least by talking about it, we have learnt each other's views and
> > shared some knowledge :-)
>
> Yes, I guess with writing more python I'd get more used to reading it.
> And I certainly did appreciate the walkthrough, thanks. :-)
>
> BTW, do you intend to join after FOSDEM?

Normally yes, although I still have to request official permission.
Possibly not all three days though, to be confirmed...

/Thomas

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2)
  2019-01-09 13:39       ` Thomas De Schampheleire
@ 2019-01-09 18:10         ` Yann E. MORIN
  2019-01-10 20:34           ` Thomas De Schampheleire
  0 siblings, 1 reply; 57+ messages in thread
From: Yann E. MORIN @ 2019-01-09 18:10 UTC (permalink / raw)
  To: buildroot

Thomas, All,

On 2019-01-09 14:39 +0100, Thomas De Schampheleire spake thusly:
> El mar., 8 ene. 2019 a las 22:30, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribi?:
[--SNIP--]
> > But for a more generic solution, we'd need a way to detect files that
> > were not previously present but now are.
> >
> > So we'd need to maintain a list of files from before the installation
> > and another for after the installation. And oh wait, isn't that
> > basically what we previously did (modulo the md5)? ;-)
> 
> Without md5sum, there is only a find before and after each package
> installation, which shouldn't take long.
> 
> The md5 is only necessary if you care about changing existing files.
> And normally, if you change existing files, their timestamp will be
> updated, so your solution with '-newer-than foo_before' should be
> fine. Only when you change files and still keep their old timestamps,
> would something be missed.
> 
> So, a possible solution could be a combination of the original 'find
> before+after' solution and your new 'newer-than foo_before' solution.
> Am I talking gibberish?

Now that I slept on it, detecting and fixing this problem appropriately
is impossible, unless we re-instate the leading md5sum as well.

If you allow for a package to install a file in the past, how do you
know that the file indeed did not previously exist?

That is, if the file did not previously exist, then comparing the list
before/after would indeed detect it. However, if the file did previously
exist, then your package would overwrite it *and* set it in the past,
and then comparing the list before/after would not detect it.

And in either case, our find -newer would not detect it either.

So, the only way to properly account for this situation would be, guess
what, to reinstate an md5sum before and after the installation and
compare them file by file.

However, as we've seen, this is totally unacceptable.

So, I hereby declare that I think that I believe that in my opinion I
can only think that a package which installs a file in the past is
broken (or borked, as you may prefer). ;-)

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2)
  2019-01-09 18:10         ` Yann E. MORIN
@ 2019-01-10 20:34           ` Thomas De Schampheleire
  0 siblings, 0 replies; 57+ messages in thread
From: Thomas De Schampheleire @ 2019-01-10 20:34 UTC (permalink / raw)
  To: buildroot

El mi?., 9 ene. 2019 a las 19:10, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribi?:
>
> Thomas, All,
>
> On 2019-01-09 14:39 +0100, Thomas De Schampheleire spake thusly:
> > El mar., 8 ene. 2019 a las 22:30, Yann E. MORIN
> > (<yann.morin.1998@free.fr>) escribi?:
> [--SNIP--]
> > > But for a more generic solution, we'd need a way to detect files that
> > > were not previously present but now are.
> > >
> > > So we'd need to maintain a list of files from before the installation
> > > and another for after the installation. And oh wait, isn't that
> > > basically what we previously did (modulo the md5)? ;-)
> >
> > Without md5sum, there is only a find before and after each package
> > installation, which shouldn't take long.
> >
> > The md5 is only necessary if you care about changing existing files.
> > And normally, if you change existing files, their timestamp will be
> > updated, so your solution with '-newer-than foo_before' should be
> > fine. Only when you change files and still keep their old timestamps,
> > would something be missed.
> >
> > So, a possible solution could be a combination of the original 'find
> > before+after' solution and your new 'newer-than foo_before' solution.
> > Am I talking gibberish?
>
> Now that I slept on it, detecting and fixing this problem appropriately
> is impossible, unless we re-instate the leading md5sum as well.
>
> If you allow for a package to install a file in the past, how do you
> know that the file indeed did not previously exist?
>
> That is, if the file did not previously exist, then comparing the list
> before/after would indeed detect it. However, if the file did previously
> exist, then your package would overwrite it *and* set it in the past,
> and then comparing the list before/after would not detect it.
>
> And in either case, our find -newer would not detect it either.
>
> So, the only way to properly account for this situation would be, guess
> what, to reinstate an md5sum before and after the installation and
> compare them file by file.
>
> However, as we've seen, this is totally unacceptable.
>
> So, I hereby declare that I think that I believe that in my opinion I
> can only think that a package which installs a file in the past is
> broken (or borked, as you may prefer). ;-)

I have hacked my buildroot repo to use the old (find+md5sum) and
mtime-based solutions to get the list of installed files, and compare
them for each package. While I need to run this test on more
defconfigs to get a full picture, I already have following feedback:

- some packages use 'cp -dpf' to install files, where '-p' means
preserve (among other things) timestamp. Examples are dmalloc and
libfuse. In both cases the cp command is in the .mk file, not in the
package sources itself.

- for emlog, in my testrun, 'nbcat' was _not_ included in the list
based on mtime while it was installed. Inspection with 'ls -l
--full-time' revealed that nbcat had the exact same timestamp as the
_before file, meaning that 'find -newer' did not consider it a match.
We would actually need to allow the same timestamp as well, or make
sure that there is at least some time between the creation of the
_before target and the actual installs (usleep 1 or similar). The
first option has the theoretical problem than on an sufficiently fast
machine, you could match files of the previous package if they also
get the same timestamp as _before (e.g. no compilation just copying of
files).
I had the same situation with 'mtd':

-rw-r--r-- 1 tdescham tdescham      0 2019-01-10 20:28:13.376158950
+0100 output/build/mtd-2.0.1/.stamp_target_installed_before
-rwxr-xr-x 1 tdescham tdescham 115436 2019-01-10 20:28:13.376158950
+0100 output/target/usr/sbin/flashcp*
-rwxr-xr-x 1 tdescham tdescham 149696 2019-01-10 20:28:13.376158950
+0100 output/target/usr/sbin/flash_erase*
-rwxr-xr-x 1 tdescham tdescham  80912 2019-01-10 20:28:13.377158950
+0100 output/target/usr/sbin/flash_lock*
-rwxr-xr-x 1 tdescham tdescham  80896 2019-01-10 20:28:13.377158950
+0100 output/target/usr/sbin/flash_unlock*

Here flashcp and flash_erase have the same timestamp as the stamp
file, and flash_lock and flash_unlock are indeed newer. The latter two
are in the mtime-based list, the former two not.

- unionfs is a pure cmake package, but installs files in the future:

-rw-r--r-- 1 tdescham tdescham     0 2019-01-10 20:29:10.720156598
+0100 output/build/unionfs-2.0/.stamp_target_installed_before
-rwxr-xr-x 1 tdescham tdescham 75968 2019-01-10 20:29:08.000000000
+0100 output/target/usr/bin/unionfsctl*

Note that as far as I could trace, the install of this file is carried
out by cmake itself, not via an external command like 'cp' or
'install'. At least, in 'strace' I could not see any call, just an
open, read/write, close.


So, many borked packages :-)

Regardless if we can find a solution to the issues we encounter, the
fact that we do not notice _that_ there is an issue concerns me. I
think we have to implement such detection into Buildroot itself to
avoid accidents.


For reference, here is the hacked code I was using for this detection.
This is based on 2018.02.x with reverted instrumentation code still
based on find+md5sum:

diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -90,8 +90,14 @@ define step_pkg_size_end
     comm -13 $($(PKG)_DIR)/.br_filelist_before
$($(PKG)_DIR)/.br_filelist_after | \
         while read hash file ; do \
             echo "$(1),$${file}" ; \
-        done >> $(BUILD_DIR)/packages-file-list$(3).txt
+        done | sort > $(BUILD_DIR)/.list-$(1)-orig
     rm -f $($(PKG)_DIR)/.br_filelist_before $($(PKG)_DIR)/.br_filelist_after
+    ( cd $(2); find . \( -type f -o -type l -o -type d \) \
+        -newer $@_before \
+        -exec printf '$(1),%s\n' {} + \
+        |sort > $(BUILD_DIR)/.list-$(1)-new \
+    )
+        diff -u $(BUILD_DIR)/.list-$(1)-orig
$(BUILD_DIR)/.list-$(1)-new > $(BUILD_DIR)/.diff-list-$(1) || true
 endef

 define step_pkg_size
@@ -324,6 +330,7 @@ endif
 $(BUILD_DIR)/%/.stamp_target_installed:
     @$(call step_start,install-target)
     @$(call MESSAGE,"Installing to target")
+    $(Q)touch $@_before
     $(foreach hook,$($(PKG)_PRE_INSTALL_TARGET_HOOKS),$(call $(hook))$(sep))
     +$($(PKG)_INSTALL_TARGET_CMDS)
     $(if $(BR2_INIT_SYSTEMD),\


Best regards,
Thomas

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 08/19] support/check-uniq-files: decode as many strings as possible
  2019-01-07 22:05 ` [Buildroot] [PATCH 08/19] support/check-uniq-files: decode as many strings as possible Yann E. MORIN
@ 2019-02-07 23:40   ` Arnout Vandecappelle
  2019-02-08 17:25     ` Yann E. MORIN
  0 siblings, 1 reply; 57+ messages in thread
From: Arnout Vandecappelle @ 2019-02-07 23:40 UTC (permalink / raw)
  To: buildroot



On 07/01/2019 23:05, Yann E. MORIN wrote:
> +# If possible, try to decode the binary string s with the user's locale.
> +# If s contains characters that can't be decoded with that locale, return
> +# the representation (in the user's locale) of the un-decoded string.
> +def str_decode(s):
> +    try:
> +        return s.decode()
> +    except UnicodeDecodeError:
> +        return repr(s)

 I think s.decode(errors='replace') is exactly what we want: it prints the
question mark character for things that can't be represented, just like ls does.

 Regards,
 Arnout

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 08/19] support/check-uniq-files: decode as many strings as possible
  2019-02-07 23:40   ` Arnout Vandecappelle
@ 2019-02-08 17:25     ` Yann E. MORIN
  2019-02-08 20:42       ` Arnout Vandecappelle
  0 siblings, 1 reply; 57+ messages in thread
From: Yann E. MORIN @ 2019-02-08 17:25 UTC (permalink / raw)
  To: buildroot

Arnout, All,

On 2019-02-08 00:40 +0100, Arnout Vandecappelle spake thusly:
> On 07/01/2019 23:05, Yann E. MORIN wrote:
> > +# If possible, try to decode the binary string s with the user's locale.
> > +# If s contains characters that can't be decoded with that locale, return
> > +# the representation (in the user's locale) of the un-decoded string.
> > +def str_decode(s):
> > +    try:
> > +        return s.decode()
> > +    except UnicodeDecodeError:
> > +        return repr(s)
> 
>  I think s.decode(errors='replace') is exactly what we want: it prints the
> question mark character for things that can't be represented, just like ls does.

In the case I used as example, i.e. ? (LATIN SMALL LIGATURE OE) as encoded
in iso8859-15, i.e. \xbd (e.g. stored in a file named 'meh'), with python
2.7:

    >>> with open('meh', 'rb') as f:
    ...    lines = f.readlines()
    ...
    >>> lines
    ['\xbd\n']
    >>> lines[0].decode(errors='replace')
    u'\ufffd\n'
    >>> print('{}'.format(lines[0].decode(errors='replace')))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)
    >>>

The output with python3 is indeed what you believe will happen, but I
don't think it is so nice:

    >>> lines
    [b'\xbd\n']
    >>> lines[0].decode(errors='replace')
    '?\n'
    >>> print('{}'.format(lines[0].decode(errors='replace')))
    ?

    >>> 

And anyway, check-uniq file should work with python 2.7, since it is part
of the build tools, and python 2.7 is what we require.

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 561 099 427 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 08/19] support/check-uniq-files: decode as many strings as possible
  2019-02-08 17:25     ` Yann E. MORIN
@ 2019-02-08 20:42       ` Arnout Vandecappelle
  2019-02-08 21:22         ` Yann E. MORIN
  0 siblings, 1 reply; 57+ messages in thread
From: Arnout Vandecappelle @ 2019-02-08 20:42 UTC (permalink / raw)
  To: buildroot



On 08/02/2019 18:25, Yann E. MORIN wrote:
> Arnout, All,
> 
> On 2019-02-08 00:40 +0100, Arnout Vandecappelle spake thusly:
>> On 07/01/2019 23:05, Yann E. MORIN wrote:
>>> +# If possible, try to decode the binary string s with the user's locale.
>>> +# If s contains characters that can't be decoded with that locale, return
>>> +# the representation (in the user's locale) of the un-decoded string.
>>> +def str_decode(s):
>>> +    try:
>>> +        return s.decode()
>>> +    except UnicodeDecodeError:
>>> +        return repr(s)
>>
>>  I think s.decode(errors='replace') is exactly what we want: it prints the
>> question mark character for things that can't be represented, just like ls does.
> 
> In the case I used as example, i.e. ? (LATIN SMALL LIGATURE OE) as encoded
> in iso8859-15, i.e. \xbd (e.g. stored in a file named 'meh'), with python
> 2.7:
> 
>     >>> with open('meh', 'rb') as f:
>     ...    lines = f.readlines()
>     ...
>     >>> lines
>     ['\xbd\n']
>     >>> lines[0].decode(errors='replace')
>     u'\ufffd\n'
>     >>> print('{}'.format(lines[0].decode(errors='replace')))
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)

 Meh, Python2 unicode handling always confuses the hell out of me...

 So, to do it well, in python3 you need to do:

print(b'\xc5\x93\xff'.decode(sys.getfilesystemencoding(),errors='replace'))

while in python2 the proper thing to do is

print(b'\xc5\x93\xff'.decode(sys.getfilesystemencoding(), \
	errors='replace').encode(sys.getfilesystemencoding(),errors='replace'))

(sys.getfilesystemencoding() makes sure we use the user's encoding so stuff that
can be printed gets properly printed).

 I couldn't find a way to do the right thing both in python2 and python3...

 Regards,
 Arnout

>     >>>
> 
> The output with python3 is indeed what you believe will happen, but I
> don't think it is so nice:
> 
>     >>> lines
>     [b'\xbd\n']
>     >>> lines[0].decode(errors='replace')
>     '?\n'
>     >>> print('{}'.format(lines[0].decode(errors='replace')))
>     ?
> 
>     >>> 
> 
> And anyway, check-uniq file should work with python 2.7, since it is part
> of the build tools, and python 2.7 is what we require.
> 
> Regards,
> Yann E. MORIN.
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 08/19] support/check-uniq-files: decode as many strings as possible
  2019-02-08 20:42       ` Arnout Vandecappelle
@ 2019-02-08 21:22         ` Yann E. MORIN
  2019-02-08 22:02           ` Arnout Vandecappelle
  0 siblings, 1 reply; 57+ messages in thread
From: Yann E. MORIN @ 2019-02-08 21:22 UTC (permalink / raw)
  To: buildroot

Arnout, All,

On 2019-02-08 21:42 +0100, Arnout Vandecappelle spake thusly:
> On 08/02/2019 18:25, Yann E. MORIN wrote:
> > On 2019-02-08 00:40 +0100, Arnout Vandecappelle spake thusly:
> >> On 07/01/2019 23:05, Yann E. MORIN wrote:
> >>> +def str_decode(s):
> >>> +    try:
> >>> +        return s.decode()
> >>> +    except UnicodeDecodeError:
> >>> +        return repr(s)
> >>
> >>  I think s.decode(errors='replace') is exactly what we want: it prints the
> >> question mark character for things that can't be represented, just like ls does.
[--SNIP--]
> >     >>> lines[0].decode(errors='replace')
> >     u'\ufffd\n'
> >     >>> print('{}'.format(lines[0].decode(errors='replace')))
> >     Traceback (most recent call last):
> >       File "<stdin>", line 1, in <module>
> >     UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)
> 
>  Meh, Python2 unicode handling always confuses the hell out of me...
> 
>  So, to do it well, in python3 you need to do:
> print(b'\xc5\x93\xff'.decode(sys.getfilesystemencoding(),errors='replace'))
> 
> while in python2 the proper thing to do is
> 
> print(b'\xc5\x93\xff'.decode(sys.getfilesystemencoding(), \
> 	errors='replace').encode(sys.getfilesystemencoding(),errors='replace'))
> 
> (sys.getfilesystemencoding() makes sure we use the user's encoding so stuff that
> can be printed gets properly printed).
> 
>  I couldn't find a way to do the right thing both in python2 and python3...

At which point, my proposal is much simpler, and more understandable,
don't you think?

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 561 099 427 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Buildroot] [PATCH 08/19] support/check-uniq-files: decode as many strings as possible
  2019-02-08 21:22         ` Yann E. MORIN
@ 2019-02-08 22:02           ` Arnout Vandecappelle
  0 siblings, 0 replies; 57+ messages in thread
From: Arnout Vandecappelle @ 2019-02-08 22:02 UTC (permalink / raw)
  To: buildroot



On 08/02/2019 22:22, Yann E. MORIN wrote:
> Arnout, All,
> 
> On 2019-02-08 21:42 +0100, Arnout Vandecappelle spake thusly:
>> On 08/02/2019 18:25, Yann E. MORIN wrote:
>>> On 2019-02-08 00:40 +0100, Arnout Vandecappelle spake thusly:
>>>> On 07/01/2019 23:05, Yann E. MORIN wrote:
>>>>> +def str_decode(s):
>>>>> +    try:
>>>>> +        return s.decode()
>>>>> +    except UnicodeDecodeError:
>>>>> +        return repr(s)
>>>>
>>>>  I think s.decode(errors='replace') is exactly what we want: it prints the
>>>> question mark character for things that can't be represented, just like ls does.
> [--SNIP--]
>>>     >>> lines[0].decode(errors='replace')
>>>     u'\ufffd\n'
>>>     >>> print('{}'.format(lines[0].decode(errors='replace')))
>>>     Traceback (most recent call last):
>>>       File "<stdin>", line 1, in <module>
>>>     UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)
>>
>>  Meh, Python2 unicode handling always confuses the hell out of me...
>>
>>  So, to do it well, in python3 you need to do:
>> print(b'\xc5\x93\xff'.decode(sys.getfilesystemencoding(),errors='replace'))
>>
>> while in python2 the proper thing to do is
>>
>> print(b'\xc5\x93\xff'.decode(sys.getfilesystemencoding(), \
>> 	errors='replace').encode(sys.getfilesystemencoding(),errors='replace'))
>>
>> (sys.getfilesystemencoding() makes sure we use the user's encoding so stuff that
>> can be printed gets properly printed).
>>
>>  I couldn't find a way to do the right thing both in python2 and python3...
> 
> At which point, my proposal is much simpler, and more understandable,
> don't you think?

 Absolutely. Well, it's imperfect because it prints the ugly b'....' in case
there is an non-decodable character, but it's good enough.

 Regards,
 Arnout

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2019-02-08 22:02 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 01/19] infra/pkg-generic: display MESSAGE before running PRE_HOOKS Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 02/19] infra/pkg-generic: create $(@D) " Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 03/19] infra/pkg-generic: introduce new stampfile at the beginning of all steps Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 04/19] infra/pkg-generic: use \0 to separate .la files as they are found Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 05/19] infra/pkg-generic: tweak only .la files installed by the current package Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 06/19] infra/pkg-generic: only list " Yann E. MORIN
2019-01-08 13:07   ` Thomas De Schampheleire
2019-01-08 15:56     ` Thomas De Schampheleire
2019-01-08 19:51       ` Yann E. MORIN
2019-01-08 16:08     ` Yann E. MORIN
2019-01-08 16:55       ` Thomas De Schampheleire
2019-01-08 20:02         ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 07/19] infra/pkg-generic: offload same-package filtering to check-uniq-file Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 08/19] support/check-uniq-files: decode as many strings as possible Yann E. MORIN
2019-02-07 23:40   ` Arnout Vandecappelle
2019-02-08 17:25     ` Yann E. MORIN
2019-02-08 20:42       ` Arnout Vandecappelle
2019-02-08 21:22         ` Yann E. MORIN
2019-02-08 22:02           ` Arnout Vandecappelle
2019-01-07 22:05 ` [Buildroot] [PATCH 09/19] support: add parser in python for packages-file-list files Yann E. MORIN
2019-01-08 13:35   ` Thomas De Schampheleire
2019-01-08 16:29     ` Yann E. MORIN
2019-01-08 17:30       ` Thomas De Schampheleire
2019-01-08 20:52         ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 10/19] support: rewrite check-bin-arch in python Yann E. MORIN
2019-01-08 14:56   ` Thomas De Schampheleire
2019-01-08 16:37     ` Yann E. MORIN
2019-01-08 17:22       ` Thomas De Schampheleire
2019-01-08 20:33         ` Yann E. MORIN
2019-01-08 20:46           ` Thomas De Schampheleire
2019-01-08 21:16             ` Yann E. MORIN
2019-01-09 14:47               ` Thomas De Schampheleire
2019-01-07 22:05 ` [Buildroot] [PATCH 11/19] support: introduce new format for packages-file-list files Yann E. MORIN
2019-01-08 15:07   ` Thomas De Schampheleire
2019-01-08 19:27     ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 12/19] infra/pkg-generic: store md5 of just-installed files Yann E. MORIN
2019-01-08 15:13   ` Thomas De Schampheleire
2019-01-08 19:31     ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 13/19] support/check-uniq-file: invert condition logic Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 14/19] support/check-uniq-files: don't report files of the same content Yann E. MORIN
2019-01-08 15:22   ` Thomas De Schampheleire
2019-01-07 22:05 ` [Buildroot] [PATCH 15/19] support/check-uniq-files: use argparse to enfore required options Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 16/19] core: check unique files in the corresponding finalize step Yann E. MORIN
2019-01-08 15:24   ` Thomas De Schampheleire
2019-01-08 19:36     ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 17/19] core: check for unique target files after all our cleanups Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 18/19] core: ignore non-unique files that have disapeared Yann E. MORIN
2019-01-08 15:29   ` Thomas De Schampheleire
2019-01-08 19:44     ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 19/19] core: add optional failure when 2+ packages touch the same file Yann E. MORIN
2019-01-08 12:51 ` [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Thomas De Schampheleire
2019-01-08 15:53   ` Thomas De Schampheleire
2019-01-08 21:30     ` Yann E. MORIN
2019-01-09 13:39       ` Thomas De Schampheleire
2019-01-09 18:10         ` Yann E. MORIN
2019-01-10 20:34           ` Thomas De Schampheleire

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.