All of lore.kernel.org
 help / color / mirror / Atom feed
From: Herve Codina <herve.codina@bootlin.com>
To: buildroot@busybox.net
Subject: [Buildroot] [PATCH 13/15] Makefile: Rsync global {TARGET, HOST}_DIR using exclusion file list
Date: Fri, 25 Jun 2021 13:59:51 +0200	[thread overview]
Message-ID: <20210625135951.2fb55ea7@fedora> (raw)
In-Reply-To: <20210624203410.GC2852@scaer>

Hi,

On Thu, 24 Jun 2021 22:34:10 +0200
"Yann E. MORIN" <yann.morin.1998@free.fr> wrote:

> Herv?, All,
> 
> Additional revview I forgot previously...
> 
> On 2021-06-24 22:20 +0200, Yann E. MORIN spake thusly:
> > On 2021-06-21 16:11 +0200, Herve Codina spake thusly:  
> > > On a per-package build, rsync final {TARGET,HOST}_DIR using
> > > exclusion file list computed for each packages.
> > > As we rsync only files generated by each packages, we need to
> > > to do this rsync recursively on each dependencies to collect
> > > all needed files. This is done based on existing
> > > <PKG>_FINAL_RECURSIVE_DEPENDENCIES  
> > I am not sure I understand why this is needed...
> > 
> > One thing that comes to mind is speedup. Indeed, now that we can ensure
> > that packages only install new files and don;t change existing files, we
> > can speedup the final aggregation by just handling the new files.
> > 
> > But since we are using hardlinks, the aggregation is rather fast...
> > 
> > Can you expand on the rationale in the commit log when you respin,
> > please?  
> 
> Now I have seen the next patch, it makes sense. Yet, it should be
> explained here nonetheless. ;-)

The next patch just breaks the hardlinks had nothing to do with
the use of <PKG>_FINAL_RECURSIVE_DEPENDENCIES here.

Previously files were collected using a simple rsync and iterating on
$(PACKAGES). The rsync collected all files from current package host or target
so it collected files generated by the current package and the packages it
depends on.

Now, we collect only files generated by the current package and we discard
files generated by the packages it depends on.
Iterating on $(PACKAGES) is no longer enough.

Indeed $(PACKAGES) is the list of packages that are enabled in .config file (ie
selected during 'make menuconfig' or defconfig file)
But some packages (sure for host packages) can have some "hidden" dependencies
by hidden I means not seen or selected by config files.
This "hidden" dependencies are created at Makefile level.

HOST_FOO_DEPENDENCIES = host-bar

We need to collect files from this kind on dependency to generate the final
HOST_DIR.

<PKG>_FINAL_RECURSIVE_DEPENDENCIES contains all makefile level dependencies
a given package depends on.

Finally, we do:
for pkg selected in .config ($(PACKAGES))
        # Add current pkg and packages it depends on at makefile level
        FULL_PACKAGES_AND_DEPENDS += pkg + $(<PKG>_FINAL_RECURSIVE_DEPENDENCIES)
And then we call sort $(FULL_PACKAGES_AND_DEPENDS) to remove duplicates


I can add in the commit log message:
---- 8< ----
Using only packages present in $(PACKAGES) is no longer enough.
Indeed, we collect only files generated for a given package
and $(PACKAGES) contains the list of packages selected in .config file
Some dependencies exists at Makefile level that are no present@.config
file level (at least for host packages).
HOST_FOO_DEPENDENCIES += host-bar
To take into account these dependencies, <PKG>_FINAL_RECURSIVE_DEPENDENCIES
is used in addition to $(PACKAGES).
---- 8< ----

Is that ok for you ?

> 
> [--SNIP--]
> > > +	$(foreach pkg,$(1),\
> > > +		rsync -a --link-dest=$(PER_PACKAGE_DIR)/$(pkg)/$(strip $(2))/ \
> > > +		--filter='merge $($(call UPPERCASE,$(pkg))_DIR)/$(strip $(4))' \
> > > +		$(PER_PACKAGE_DIR)/$(pkg)/$(strip $(2))/ \
> > > +		$(3)$(sep))  
> [--SNIP--]
> > Furthermore, this rsync filter is non-obvious. It would be nice to
> > explain the commit log that the list is already a proper filer, not just
> > a list.
> > Additionally, in the commit that generates that list, the commit log
> > should also mention that a list of filters is created, not a list of
> > filenames.  
> 
> And to begin with, why are we generating an _exclude_ list, rather than
> an _include_ list?
> 
> I.e. rather than rsync everything except what we don't want, why not
> just rsync just what we want, and generate a list of filters about
> exactly just the files installed by a package?
> 
> I.e. in previous patch, crea create '+' filters rather than '-' filters.
> Also, I think using the full-length filter 'include' is nicer than the
> shorter '+'.
> 
> Probably something like:
> 
>     LC_ALL=C comm -1 \
>         $($(PKG)_DIR)/.files-final-rsync$(2).before \
>         $($(PKG)_DIR)/.files-final-rsync$(2).after \
>         | sed -r -e 's/^[^,]+,./include /' \
>         > $($(PKG)_DIR)/.files-final-rsync$(2).exclude_rsync  
> 
> (Totally untested, slippery code, excersise caution.)
> 

Inclusion filter is not so simple ...

In rsync man, we can read:
---- 8< ----
Note  that,  when  using the --recursive (-r) option (which is implied by -a),
every subdir component of every path is visited left to right, with each
directory having a chance for exclusion before its content.  In this way
include/exclude patterns are applied recursively to the pathname of each node in
the filesystem's tree (those  inside  the  transfer).
The exclude patterns short-circuit the directory traversal
stage as rsync finds the files to send.

For  instance,  to include "/foo/bar/baz", the directories "/foo" and
"/foo/bar" must not be excluded.  Excluding one of those parent directories
prevents the examination of its content, cutting off rsync's recursion into
those paths and rendering the include for "/foo/bar/baz" ineffectual (since rsync
can't match something it never sees in the cut-off section of the directory
hierarchy).

The concept path exclusion is particularly important when using a trailing '*' rule.
For instance, this won't work:

           + /some/path/this-file-will-not-be-found
           + /file-is-included
           - *

This  fails because the parent directory "some" is excluded by the '*' rule, so
rsync never visits any of the files in the "some" or "some/path" directories.
One solution is to ask for all directories in the hierarchy to be included by
using a single rule: "+ */" (put it somewhere before the "- *" rule), and perhaps
use the  --prune-empty-dirs  option.
Another solution is to add specific include rules for all the parent dirs that
need to be visited.  For instance, this set of rules works fine:

           + /some/
           + /some/path/
           + /some/path/this-file-is-found
           + /file-also-included
           - *

---- 8< ----


I feel tricky and error prone to use include filter and really simpler to take the
problem on the other side.
Instead of including what we want, exclude what we don't want.

The filter file passed to rsync is as simple as:
  - /foo/bar/not_thisone
  - /foo/not_thisone_too
  - /not_thisone

With this filter file, rsync will include all files that are not mentioned without
any more needs.


Regards,
Herv?

-- 
Herv? Codina, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

  reply	other threads:[~2021-06-25 11:59 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-21 14:11 [Buildroot] [PATCH 00/15] Overwritten file detection and fixes, one more step to TLP build Herve Codina
2021-06-21 14:11 ` [Buildroot] [PATCH 01/15] package/pkg-generic.mk: detect files overwritten in TARGET_DIR and HOST_DIR Herve Codina
2021-06-21 21:31   ` Yann E. MORIN
2021-06-22  7:40     ` Herve Codina
2021-06-22  9:30       ` Thomas Petazzoni
2021-06-22  9:57         ` Nicolas Cavallari
2021-06-22 10:24           ` Yann E. MORIN
2021-06-24 14:09             ` Herve Codina
2021-06-24 16:18               ` Yann E. MORIN
2021-06-21 14:11 ` [Buildroot] [PATCH 02/15] package/e2fsprogs: fix fsck overwrite in HOST_DIR Herve Codina
2021-06-21 20:52   ` Thomas Petazzoni
2021-06-22 16:26     ` Herve Codina
2021-06-22 19:40   ` Yann E. MORIN
2021-06-24 14:13     ` Herve Codina
2021-06-21 14:11 ` [Buildroot] [PATCH 03/15] package/pkg-generic.mk: Remove Info documents dir entry Herve Codina
2021-06-21 20:51   ` Thomas Petazzoni
2021-06-22  8:43     ` Herve Codina
2021-06-22  9:34       ` Thomas Petazzoni
2021-06-22 20:18         ` Yann E. MORIN
2021-06-24 15:03           ` Herve Codina
2021-06-21 14:11 ` [Buildroot] [PATCH 04/15] package/pkg-generic.mk: Fix .la files overwrite detection Herve Codina
2021-06-21 20:54   ` Thomas Petazzoni
2021-06-22 18:01     ` Herve Codina
2021-06-21 21:42   ` Yann E. MORIN
2021-06-22  9:31     ` Herve Codina
2021-06-22  9:56       ` Yann E. MORIN
2021-06-22 10:12         ` Thomas Petazzoni
2021-06-22 10:30           ` Yann E. MORIN
2021-06-24 15:44             ` Herve Codina
2021-06-24 16:22               ` Yann E. MORIN
2021-06-21 14:11 ` [Buildroot] [PATCH 05/15] package/pkg-generic.mk: Perform .la files fixup in per-package HOST_DIR Herve Codina
2021-06-21 20:48   ` Thomas Petazzoni
2021-06-22  9:38     ` Herve Codina
2021-06-22 10:12   ` Yann E. MORIN
2021-06-24 16:20     ` Herve Codina
2021-06-21 14:11 ` [Buildroot] [PATCH 06/15] package/pkg-generic.mk: Introduce <PKG>_PER_PACKAGE_TWEAK_HOOKS Herve Codina
2021-06-21 15:10   ` Thomas Petazzoni
2021-06-22 20:39   ` Yann E. MORIN
2021-06-23 12:40     ` Thomas Petazzoni
2021-06-25  7:15       ` Herve Codina
2021-06-25  7:21     ` Herve Codina
2021-07-02  7:18       ` Herve Codina
2021-07-03  6:21         ` Yann E. MORIN
2021-06-21 14:11 ` [Buildroot] [PATCH 07/15] package/apr-util: Use <PKG>_PER_PACKAGE_TWEAK_HOOKS Herve Codina
2021-06-21 20:56   ` Thomas Petazzoni
2021-06-22  9:47     ` Herve Codina
2021-06-22 20:42       ` Yann E. MORIN
2021-06-25  7:30         ` Herve Codina
2021-06-21 14:11 ` [Buildroot] [PATCH 08/15] package/apache: Move APACHE_FIXUP_APR_LIBTOOL to <PKG>_PER_PACKAGE_TWEAK_HOOKS Herve Codina
2021-06-22 20:43   ` Yann E. MORIN
2021-06-21 14:11 ` [Buildroot] [PATCH 09/15] package/pkg-python: Remove _sysconfigdata*.pyc files when _sysconfigdata*.py are changed Herve Codina
2021-06-21 15:12   ` Thomas Petazzoni
2021-06-22 17:52     ` Herve Codina
2021-06-22 20:50       ` Yann E. MORIN
2021-06-25  8:04         ` Herve Codina
2021-06-21 14:11 ` [Buildroot] [PATCH 10/15] package/pkg-generic.mk: Move python fixup to generic package infrastructure Herve Codina
2021-06-22 21:01   ` Yann E. MORIN
2021-06-25  8:22     ` Herve Codina
2021-06-25  9:27       ` Yann E. MORIN
2021-06-21 14:11 ` [Buildroot] [PATCH 11/15] package/owfs: Remove Python sysconfigdata fixup Herve Codina
2021-06-22 21:02   ` Yann E. MORIN
2021-06-25  8:25     ` Herve Codina
2021-06-21 14:11 ` [Buildroot] [PATCH 12/15] package/pkg-generic.mk: Generate final rsync exclude file list Herve Codina
2021-06-22 21:15   ` Yann E. MORIN
2021-06-25  9:05     ` Herve Codina
2021-06-25  9:32       ` Yann E. MORIN
2021-06-21 14:11 ` [Buildroot] [PATCH 13/15] Makefile: Rsync global {TARGET, HOST}_DIR using exclusion " Herve Codina
2021-06-24 20:20   ` Yann E. MORIN
2021-06-24 20:34     ` Yann E. MORIN
2021-06-25 11:59       ` Herve Codina [this message]
2021-06-25 12:50         ` Yann E. MORIN
2021-06-25 12:00     ` Herve Codina
2021-06-21 14:11 ` [Buildroot] [PATCH 14/15] Makefile: Breaks hardlinks in global {TARGET, HOST}_DIR on per-package build Herve Codina
2021-06-24 20:22   ` Yann E. MORIN
2021-06-21 14:11 ` [Buildroot] [PATCH 15/15] package/pkg-generic.mk: Fix per-package <pkg>-{reconfigure, rebuild, reinstall} Herve Codina
2021-06-24 20:44   ` Yann E. MORIN
2021-06-25 14:00     ` Herve Codina
2021-06-21 20:42 ` [Buildroot] [PATCH 00/15] Overwritten file detection and fixes, one more step to TLP build Arnout Vandecappelle
2021-07-06 14:15   ` Herve Codina
2021-06-24 20:53 ` Yann E. MORIN
2021-06-25  9:08 ` Andreas Naumann
2021-06-25 13:13   ` Herve Codina
2021-06-25 14:55     ` Andreas Naumann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210625135951.2fb55ea7@fedora \
    --to=herve.codina@bootlin.com \
    --cc=buildroot@busybox.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.