* [PATCH v2 1/4] lib/oe/utils: allow to set a lower bound on returned cpu_count()
@ 2020-03-03 16:05 André Draszik
  2020-03-03 16:05 ` [PATCH v2 2/4] bitbake.conf: more deterministic xz compression (threads) André Draszik
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: André Draszik @ 2020-03-03 16:05 UTC (permalink / raw)
  To: openembedded-core

This will be needed for making xz compression more deterministic,
as xz archives are created differently in single- vs multi-threaded
modes.

This means that due to bitbake's default of using as many threads
as there are cores in the system, files compressed with xz
will be different if built on a multi-core system compared to
single-core systems.

Allowing cpu_count() to enforce a lower bound on the value it
returns makes it possible to force xz to always use multi-threaded
operation.

Signed-off-by: André Draszik <git@andred.net>
---
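For illustration only, the new behaviour can be tried outside of
bitbake. The snippet is a standalone copy of the function from the
diff below; the two print() calls are just a usage example and are
not part of the patch.

```python
import multiprocessing


def cpu_count(at_least=1):
    """Return the number of CPUs, but never less than `at_least`."""
    cpus = multiprocessing.cpu_count()
    return max(cpus, at_least)


# Existing callers are unaffected: the default lower bound is 1.
print(cpu_count())

# Callers that need a floor (such as the xz thread count) can ask for one.
print(cpu_count(at_least=2))
```

On a single-core host the second call returns 2 rather than 1; on
larger machines both calls return the real core count.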
 meta/lib/oe/utils.py | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/meta/lib/oe/utils.py b/meta/lib/oe/utils.py
index e350b05ddf..aee4336482 100644
--- a/meta/lib/oe/utils.py
+++ b/meta/lib/oe/utils.py
@@ -248,9 +248,10 @@ def trim_version(version, num_parts=2):
     trimmed = ".".join(parts[:num_parts])
     return trimmed
 
-def cpu_count():
+def cpu_count(at_least=1):
     import multiprocessing
-    return multiprocessing.cpu_count()
+    cpus = multiprocessing.cpu_count()
+    return max(cpus, at_least)
 
 def execute_pre_post_process(d, cmds):
     if cmds is None:
-- 
2.23.0.rc1




* [PATCH v2 2/4] bitbake.conf: more deterministic xz compression (threads)
  2020-03-03 16:05 [PATCH v2 1/4] lib/oe/utils: allow to set a lower bound on returned cpu_count() André Draszik
@ 2020-03-03 16:05 ` André Draszik
  2020-03-03 16:05 ` [PATCH v2 3/4] bitbake.conf: omit XZ threads and RAM from sstate signatures André Draszik
  2020-03-03 16:05 ` [PATCH v2 4/4] reproducible: try to ensure reproducible xz archives André Draszik
  2 siblings, 0 replies; 7+ messages in thread
From: André Draszik @ 2020-03-03 16:05 UTC (permalink / raw)
  To: openembedded-core

xz archives can be non-deterministic / non-reproducible:
    a) archives are created differently in single- vs
       multi-threaded modes
    b) xz will scale down the compression level so as to
       be able to work within any memory limit given to
       it when operating in single-threaded mode

This means that due to bitbake's default of using as many
threads as there are cores in the system, files compressed
with xz will be different if built on a multi-core system
compared to single-core systems. They will also potentially
be different if built on single-core systems with different
amounts of physical memory, due to bitbake's default of
limiting xz's memory consumption.

Force multi-threaded operation by default, even on single-core
systems, so as to ensure archives are created in the same
way in all cases.

Signed-off-by: André Draszik <git@andred.net>
---
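A rough sketch of what the new default expands to on hosts of
different sizes. The helper name xz_defaults() and its parameters
are made up for this illustration; bitbake performs the expansion
itself via the inline Python in bitbake.conf.

```python
def xz_defaults(host_cpus, memlimit="50%", min_threads=2):
    """Mimic the expanded XZ_DEFAULTS string for a host with `host_cpus` cores.

    Hypothetical helper: it only models max(cpus, at_least) as used by
    oe.utils.cpu_count(at_least=2) in this series.
    """
    threads = max(host_cpus, min_threads)
    return "--memlimit={} --threads={}".format(memlimit, threads)


for cpus in (1, 2, 8):
    print(cpus, "->", xz_defaults(cpus))
```

The single-core case is the interesting one: it now yields
--threads=2, so xz starts in multi-threaded mode there as well.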
 meta/conf/bitbake.conf | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meta/conf/bitbake.conf b/meta/conf/bitbake.conf
index e201b671bb..131ba296d3 100644
--- a/meta/conf/bitbake.conf
+++ b/meta/conf/bitbake.conf
@@ -795,7 +795,7 @@ BB_NUMBER_THREADS ?= "${@oe.utils.cpu_count()}"
 PARALLEL_MAKE ?= "-j ${@oe.utils.cpu_count()}"
 
 # Default parallelism and resource usage for xz
-XZ_DEFAULTS ?= "--memlimit=50% --threads=${@oe.utils.cpu_count()}"
+XZ_DEFAULTS ?= "--memlimit=50% --threads=${@oe.utils.cpu_count(at_least=2)}"
 
 ##################################################################
 # Magic Cookie for SANITY CHECK
-- 
2.23.0.rc1




* [PATCH v2 3/4] bitbake.conf: omit XZ threads and RAM from sstate signatures
  2020-03-03 16:05 [PATCH v2 1/4] lib/oe/utils: allow to set a lower bound on returned cpu_count() André Draszik
  2020-03-03 16:05 ` [PATCH v2 2/4] bitbake.conf: more deterministic xz compression (threads) André Draszik
@ 2020-03-03 16:05 ` André Draszik
  2020-03-03 16:05 ` [PATCH v2 4/4] reproducible: try to ensure reproducible xz archives André Draszik
  2 siblings, 0 replies; 7+ messages in thread
From: André Draszik @ 2020-03-03 16:05 UTC (permalink / raw)
  To: openembedded-core

The number of threads used, and the amount of memory allowed
to be used, should not affect sstate signatures, as they
don't affect the outcome of the compression if xz operates
in multi-threaded mode [1].

Otherwise, it becomes impossible to re-use sstate from
automated builders on developers' machines (as the former
might execute bitbake with constraints that differ from
those on developers' machines).

This is in particular a problem with the opkg package writing
backend, as the OPKGBUILDCMD depends on XZ_DEFAULTS. Without
the vardepsexclude, there is no re-use possible of the
package_write_ipk sstate.

Whitelist the maximum number of threads and the memory limit
given assumptions outlined in [2] below.

Signed-off-by: André Draszik <git@andred.net>

[1] When starting out in multi-threaded mode, the output is always
deterministic, as even if xz scales down to single-threaded later,
the archives are still split into blocks and size information is
still added, thus keeping them compatible with multi-threaded mode.
Also, when starting out in multi-threaded mode, xz never scales
down the compression level to accommodate memory usage restrictions;
it just scales down the number of threads and errors out if it
cannot accommodate the memory limit.

[2] Assumptions
* We only support multi-threaded mode (threads >= 2), builds
  should not try to use xz in single-threaded mode
* The thread limit should be set via XZ_THREADS, not via
  modifying XZ_DEFAULTS or XZ_OPTS, or any other way
* The thread limit should not be set to xz's magic value
  zero (0), as that will lead to single-threaded mode on
  single-core systems.
---
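To illustrate why the exclusion matters, here is a deliberately
simplified toy model of a signature. It is NOT bitbake's actual
signature generator; toy_signature() and the variable values are
assumptions made up for this sketch. It hashes the unexpanded value
of a variable together with the values of the variables it depends
on, unless they are excluded.

```python
import hashlib


def toy_signature(variables, root, deps, excluded=()):
    """Hash `root` and its dependencies' values, skipping excluded names.

    Toy model only; the real signature computation is more involved.
    """
    digest = hashlib.sha256()
    digest.update(variables[root].encode())
    for name in sorted(deps):
        if name in excluded:
            continue
        digest.update(name.encode())
        digest.update(variables[name].encode())
    return digest.hexdigest()


template = "--memlimit=${XZ_MEMLIMIT} --threads=${XZ_THREADS}"
builder = {"XZ_DEFAULTS": template, "XZ_MEMLIMIT": "50%", "XZ_THREADS": "32"}
laptop = {"XZ_DEFAULTS": template, "XZ_MEMLIMIT": "50%", "XZ_THREADS": "4"}
deps = ("XZ_MEMLIMIT", "XZ_THREADS")

# Without the exclusion, the differing thread counts change the hash.
print(toy_signature(builder, "XZ_DEFAULTS", deps)
      == toy_signature(laptop, "XZ_DEFAULTS", deps))

# With the exclusion, both machines arrive at the same hash.
print(toy_signature(builder, "XZ_DEFAULTS", deps, excluded=deps)
      == toy_signature(laptop, "XZ_DEFAULTS", deps, excluded=deps))
```

In this model the first comparison is false and the second is true,
which mirrors the sstate re-use problem described above.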
 meta/conf/bitbake.conf | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/meta/conf/bitbake.conf b/meta/conf/bitbake.conf
index 131ba296d3..4b544a22cd 100644
--- a/meta/conf/bitbake.conf
+++ b/meta/conf/bitbake.conf
@@ -795,7 +795,10 @@ BB_NUMBER_THREADS ?= "${@oe.utils.cpu_count()}"
 PARALLEL_MAKE ?= "-j ${@oe.utils.cpu_count()}"
 
 # Default parallelism and resource usage for xz
-XZ_DEFAULTS ?= "--memlimit=50% --threads=${@oe.utils.cpu_count(at_least=2)}"
+XZ_MEMLIMIT ?= "50%"
+XZ_THREADS ?= "${@oe.utils.cpu_count(at_least=2)}"
+XZ_DEFAULTS ?= "--memlimit=${XZ_MEMLIMIT} --threads=${XZ_THREADS}"
+XZ_DEFAULTS[vardepsexclude] += "XZ_MEMLIMIT XZ_THREADS"
 
 ##################################################################
 # Magic Cookie for SANITY CHECK
-- 
2.23.0.rc1




* [PATCH v2 4/4] reproducible: try to ensure reproducible xz archives
  2020-03-03 16:05 [PATCH v2 1/4] lib/oe/utils: allow to set a lower bound on returned cpu_count() André Draszik
  2020-03-03 16:05 ` [PATCH v2 2/4] bitbake.conf: more deterministic xz compression (threads) André Draszik
  2020-03-03 16:05 ` [PATCH v2 3/4] bitbake.conf: omit XZ threads and RAM from sstate signatures André Draszik
@ 2020-03-03 16:05 ` André Draszik
  2020-03-03 16:08   ` André Draszik
  2 siblings, 1 reply; 7+ messages in thread
From: André Draszik @ 2020-03-03 16:05 UTC (permalink / raw)
  To: openembedded-core

xz suffers from a reproducibility problem when not using multi-
threaded mode:
a) archives are created differently in single- vs multi-threaded
   modes
b) xz will scale down the compression level so as to be able to
   work within any memory limit given to it when being launched
   in single-threaded mode.

Thus, for reproducible xz archives we need to launch xz with
at least two threads.

Add a little sanity test, and error out otherwise, so as to
guarantee no difference due to this fact.

Assumptions:
* The thread limit should be set via XZ_THREADS, not via
  modifying XZ_DEFAULTS or XZ_OPTS, or any other way
* The thread limit should not be set to xz's magic value
  zero (0), as that will lead to single-threaded mode on
  single-core systems

This patch doesn't prevent people from shooting themselves
in the foot by changing XZ_DEFAULTS to change the number
of threads directly, but it can serve as a hint at least.

Signed-off-by: André Draszik <git@andred.net>
---
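As a side note, the check in the diff below calls int() directly on
the variable's value, which raises a TypeError if XZ_THREADS is
unset and a ValueError if it is not a number. A slightly more
defensive variant could look like the following. This is only a
sketch outside of bitbake: check_xz_threads() is a hypothetical
name, and it raises ValueError where the class would call
bb.fatal().

```python
def check_xz_threads(value):
    """Return the thread count as an int, or raise ValueError if unsuitable.

    `value` is what d.getVar('XZ_THREADS') would return: a string or None.
    """
    if value is None:
        raise ValueError("XZ_THREADS is not set")
    try:
        threads = int(value)
    except ValueError:
        raise ValueError("XZ_THREADS is not an integer: %r" % (value,))
    if threads < 2:
        # Covers 1 (single-threaded) and xz's magic value 0, which
        # leads to single-threaded mode on single-core systems.
        raise ValueError("XZ_THREADS must be at least 2, got %d" % threads)
    return threads


print(check_xz_threads("4"))
```

Values such as "0", "1", an unset variable or a non-numeric string
are all rejected with a readable message.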
 meta/classes/reproducible_build.bbclass | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/meta/classes/reproducible_build.bbclass b/meta/classes/reproducible_build.bbclass
index 750eb950f2..e07bef87d8 100644
--- a/meta/classes/reproducible_build.bbclass
+++ b/meta/classes/reproducible_build.bbclass
@@ -35,6 +35,7 @@
 # SOURCE_DATE_EPOCH is set for all tasks that might use it (do_configure, do_compile, do_package, ...)
 
 BUILD_REPRODUCIBLE_BINARIES ??= '1'
+BUILD_REPRODUCIBLE_XZ_ARCHIVES ??= '1'
 inherit ${@oe.utils.ifelse(d.getVar('BUILD_REPRODUCIBLE_BINARIES') == '1', 'reproducible_build_simple', '')}
 
 SDE_DIR ="${WORKDIR}/source-date-epoch"
@@ -198,4 +199,8 @@ BB_HASHBASE_WHITELIST += "SOURCE_DATE_EPOCH"
 python () {
     if d.getVar('BUILD_REPRODUCIBLE_BINARIES') == '1':
         d.appendVarFlag("do_unpack", "postfuncs", " do_create_source_date_epoch_stamp")
+
+        if d.getVar('BUILD_REPRODUCIBLE_XZ_ARCHIVES') == '1':
+            if int(d.getVar('XZ_THREADS')) < 2:
+                bb.fatal("Can not build reproducible XZ archives without threading")
 }
-- 
2.23.0.rc1




* Re: [PATCH v2 4/4] reproducible: try to ensure reproducible xz archives
  2020-03-03 16:05 ` [PATCH v2 4/4] reproducible: try to ensure reproducible xz archives André Draszik
@ 2020-03-03 16:08   ` André Draszik
  2020-03-04  0:31     ` Otavio Salvador
  0 siblings, 1 reply; 7+ messages in thread
From: André Draszik @ 2020-03-03 16:08 UTC (permalink / raw)
  To: openembedded-core

On Tue, 2020-03-03 at 16:05 +0000, André Draszik wrote:
> xz suffers from a reproducibility problem when not using multi-
> threaded mode:
> a) archives are created differently in single- vs multi-threaded
>    modes
> b) xz will scale down the compression level so as to be able to
>    work within any memory limit given to it when being launched
>   in single-threaded mode.
> 
> Thus, for reproducible xz archives we need to launch xz with
> at least two threads.
> 
> Add a little sanity test, and error out otherwise, so as to
> guarantee no difference due to this fact.
> 
> Assumptions:
> * The thread limit should be set via XZ_THREADS, not via
>   modifying XZ_DEFAULTS or XZ_OPTS, or any other way
> * The thread limit should not be set to xz's magic value
>   zero (0), as that will lead to single-threaded mode on
>   single-core systems
> 
> This patch doesn't prevent people from shooting themselves
> in the foot by changing XZ_DEFAULTS to change the number
> of threads directly, but it can serve as a hint at least.

I don't know if this patch is useful, feel free to drop it.

In an ideal world, it'd parse the output of xz --verbose --verbose, to catch
all possible ways people might be adjusting the thread limit, but that's
non-trivial.
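A cheaper, partial alternative would be to scan the option strings
themselves for xz's -T / --threads option instead of parsing xz's
output. The sketch below is only an illustration:
effective_xz_threads() is a hypothetical helper, it assumes that a
later option overrides an earlier one, and it does not handle
bundled short options (e.g. -9T4) or abbreviated long options.

```python
import shlex


def effective_xz_threads(*option_strings):
    """Return the last thread count given via -T/--threads, or None.

    Each argument is an option string such as the value of XZ_DEFAULTS
    or XZ_OPT, passed in the order in which xz would see them.
    """
    threads = None
    for options in option_strings:
        tokens = shlex.split(options)
        i = 0
        while i < len(tokens):
            tok = tokens[i]
            value = None
            if tok.startswith("--threads="):
                value = tok[len("--threads="):]
            elif tok in ("--threads", "-T") and i + 1 < len(tokens):
                value = tokens[i + 1]
                i += 1  # skip the separate argument
            elif tok.startswith("-T") and len(tok) > 2:
                value = tok[2:]
            if value is not None:
                threads = int(value)
            i += 1
    return threads


print(effective_xz_threads("--memlimit=50% --threads=4", "-T1"))
```

A result below 2 (or None, if no thread option is present at all)
would then be a hint that single-threaded mode might be used.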


Cheers,
Andre'

> 
> Signed-off-by: André Draszik <git@andred.net>
> ---
>  meta/classes/reproducible_build.bbclass | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/meta/classes/reproducible_build.bbclass b/meta/classes/reproducible_build.bbclass
> index 750eb950f2..e07bef87d8 100644
> --- a/meta/classes/reproducible_build.bbclass
> +++ b/meta/classes/reproducible_build.bbclass
> @@ -35,6 +35,7 @@
>  # SOURCE_DATE_EPOCH is set for all tasks that might use it (do_configure, do_compile, do_package, ...)
>  
>  BUILD_REPRODUCIBLE_BINARIES ??= '1'
> +BUILD_REPRODUCIBLE_XZ_ARCHIVES ??= '1'
>  inherit ${@oe.utils.ifelse(d.getVar('BUILD_REPRODUCIBLE_BINARIES') == '1', 'reproducible_build_simple', '')}
>  
>  SDE_DIR ="${WORKDIR}/source-date-epoch"
> @@ -198,4 +199,8 @@ BB_HASHBASE_WHITELIST += "SOURCE_DATE_EPOCH"
>  python () {
>      if d.getVar('BUILD_REPRODUCIBLE_BINARIES') == '1':
>          d.appendVarFlag("do_unpack", "postfuncs", " do_create_source_date_epoch_stamp")
> +
> +        if d.getVar('BUILD_REPRODUCIBLE_XZ_ARCHIVES') == '1':
> +            if int(d.getVar('XZ_THREADS')) < 2:
> +                bb.fatal("Can not build reproducible XZ archives without threading")
>  }




* Re: [PATCH v2 4/4] reproducible: try to ensure reproducible xz archives
  2020-03-03 16:08   ` André Draszik
@ 2020-03-04  0:31     ` Otavio Salvador
  2020-03-04  6:03       ` Richard Purdie
  0 siblings, 1 reply; 7+ messages in thread
From: Otavio Salvador @ 2020-03-04  0:31 UTC (permalink / raw)
  To: André Draszik; +Cc: Patches and discussions about the oe-core layer

On Tue, Mar 3, 2020 at 1:08 PM André Draszik <git@andred.net> wrote:
> On Tue, 2020-03-03 at 16:05 +0000, André Draszik wrote:
> In an ideal world, it'd parse the output of xz --verbose --verbose, to catch
> all possible ways people might be adjusting the thread limit, but that's
> non-trivial.

Couldn't we just "enforce" at least two threads? It is quite unlikely
that we ever use OE on a single-core machine (as it'd take a few years
to finish the build hehe), so it seems like a reasonable assumption.


-- 
Otavio Salvador                             O.S. Systems
http://www.ossystems.com.br        http://code.ossystems.com.br
Mobile: +55 (53) 9 9981-7854          Mobile: +1 (347) 903-9750



* Re: [PATCH v2 4/4] reproducible: try to ensure reproducible xz archives
  2020-03-04  0:31     ` Otavio Salvador
@ 2020-03-04  6:03       ` Richard Purdie
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Purdie @ 2020-03-04  6:03 UTC (permalink / raw)
  To: Otavio Salvador, André Draszik
  Cc: Patches and discussions about the oe-core layer

On Tue, 2020-03-03 at 21:31 -0300, Otavio Salvador wrote:
> On Tue, Mar 3, 2020 at 1:08 PM André Draszik <git@andred.net> wrote:
> > On Tue, 2020-03-03 at 16:05 +0000, André Draszik wrote:
> > In an ideal world, it'd parse the output of xz --verbose --verbose, 
> > to catch
> > all possible ways people might be adjusting the thread limit, but
> > that's
> > non-trivial.
> 
> Couldn't we just "enforce" at least two threads? It is quite unlikely
> that we ever use OE on a single-core machine (as it'd take a few years
> to finish the build hehe), so it seems like a reasonable assumption.

An earlier patch does, unless you actually set XZ_THREADS = "1". If you
do that, things are still reproducible in that the output will be
consistent, just not with any other value of XZ_THREADS. So I think we
should be fine without this patch.

Cheers,

Richard



